# Add baseline configuration for performance comparison of llm-d #8
@@ -93,6 +93,51 @@ Only needed if at least one run is being deployed fresh. Both runs share the sam

2. Check for an active `oc project`: `oc project -q 2>/dev/null`
3. Otherwise ask the user.
### 0.5 Check for Baseline Configuration

If either run is labeled as "baseline" (i.e. it does not use llm-d scheduling), or the user indicates one configuration is a baseline run, perform the following checks:

1. **Verify baseline service exists**:

   ```bash
   kubectl get svc llm-d-baseline-model-server -n $NAMESPACE
   ```

2. **If the service does not exist**, inform the user that you will create it, then apply the baseline service YAML:

   ```bash
   kubectl apply -f skills/compare-llm-d-configurations/llm-d-baseline-model-server-svc.yaml -n $NAMESPACE
   ```

   Or, if that file is not available locally, create it inline:

   > **Collaborator:** The file is always available locally; it is part of the skill. No need to repeat it here.
   ```bash
   cat <<EOF | kubectl apply -n $NAMESPACE -f -
   apiVersion: v1
   kind: Service
   metadata:
     name: llm-d-baseline-model-server
   spec:
     selector:
       llm-d.ai/inference-serving: "true"
     ports:
       - name: http
         protocol: TCP
         port: 8000
         targetPort: 8000
     type: ClusterIP
   EOF
   ```

3. **Verify the service has endpoints**:

   ```bash
   kubectl get endpoints llm-d-baseline-model-server -n $NAMESPACE
   ```

   If no endpoints exist, inform the user that pods labeled `llm-d.ai/inference-serving=true` must be running for the baseline service to work. Check the labels on the running pods, list them for the user, and suggest which label to use for the selector (updating the baseline service accordingly).
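
   When the pods turn out to use a different label, the Service selector can be patched in place instead of re-applying the full YAML. A minimal sketch, where the label key and value are hypothetical placeholders standing in for whatever `kubectl get pods --show-labels` actually reports:

   ```shell
   # Hypothetical label observed on the running model-server pods (assumption):
   LABEL_KEY="app.kubernetes.io/name"
   LABEL_VALUE="vllm"

   # Build a merge patch that replaces the Service's pod selector.
   PATCH=$(printf '{"spec":{"selector":{"%s":"%s"}}}' "$LABEL_KEY" "$LABEL_VALUE")
   echo "$PATCH"

   # Apply it to the baseline Service (requires cluster access):
   # kubectl patch svc llm-d-baseline-model-server -n "$NAMESPACE" --type merge -p "$PATCH"
   ```

   After patching, re-run the endpoints check above to confirm the selector now matches the pods.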

4. **Set baseline `base_url`**: When running the benchmark for the baseline configuration (as Run A or as Run B), ensure that Run's config.yaml uses:

   > **Collaborator:** This part should be moved to the step that runs the benchmark.

   ```yaml
   base_url: http://llm-d-baseline-model-server.<namespace>.svc.cluster.local:8000
   ```
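
   A small shell sketch of how the in-cluster URL is derived from the namespace; the namespace value is a placeholder, and the `/v1/models` probe assumes an OpenAI-compatible model server:

   ```shell
   # Placeholder namespace (assumption: substitute your own)
   NAMESPACE="llm-d-bench"

   # Fully qualified in-cluster DNS name for the baseline Service
   BASE_URL="http://llm-d-baseline-model-server.${NAMESPACE}.svc.cluster.local:8000"
   echo "$BASE_URL"

   # Optionally verify reachability from inside the cluster before benchmarking:
   # curl -sf "${BASE_URL}/v1/models" >/dev/null && echo "baseline reachable"
   ```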

---

## Phase 1: Run A

---

`skills/compare-llm-d-configurations/llm-d-baseline-model-server-svc.yaml` (new file):

> **Collaborator:** Please move this YAML to a subdirectory called `scripts` (or `resources`).

@@ -0,0 +1,13 @@

```yaml
apiVersion: v1
kind: Service
metadata:
  name: llm-d-baseline-model-server
spec:
  selector:
    llm-d.ai/inference-serving: "true"
  ports:
    - name: http
      protocol: TCP
      port: 8000
      targetPort: 8000
  type: ClusterIP
```

> **Collaborator:** This assumes that the llm-d stack is already up and running, but we may want to deploy it using the skill, then create the service.