`skills/compare-llm-d-configurations/SKILL.md` — 45 additions, 0 deletions

@@ -93,6 +93,51 @@ Only needed if at least one run is being deployed fresh. Both runs share the sam
2. Check for an active `oc project`: `oc project -q 2>/dev/null`
3. Otherwise ask the user.

### 0.5 Check for Baseline Configuration

If either run is labeled as "baseline" (i.e., it doesn't use llm-d scheduling), or the user indicates that one configuration is a baseline run, perform the following checks:
> **Reviewer comment:** This assumes that the llm-d stack is already up and running, but we may want to deploy it using the skill, then create the service.


1. **Verify baseline service exists**:
```bash
kubectl get svc llm-d-baseline-model-server -n $NAMESPACE
```

2. **If the service does not exist**, inform the user that you will create it, then apply the baseline service YAML:
```bash
kubectl apply -f skills/compare-llm-d-configurations/llm-d-baseline-model-server-svc.yaml -n $NAMESPACE
```
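Steps 1 and 2 can be combined into a single guard so the apply only runs when the service is missing; this is a sketch that assumes the same service name and that `$NAMESPACE` is already set:

```bash
# Create the baseline Service only if it does not already exist.
if ! kubectl get svc llm-d-baseline-model-server -n "$NAMESPACE" >/dev/null 2>&1; then
  echo "Baseline service not found in $NAMESPACE; creating it"
  kubectl apply -f skills/compare-llm-d-configurations/llm-d-baseline-model-server-svc.yaml -n "$NAMESPACE"
fi
```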

If that file is not available locally, create it inline:
> **Reviewer comment:** The file is always available locally; it is part of the skill. No need to repeat it here.

```bash
cat <<EOF | kubectl apply -n $NAMESPACE -f -
apiVersion: v1
kind: Service
metadata:
  name: llm-d-baseline-model-server
spec:
  selector:
    llm-d.ai/inference-serving: "true"
  ports:
    - name: http
      protocol: TCP
      port: 8000
      targetPort: 8000
  type: ClusterIP
EOF
```

3. **Verify the service has endpoints**:
```bash
kubectl get endpoints llm-d-baseline-model-server -n $NAMESPACE
```

If no endpoints exist, inform the user that pods labeled `llm-d.ai/inference-serving=true` must be running for the baseline service to work. Check the labels on the running pods, list them to the user, and suggest which label to use for the selector (then update the baseline service accordingly).
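One way to surface those labels (a sketch, assuming `kubectl` access to the namespace) is:

```bash
# Show every running pod with its full label set, so the user can pick
# the label that identifies the model-serving pods.
kubectl get pods -n "$NAMESPACE" --field-selector=status.phase=Running --show-labels

# Check whether any pods already match the selector the baseline service uses.
kubectl get pods -n "$NAMESPACE" -l llm-d.ai/inference-serving=true
```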

4. **Set baseline base_url**: When running the benchmark for the baseline configuration (as Run A or as Run B), ensure that run's `config.yaml` uses:
> **Reviewer comment:** This part should be moved to the step that runs the benchmark.

```yaml
base_url: http://llm-d-baseline-model-server.<namespace>.svc.cluster.local:8000
```
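Before launching the benchmark, the baseline URL can be smoke-tested from inside the cluster. This is a sketch that assumes the model server exposes an OpenAI-compatible `/v1/models` endpoint (as vLLM does); adjust the path for your server:

```bash
# Run a throwaway curl pod in the namespace and hit the baseline service.
kubectl run baseline-smoke-test --rm -i --restart=Never \
  --image=curlimages/curl -n "$NAMESPACE" -- \
  curl -sf "http://llm-d-baseline-model-server.$NAMESPACE.svc.cluster.local:8000/v1/models"
```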

---

## Phase 1: Run A
`skills/compare-llm-d-configurations/llm-d-baseline-model-server-svc.yaml` — 13 additions, 0 deletions

@@ -0,0 +1,13 @@
> **Reviewer comment:** please move this yaml to a sub-directory called `scripts` (or `resources`)

```yaml
apiVersion: v1
kind: Service
metadata:
  name: llm-d-baseline-model-server
spec:
  selector:
    llm-d.ai/inference-serving: "true"
  ports:
    - name: http
      protocol: TCP
      port: 8000
      targetPort: 8000
  type: ClusterIP
```