fix: use correct GPU node selector labels and improve prod probe timeouts #112
base: main
Changes from all commits
src/neuralnav/configuration/generator.py

```diff
@@ -21,7 +21,7 @@ class DeploymentGenerator:
     """Generate deployment configurations from recommendations."""

     # vLLM version to use
-    VLLM_VERSION = "v0.6.2"
+    VLLM_VERSION = "latest"

     def __init__(self, output_dir: str | None = None, simulator_mode: bool = False):
         """
```
```diff
@@ -122,9 +122,15 @@ def _prepare_template_context(
         assert gpu_config is not None, "gpu_config is required for template context"

-        # Calculate GPU hourly rate from ModelCatalog
+        # Look up GPU info from ModelCatalog
         gpu_info = self._catalog.get_gpu_type(gpu_config.gpu_type)
-        gpu_hourly_rate = gpu_info.cost_per_hour_usd if gpu_info else 1.0
+        if gpu_info is None:
+            raise ValueError(
+                f"Unknown GPU type '{gpu_config.gpu_type}'. "
+                f"Add it to the GPU catalog in data/configuration/model_catalog.json."
+            )
```
Comment on lines +127 to +131

Use a typed generation error that the API can map to 4xx. Fail-fast is the right behavior, but raising a bare `ValueError` gives the API layer nothing specific to catch, so an unknown GPU type surfaces as a generic 500 instead of a client error.

🪛 Ruff (0.15.5): [warning] 126-129: Avoid specifying long messages outside the exception class (TRY003)
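One way the typed-error approach could look is sketched below; the class and function names are illustrative assumptions, not code from this PR:

```python
# Hypothetical typed error: the API layer can catch this specific class
# and translate it to a 4xx response instead of a generic 500.
class UnknownGPUTypeError(ValueError):
    def __init__(self, gpu_type: str):
        self.gpu_type = gpu_type
        super().__init__(
            f"Unknown GPU type '{gpu_type}'. "
            "Add it to the GPU catalog in data/configuration/model_catalog.json."
        )


def lookup_gpu_info(catalog: dict, gpu_type: str) -> dict:
    """Fail fast on an unknown GPU type with a catchable, typed error."""
    info = catalog.get(gpu_type)
    if info is None:
        raise UnknownGPUTypeError(gpu_type)
    return info
```

An API handler can then map `UnknownGPUTypeError` to HTTP 400/422 while still letting unexpected exceptions surface as 500s. This also satisfies Ruff's TRY003, since the long message lives inside the exception class.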
```diff
+        gpu_hourly_rate = gpu_info.cost_per_hour_usd
+        gpu_node_selector_label = gpu_info.node_selector_label

         # Determine resource requests based on GPU type
         gpu_type = gpu_config.gpu_type

@@ -187,6 +193,7 @@ def _prepare_template_context(
             "simulator_mode": self.simulator_mode,
             # GPU configuration
             "gpu_type": gpu_config.gpu_type,
+            "gpu_node_selector_label": gpu_node_selector_label,
             "gpu_count": gpu_config.gpu_count,
             "tensor_parallel": gpu_config.tensor_parallel,
             "gpus_per_replica": gpu_config.tensor_parallel,  # GPUs per pod
```
src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2

```diff
@@ -78,24 +78,25 @@ spec:
           {% if simulator_mode %}
           initialDelaySeconds: 10
           {% else %}
-          initialDelaySeconds: 120
+          initialDelaySeconds: 600
           {% endif %}
           periodSeconds: 30
           timeoutSeconds: 10
           failureThreshold: 5
         readinessProbe:
           httpGet:
             path: /health
             port: 8080
           {% if simulator_mode %}
           initialDelaySeconds: 5
           {% else %}
-          initialDelaySeconds: 60
+          initialDelaySeconds: 120
           {% endif %}
           periodSeconds: 10
           timeoutSeconds: 5
       {% if not simulator_mode %}
       nodeSelector:
-        nvidia.com/gpu.product: {{ gpu_type }}
+        nvidia.com/gpu.product: {{ gpu_node_selector_label }}
```
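As a sanity check on the new production probe values: ignoring per-probe `timeoutSeconds`, the worst-case time before the kubelet restarts a pod that never reports healthy is roughly `initialDelaySeconds + periodSeconds × failureThreshold`. A small sketch using the values from the template above:

```python
def liveness_budget(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Approximate seconds from container start until the kubelet restarts a
    pod that never reports healthy: initial delay plus one period per
    allowed consecutive failure."""
    return initial_delay + period * failure_threshold


# Production values from the template: 600s delay, 30s period, 5 failures.
print(liveness_budget(600, 30, 5))  # 750 seconds (~12.5 minutes)
```

That gives large models about 12.5 minutes to load weights before the liveness probe kills the pod, versus roughly 4.5 minutes with the old 120s delay.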
Comment on lines 98 to +99
🧩 Analysis chain (verification scripts run against redhat-et/neuralnav):

```shell
#!/bin/bash
set -euo pipefail
printf '\n== GPU catalog entries with selector labels ==\n'
rg -n '"gpu_type":|"node_selector_label":' data/configuration/model_catalog.json
printf '\n== Hardcoded accelerator selector/resource keys ==\n'
rg -n 'nvidia\.com/gpu(\.product)?' src/neuralnav/configuration src/neuralnav/knowledge_base
printf '\n== Context around lines 98-99 (node selector) ==\n'
sed -n '90,110p' src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2
printf '\n== Template context preparation function ==\n'
sed -n '/_prepare_template_context/,/^[[:space:]]*def /p' src/neuralnav/configuration/generator.py
printf '\n== Check ModelCatalog GPU info structure ==\n'
sed -n '/class.*GPU/,/^class /p' src/neuralnav/knowledge_base/model_catalog.py
printf '\n== Check if AMD entries need different resource keys ==\n'
rg -i 'amd\.com|AMD.*gpu' src/neuralnav data/ || echo "No AMD resource keys found"
```

Vendor-specific selector and resource keys prevent AMD GPU deployments.

The node selector fix addresses the key name on line 99, but the template still has three vendor-specific hardcodes that block AMD catalog entries. AMD GPUs require different key names. Extract the vendor prefix from the catalog GPU type and use it consistently across the template. This requires adding a vendor identifier to the template context (not just the label value).
Member (Author)

Skipping this for now. The NVIDIA and AMD GPU operators use fundamentally different resource keys and conventions; it's not as simple as swapping a vendor prefix string. For example, AMD uses `amd.com/gpu` as the schedulable resource name rather than `nvidia.com/gpu`, and its node labels are published by the AMD GPU node labeller under separate `amd.com/gpu.*` keys.
```diff
       tolerations:
         - key: nvidia.com/gpu
           operator: Exists
```
src/neuralnav/knowledge_base/model_catalog.py

```diff
@@ -54,6 +54,7 @@ class GPUType:
     def __init__(self, data: dict):
         self.gpu_type = data["gpu_type"]
         self.aliases = data.get("aliases", [data["gpu_type"]])  # Default to primary name
+        self.node_selector_label = data.get("node_selector_label", self.aliases[0])
         self.memory_gb = data["memory_gb"]
```
Comment on lines 54 to 58

Require `node_selector_label` explicitly rather than falling back to the first alias.

This fallback can reintroduce the exact broken scheduling path this PR is fixing: if a future GPU entry omits `node_selector_label`, the generated manifest silently uses an alias as the node selector value, which may match no node label, and pods never schedule.

Suggested change:

```diff
     def __init__(self, data: dict):
         self.gpu_type = data["gpu_type"]
         self.aliases = data.get("aliases", [data["gpu_type"]])  # Default to primary name
-        self.node_selector_label = data.get("node_selector_label", self.aliases[0])
+        try:
+            self.node_selector_label = data["node_selector_label"]
+        except KeyError as e:
+            raise ValueError(
+                f"GPU type '{self.gpu_type}' is missing required node_selector_label"
+            ) from e
         self.memory_gb = data["memory_gb"]
```
```diff
         self.compute_capability = data["compute_capability"]
         self.typical_use_cases = data["typical_use_cases"]

@@ -88,6 +89,7 @@ def to_dict(self) -> dict:
         return {
             "gpu_type": self.gpu_type,
             "aliases": self.aliases,
+            "node_selector_label": self.node_selector_label,
             "memory_gb": self.memory_gb,
             "compute_capability": self.compute_capability,
             "typical_use_cases": self.typical_use_cases,
```
```diff
@@ -9,7 +9,7 @@

 import pandas as pd
 import streamlit as st
-from api_client import delete_deployment, get_k8s_status, load_all_deployments
+from api_client import delete_deployment, load_all_deployments


 def render_deployment_management_tab():
```
```diff
@@ -46,12 +46,14 @@ def render_deployment_management_tab():
         status = dep.get("status", {})
         pods = dep.get("pods", [])
         ready = status.get("ready", False)
-        table_data.append({
-            "Status": "Ready" if ready else "Pending",
-            "Name": dep["deployment_id"],
-            "Pods": len(pods),
-            "Ready": "Yes" if ready else "No",
-        })
+        table_data.append(
+            {
+                "Status": "Ready" if ready else "Pending",
+                "Name": dep["deployment_id"],
+                "Pods": len(pods),
+                "Ready": "Yes" if ready else "No",
+            }
+        )

     df = pd.DataFrame(table_data)
     st.dataframe(df, use_container_width=True, hide_index=True)
```
```diff
@@ -257,14 +259,21 @@ def _run_inference_test(deployment_id: str, prompt: str, max_tokens: int, temper
     start_time = time.time()

     curl_cmd = [
```
Collaborator

Use a Python HTTP client here rather than shelling out to `curl`.
```diff
-        "curl", "-s", "-X", "POST",
+        "curl",
+        "-s",
+        "-X",
+        "POST",
         "http://localhost:8080/v1/completions",
-        "-H", "Content-Type: application/json",
-        "-d", json.dumps({
-            "prompt": prompt,
-            "max_tokens": max_tokens,
-            "temperature": temperature,
-        }),
+        "-H",
+        "Content-Type: application/json",
+        "-d",
+        json.dumps(
+            {
+                "prompt": prompt,
+                "max_tokens": max_tokens,
+                "temperature": temperature,
+            }
+        ),
     ]

     with st.expander("Debug Info"):
```
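Assuming the reviewer's comment above is suggesting an in-process HTTP call, a stdlib-only sketch of the equivalent request might look like this (the function names are illustrative; endpoint and payload mirror the diff):

```python
import json
import urllib.request


def build_payload(prompt: str, max_tokens: int, temperature: float) -> bytes:
    """Serialize the same JSON body the curl command passes via -d."""
    return json.dumps(
        {"prompt": prompt, "max_tokens": max_tokens, "temperature": temperature}
    ).encode("utf-8")


def run_completion(prompt: str, max_tokens: int = 50, temperature: float = 0.7) -> dict:
    """POST to the port-forwarded vLLM endpoint without shelling out to curl."""
    req = urllib.request.Request(
        "http://localhost:8080/v1/completions",
        data=build_payload(prompt, max_tokens, temperature),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Beyond avoiding a subprocess, this raises a catchable exception on non-2xx responses instead of silently returning curl's output, and it drops the dependency on `curl` being present in the container.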
Pin the vLLM image to a specific release version tag instead of `latest`.

Using `latest` makes generated manifests non-reproducible and creates supply-chain risk. The Docker Hub `latest` tag is not always promptly updated to the newest vLLM release, and pods can pull different images on restart or scale-out without any code change. For production YAML, pin to a tested release tag (e.g., `vllm/vllm-openai:vX.Y.Z`) or, for strict immutability, pin that tag's digest (`vllm/vllm-openai:vX.Y.Z@sha256:...`).

Sources: official vLLM "Using Docker" docs (https://docs.vllm.ai/en/stable/deployment/docker/); vLLM forum maintainer response on Docker Hub tagging (https://discuss.vllm.ai/t/vllm-openai-dockerhub-missing-0-16-tags/2399).