fix: use correct GPU node selector labels and improve prod probe timeouts #112

anfredette wants to merge 3 commits into redhat-et:main
Conversation
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration: Organization UI · Review profile: CHILL · Plan: Pro · Run ID:
📒 Files selected for processing (4)

📝 Walkthrough
Adds a new …

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 2
🧹 Nitpick comments (1)

src/neuralnav/knowledge_base/model_catalog.py (1)

Lines 56-57: Prefer the first alias as the fallback selector label.

If a fixture or external catalog entry omits node_selector_label, this falls back to short names like `L4` and reintroduces the scheduling bug this PR is fixing. `self.aliases[0]` is the safer default here.

Suggested change:

```diff
  self.gpu_type = data["gpu_type"]
  self.aliases = data.get("aliases", [data["gpu_type"]])  # Default to primary name
- self.node_selector_label = data.get("node_selector_label", data["gpu_type"])
+ self.node_selector_label = data.get("node_selector_label", self.aliases[0])
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/neuralnav/knowledge_base/model_catalog.py` around lines 56 - 57, The node_selector_label fallback uses data["gpu_type"] which can be short names and reintroduce the scheduling bug; change the fallback to use the first alias instead. Update the initialization in model_catalog.ModelEntry (or wherever self.aliases and self.node_selector_label are set) so self.aliases = data.get("aliases", [data["gpu_type"]]) remains, but set self.node_selector_label = data.get("node_selector_label", self.aliases[0]) to prefer the first alias when node_selector_label is omitted. Ensure this uses the already-computed self.aliases value so the safer alias default is applied.
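As a quick illustration of the fallback behavior discussed above, here is a minimal self-contained sketch; the `GPUEntry` class name and the sample catalog entry are hypothetical stand-ins for the real parser in model_catalog.py:

```python
# Hypothetical stand-in for the real parser in model_catalog.py; only the
# fallback logic from the review comment is reproduced here.
class GPUEntry:
    def __init__(self, data: dict):
        self.gpu_type = data["gpu_type"]
        # Default aliases to the primary name when none are given.
        self.aliases = data.get("aliases", [data["gpu_type"]])
        # Prefer the first alias (typically the full node label) over the
        # short gpu_type when node_selector_label is omitted.
        self.node_selector_label = data.get("node_selector_label", self.aliases[0])


entry = GPUEntry({"gpu_type": "L4", "aliases": ["NVIDIA-L4", "L4"]})
print(entry.node_selector_label)  # → NVIDIA-L4 (not the short "L4")
```

With the old `data["gpu_type"]` fallback, the same entry would yield the short `L4` label, which is exactly the unschedulable selector this PR removes.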
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/neuralnav/configuration/generator.py`:
- Around line 123-126: The GPU catalog lookup currently silently defaults when
missing; change the behavior in the generator where
self._catalog.get_gpu_type(gpu_config.gpu_type) is called so that if gpu_info is
None you raise a clear exception (e.g., ValueError or RuntimeError) including
the missing gpu_type string instead of assigning defaults to gpu_hourly_rate or
gpu_node_selector_label; update the code that sets gpu_hourly_rate and
gpu_node_selector_label (the gpu_info, gpu_hourly_rate, gpu_node_selector_label
logic) to fail fast with a descriptive error message so downstream YAML is not
emitted for unknown GPUs.
In `@src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2`:
- Around line 98-99: The template hardcodes NVIDIA keys (nvidia.com/gpu and
nvidia.com/gpu.product) which prevents AMD GPUs; add a vendor identifier to the
template context (e.g., vendor_prefix) and replace the three hardcoded
occurrences with dynamic keys: use nodeSelector key "{{ vendor_prefix
}}.com/gpu.product" instead of nvidia.com/gpu.product, replace resource
request/limit keys "nvidia.com/gpu" with "{{ vendor_prefix }}.com/gpu", and
replace the toleration key "nvidia.com/gpu" with "{{ vendor_prefix }}.com/gpu";
update the code that renders this Jinja template to pass vendor_prefix (derived
from the catalog GPU type) alongside gpu_node_selector_label so all GPU
vendor-specific keys are consistent.
---
Nitpick comments:
In `@src/neuralnav/knowledge_base/model_catalog.py`:
- Around line 56-57: The node_selector_label fallback uses data["gpu_type"]
which can be short names and reintroduce the scheduling bug; change the fallback
to use the first alias instead. Update the initialization in
model_catalog.ModelEntry (or wherever self.aliases and self.node_selector_label
are set) so self.aliases = data.get("aliases", [data["gpu_type"]]) remains, but
set self.node_selector_label = data.get("node_selector_label", self.aliases[0])
to prefer the first alias when node_selector_label is omitted. Ensure this uses
the already-computed self.aliases value so the safer alias default is applied.
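The fail-fast lookup that the first inline comment asks for can be sketched as follows; `GPUCatalog`, `resolve_gpu`, and the dictionary layout are illustrative assumptions, not the project's actual API:

```python
# Illustrative sketch of the fail-fast lookup described above; GPUCatalog,
# resolve_gpu, and the dictionary layout are assumptions, not project code.
class GPUCatalog:
    def __init__(self, entries: dict):
        self._entries = entries  # maps gpu_type -> info dict

    def get_gpu_type(self, gpu_type: str):
        return self._entries.get(gpu_type)


def resolve_gpu(catalog: GPUCatalog, gpu_type: str) -> dict:
    gpu_info = catalog.get_gpu_type(gpu_type)
    if gpu_info is None:
        # Fail fast instead of emitting YAML built from guessed defaults.
        raise ValueError(
            f"Unknown GPU type '{gpu_type}'. "
            "Add it to the GPU catalog in data/configuration/model_catalog.json."
        )
    return gpu_info


catalog = GPUCatalog({"L4": {"node_selector_label": "NVIDIA-L4"}})
print(resolve_gpu(catalog, "L4")["node_selector_label"])  # → NVIDIA-L4
```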
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2eb6d700-e5cc-4655-8681-a09881c3b7af
📒 Files selected for processing (4)
- data/configuration/model_catalog.json
- src/neuralnav/configuration/generator.py
- src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2
- src/neuralnav/knowledge_base/model_catalog.py
```diff
  nodeSelector:
-   nvidia.com/gpu.product: {{ gpu_type }}
+   nvidia.com/gpu.product: {{ gpu_node_selector_label }}
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== GPU catalog entries with selector labels ==\n'
rg -n '"gpu_type":|"node_selector_label":' data/configuration/model_catalog.json
printf '\n== Hardcoded accelerator selector/resource keys ==\n'
rg -n 'nvidia\.com/gpu(\.product)?' src/neuralnav/configuration src/neuralnav/knowledge_base
Repository: redhat-et/neuralnav
Length of output: 1580
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== Full kserve-inferenceservice.yaml.j2 template ==\n'
cat -n src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2 | head -120
printf '\n== Context around lines 98-99 (node selector) ==\n'
sed -n '90,110p' src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2 | cat -n
printf '\n== Check template variables passed to this template ==\n'
rg -B5 -A5 'kserve-inferenceservice.yaml.j2' src/neuralnav/configuration
Repository: redhat-et/neuralnav
Length of output: 6173
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== Template context preparation function ==\n'
sed -n '/_prepare_template_context/,/^[[:space:]]*def /p' src/neuralnav/configuration/generator.py | head -80
printf '\n== Check for any gpu_selector_key usage ==\n'
rg -n 'gpu_selector_key' src/neuralnav/configuration
Repository: redhat-et/neuralnav
Length of output: 3349
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== Full context dict in _prepare_template_context ==\n'
sed -n '/_prepare_template_context/,/^[[:space:]]*def /p' src/neuralnav/configuration/generator.py | tail -50
printf '\n== Check ModelCatalog GPU info structure ==\n'
rg -B10 -A10 'class.*GPU.*Info' src/neuralnav/configuration
Repository: redhat-et/neuralnav
Length of output: 2075
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== Find ModelCatalog and GPUInfo definitions ==\n'
fd -e py | xargs grep -l 'class.*ModelCatalog\|class.*GPUInfo' | head -5
printf '\n== Check model_catalog.json structure (GPU entries) ==\n'
sed -n '1275,1385p' data/configuration/model_catalog.json | head -40
printf '\n== Full context dict (end of function) ==\n'
sed -n '/_prepare_template_context/,/return context/p' src/neuralnav/configuration/generator.py | tail -30
Repository: redhat-et/neuralnav
Length of output: 2768
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== Complete _prepare_template_context function ==\n'
python3 << 'PYEOF'
with open('src/neuralnav/configuration/generator.py', 'r') as f:
content = f.read()
start = content.find('def _prepare_template_context(')
if start == -1:
print("Function not found")
else:
# Find the return statement
remaining = content[start:]
# Look for "return context" at the beginning of a line
import re
match = re.search(r'\n\s*return context\n', remaining)
if match:
end = start + match.end()
func = content[start:end]
print(func)
else:
print("Return statement not found")
PYEOF
printf '\n== Check if gpu_selector_key is used anywhere ==\n'
rg -n 'gpu_selector_key' src/
Repository: redhat-et/neuralnav
Length of output: 5645
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== Check model_catalog.py for GPU info structure ==\n'
sed -n '/class.*GPU/,/^class /p' src/neuralnav/knowledge_base/model_catalog.py | head -40
printf '\n== Check if AMD entries need different resource keys ==\n'
rg -i 'amd\.com|AMD.*gpu' src/neuralnav data/ || echo "No AMD resource keys found"
Repository: redhat-et/neuralnav
Length of output: 1903
Vendor-specific selector and resource keys prevent AMD GPU deployments.

The node selector fix addresses the key name on line 99, but the template has three vendor-specific hardcodes that block AMD catalog entries like MI300X:

- Line 99: nvidia.com/gpu.product (node selector key)
- Lines 64, 68: nvidia.com/gpu (resource requests/limits)
- Line 101: nvidia.com/gpu (toleration key)

AMD GPUs require different key names. Extract the vendor prefix from the catalog GPU type and use it consistently across the template. This requires adding a vendor identifier to the template context (not just the label value).
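A sketch of the vendor-prefix idea, purely for illustration; `gpu_keys` and the MI-prefix heuristic are assumptions, and the maintainer reply below explains why a prefix swap alone is too simplistic for real AMD support:

```python
# Purely illustrative: deriving vendor-specific keys from the catalog GPU
# type. The MI-prefix heuristic and gpu_keys() are assumptions; real AMD
# operator support needs more than a prefix substitution.
def gpu_keys(gpu_type: str) -> dict:
    vendor = "amd" if gpu_type.upper().startswith("MI") else "nvidia"
    return {
        "node_selector_key": f"{vendor}.com/gpu.product",
        "resource_key": f"{vendor}.com/gpu",
        "toleration_key": f"{vendor}.com/gpu",
    }


print(gpu_keys("MI300X")["resource_key"])   # → amd.com/gpu
print(gpu_keys("L4")["node_selector_key"])  # → nvidia.com/gpu.product
```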
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2` around
lines 98 - 99, The template hardcodes NVIDIA keys (nvidia.com/gpu and
nvidia.com/gpu.product) which prevents AMD GPUs; add a vendor identifier to the
template context (e.g., vendor_prefix) and replace the three hardcoded
occurrences with dynamic keys: use nodeSelector key "{{ vendor_prefix
}}.com/gpu.product" instead of nvidia.com/gpu.product, replace resource
request/limit keys "nvidia.com/gpu" with "{{ vendor_prefix }}.com/gpu", and
replace the toleration key "nvidia.com/gpu" with "{{ vendor_prefix }}.com/gpu";
update the code that renders this Jinja template to pass vendor_prefix (derived
from the catalog GPU type) alongside gpu_node_selector_label so all GPU
vendor-specific keys are consistent.
Skipping this for now. The NVIDIA and AMD GPU operators use fundamentally different resource keys and conventions — it's not as simple as swapping a vendor prefix string. For example, AMD uses amd.com/gpu with different toleration and node selector patterns. Adding a vendor_prefix variable would give a false sense of AMD support without actually working end-to-end. We'll address this properly when AMD deployments are needed.
@anfredette, understood — that's a fair point. A superficial vendor_prefix substitution would be misleading since AMD's device plugin uses different toleration patterns, node selector keys, and resource naming conventions than the NVIDIA GPU operator. Deferring until AMD deployments are concretely needed is the right call.
Force-pushed 379ad55 to 1f21b47
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/neuralnav/configuration/generator.py`:
- Around line 125-129: Replace the bare ValueError raised when gpu_info is None
with a domain-specific typed exception so the API can map it to a 4xx error; for
example, define or reuse an exception like InvalidGenerationInputError or
InvalidGPUTypeError (subclassing a common GenerationError or BadRequestError)
and raise that instead of ValueError in the block that checks gpu_info
(referencing gpu_config.gpu_type and the gpu_info lookup in generator.py);
ensure the new exception is imported/defined in the module so callers or global
handlers can translate it to HTTP 400/422.
- Line 24: The VLLM_VERSION constant is currently set to "latest" which makes
manifests non-reproducible; change the VLLM_VERSION value in generator.py to a
specific release tag or image digest (e.g., "vX.Y.Z" or "sha256:<digest>")
instead of "latest", and update any code that composes the image string to use
this pinned value so generated YAML references the fixed tag/digest; ensure the
chosen tag/digest corresponds to a tested vLLM release before committing.
In `@src/neuralnav/knowledge_base/model_catalog.py`:
- Around line 54-58: The constructor currently silently defaults
node_selector_label from aliases which can reintroduce broken scheduling; update
the __init__ in the class that parses the GPU entry (the __init__ that sets
gpu_type, aliases, node_selector_label, memory_gb) to require
data["node_selector_label"] and raise a clear exception (e.g., ValueError) if
it's missing instead of using data.get(..., self.aliases[0]); do not fallback to
aliases and ensure the error message names the missing node_selector_label and
the gpu_type (or data) to aid debugging.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3871ce84-3717-47a7-9e0e-2584478a73d8
📒 Files selected for processing (4)
- data/configuration/model_catalog.json
- src/neuralnav/configuration/generator.py
- src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2
- src/neuralnav/knowledge_base/model_catalog.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2
```diff
  # vLLM version to use
- VLLM_VERSION = "v0.6.2"
+ VLLM_VERSION = "latest"
```
🧩 Analysis chain
🌐 Web query:
What is the current recommended way to pin vLLM container images for production deployments: a specific release tag, an immutable digest, or latest? Please check the official vLLM image/release documentation.
💡 Result:
For production, pin the vLLM image to a specific released version tag (e.g., vllm/vllm-openai:vX.Y.Z) and, if you need strict immutability/reproducibility, pin that tag’s digest (e.g., vllm/vllm-openai:vX.Y.Z@sha256:...). Avoid relying on latest.
- The official Docker docs use `:latest` in run examples, but when showing how to extend the image they use a versioned tag and explicitly warn to match the vLLM version to the base image (implying version pinning for correctness/reproducibility). [1]
- vLLM maintainers note that Docker Hub images/tags can be delayed/missing, and that the Docker Hub `latest` tag is not always promptly updated to the newest GitHub release, creating reproducibility issues. [2]
Sources: official “Using Docker” docs [1]; vLLM forum maintainer response on Docker Hub tagging/latest lag [2].
[1] https://docs.vllm.ai/en/stable/deployment/docker/
[2] https://discuss.vllm.ai/t/vllm-openai-dockerhub-missing-0-16-tags/2399
Pin the vLLM image to a specific release version tag instead of latest.
Using latest makes generated manifests non-reproducible and creates supply-chain risk. The Docker Hub latest tag is not always promptly updated to the newest vLLM release, and pods can pull different images on restart or scale-out without code changes. For production YAML, pin to a tested release tag (e.g., vX.Y.Z) or, for strict immutability, include the image digest.
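A minimal sketch of composing a pinned image reference; `vllm/vllm-openai` is the image named in the web query result and `v0.6.2` is the tag this PR's diff replaced, so treat both as placeholders to be verified against a tested release:

```python
# Sketch of composing a pinned image reference. The repository name and the
# v0.6.2 tag come from this review thread; verify them against a tested
# vLLM release before use.
VLLM_IMAGE = "vllm/vllm-openai"
VLLM_VERSION = "v0.6.2"  # pin a specific release tag, never "latest"
VLLM_DIGEST = ""         # optionally "sha256:<digest>" for strict immutability


def vllm_image_ref() -> str:
    if VLLM_DIGEST:
        # Tag plus digest: human-readable and immutable.
        return f"{VLLM_IMAGE}:{VLLM_VERSION}@{VLLM_DIGEST}"
    return f"{VLLM_IMAGE}:{VLLM_VERSION}"


print(vllm_image_ref())  # → vllm/vllm-openai:v0.6.2
```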
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/neuralnav/configuration/generator.py` at line 24, The VLLM_VERSION
constant is currently set to "latest" which makes manifests non-reproducible;
change the VLLM_VERSION value in generator.py to a specific release tag or image
digest (e.g., "vX.Y.Z" or "sha256:<digest>") instead of "latest", and update any
code that composes the image string to use this pinned value so generated YAML
references the fixed tag/digest; ensure the chosen tag/digest corresponds to a
tested vLLM release before committing.
```python
if gpu_info is None:
    raise ValueError(
        f"Unknown GPU type '{gpu_config.gpu_type}'. "
        f"Add it to the GPU catalog in data/configuration/model_catalog.json."
    )
```
Use a typed generation error that the API can map to 4xx.
Fail-fast is the right behavior, but raising a bare ValueError here now bubbles into the configuration routes’ generic Exception handlers and returns HTTP 500 for what is effectively invalid input/catalog drift. Please raise a domain-specific exception here, or ensure callers translate this path to 400/422 instead of 500.
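One way the suggested domain exception could look; the class names follow the review prompt's examples (`GenerationError`, `InvalidGPUTypeError`) and do not exist in the project yet:

```python
# Sketch of a domain-specific exception hierarchy; the names follow the
# review prompt's suggestions and are not existing project code.
class GenerationError(Exception):
    """Base class for errors raised while generating deployment configs."""


class InvalidGPUTypeError(GenerationError):
    def __init__(self, gpu_type: str):
        self.gpu_type = gpu_type
        super().__init__(f"Unknown GPU type '{gpu_type}'")


def lookup_gpu(gpu_type: str, catalog: dict) -> dict:
    info = catalog.get(gpu_type)
    if info is None:
        raise InvalidGPUTypeError(gpu_type)
    return info


# A global handler can now map this to HTTP 422 instead of a generic 500.
try:
    lookup_gpu("B200", {"L4": {}})
except InvalidGPUTypeError as exc:
    status = 422
    print(status, exc)  # → 422 Unknown GPU type 'B200'
```

Because the exception carries the offending `gpu_type`, an API-layer handler can translate it to a 400/422 response with a precise message rather than a generic 500.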
🧰 Tools
🪛 Ruff (0.15.5)
[warning] 126-129: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/neuralnav/configuration/generator.py` around lines 125 - 129, Replace the
bare ValueError raised when gpu_info is None with a domain-specific typed
exception so the API can map it to a 4xx error; for example, define or reuse an
exception like InvalidGenerationInputError or InvalidGPUTypeError (subclassing a
common GenerationError or BadRequestError) and raise that instead of ValueError
in the block that checks gpu_info (referencing gpu_config.gpu_type and the
gpu_info lookup in generator.py); ensure the new exception is imported/defined
in the module so callers or global handlers can translate it to HTTP 400/422.
```python
def __init__(self, data: dict):
    self.gpu_type = data["gpu_type"]
    self.aliases = data.get("aliases", [data["gpu_type"]])  # Default to primary name
    self.node_selector_label = data.get("node_selector_label", self.aliases[0])
    self.memory_gb = data["memory_gb"]
```
Require node_selector_label instead of silently defaulting it.
This fallback can reintroduce the exact broken scheduling path this PR is fixing: if a future GPU entry omits node_selector_label, YAML generation will quietly fall back to an alias and may produce an unschedulable nvidia.com/gpu.product selector. Please fail fast when the field is missing.
Suggested change:

```diff
  def __init__(self, data: dict):
      self.gpu_type = data["gpu_type"]
      self.aliases = data.get("aliases", [data["gpu_type"]])  # Default to primary name
-     self.node_selector_label = data.get("node_selector_label", self.aliases[0])
+     try:
+         self.node_selector_label = data["node_selector_label"]
+     except KeyError as e:
+         raise ValueError(
+             f"GPU type '{self.gpu_type}' is missing required node_selector_label"
+         ) from e
      self.memory_gb = data["memory_gb"]
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def __init__(self, data: dict):
    self.gpu_type = data["gpu_type"]
    self.aliases = data.get("aliases", [data["gpu_type"]])  # Default to primary name
    try:
        self.node_selector_label = data["node_selector_label"]
    except KeyError as e:
        raise ValueError(
            f"GPU type '{self.gpu_type}' is missing required node_selector_label"
        ) from e
    self.memory_gb = data["memory_gb"]
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/neuralnav/knowledge_base/model_catalog.py` around lines 54 - 58, The
constructor currently silently defaults node_selector_label from aliases which
can reintroduce broken scheduling; update the __init__ in the class that parses
the GPU entry (the __init__ that sets gpu_type, aliases, node_selector_label,
memory_gb) to require data["node_selector_label"] and raise a clear exception
(e.g., ValueError) if it's missing instead of using data.get(...,
self.aliases[0]); do not fallback to aliases and ensure the error message names
the missing node_selector_label and the gpu_type (or data) to aid debugging.
Re: nitpick on the node_selector_label fallback: fixed. Updated the fallback to use `self.aliases[0]`.
fix: use correct GPU node selector labels and improve prod probe timeouts

The generated YAML used short GPU names (e.g., "L4") for the nvidia.com/gpu.product node selector, but K8s nodes use labels like "NVIDIA-L4". Add node_selector_label field to GPU catalog and use it in YAML generation.

Also increase production liveness probe delay from 120s to 600s to allow time for model loading, and update vLLM image to latest.

Signed-off-by: Andre Fredette <afredette@redhat.com>
Force-pushed 1f21b47 to 80e0b5d
🧹 Nitpick comments (1)

src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2 (1)

Lines 81-85: Liveness probe delay increased significantly; verify recovery time is acceptable.

The 600-second initial delay allows large models to load, but after startup, an unhealthy pod still takes up to 150 seconds (5 failures × 30s period) to be restarted. This seems reasonable for LLM workloads where brief transient issues shouldn't trigger restarts.

However, consider whether startupProbe would be more appropriate here: it would allow a long startup window while keeping faster liveness detection after the container becomes ready.

Alternative: Use startupProbe for model loading

```yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 30
  failureThreshold: 20  # 30 + 20*30 = 630s max startup time
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 30
  timeoutSeconds: 10
  failureThreshold: 3
```

This separates startup concerns from runtime health, allowing faster restart of genuinely unhealthy pods post-startup.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2` around lines 81 - 85, The liveness probe currently uses a very large initialDelaySeconds (600) which delays restarts; replace that long initialDelaySeconds on the livenessProbe with a separate startupProbe so model load time is handled by startup checks while liveness maintains faster failure detection. Specifically, add a startupProbe block (httpGet path/port matching existing probe) with initialDelaySeconds ~30, periodSeconds ~30 and a larger failureThreshold (e.g., 20) to cover long loads, then set the livenessProbe (the existing periodSeconds/timeoutSeconds/failureThreshold fields) to shorter failureThreshold (e.g., 3) and remove the 600 initialDelaySeconds from livenessProbe to restore quicker restarts after readiness; update the YAML template keys initialDelaySeconds, startupProbe and livenessProbe accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2`:
- Around line 81-85: The liveness probe currently uses a very large
initialDelaySeconds (600) which delays restarts; replace that long
initialDelaySeconds on the livenessProbe with a separate startupProbe so model
load time is handled by startup checks while liveness maintains faster failure
detection. Specifically, add a startupProbe block (httpGet path/port matching
existing probe) with initialDelaySeconds ~30, periodSeconds ~30 and a larger
failureThreshold (e.g., 20) to cover long loads, then set the livenessProbe (the
existing periodSeconds/timeoutSeconds/failureThreshold fields) to shorter
failureThreshold (e.g., 3) and remove the 600 initialDelaySeconds from
livenessProbe to restore quicker restarts after readiness; update the YAML
template keys initialDelaySeconds, startupProbe and livenessProbe accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f408d394-33fa-426a-a45e-5f77429ba528
📒 Files selected for processing (4)
- data/configuration/model_catalog.json
- src/neuralnav/configuration/generator.py
- src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2
- src/neuralnav/knowledge_base/model_catalog.py
🚧 Files skipped from review as they are similar to previous changes (1)
- data/configuration/model_catalog.json
Signed-off-by: Andre Fredette <afredette@redhat.com>
Signed-off-by: Andre Fredette <afredette@redhat.com>
```diff
@@ -257,14 +259,21 @@ def _run_inference_test(deployment_id: str, prompt: str, max_tokens: int, temper
     start_time = time.time()

     curl_cmd = [
```
Summary by CodeRabbit

- Chores
- Configuration
- Bug Fixes
- Style