fix: use correct GPU node selector labels and improve prod probe timeouts #112

anfredette wants to merge 3 commits into redhat-et:main
Conversation
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration: Organization UI · Review profile: CHILL · Plan: Pro · Run ID:
📒 Files selected for processing (4)

📝 Walkthrough
Adds a new …

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 2
🧹 Nitpick comments (1)

src/neuralnav/knowledge_base/model_catalog.py (1)

Lines 56-57: Prefer the first alias as the fallback selector label.

If a fixture or external catalog entry omits node_selector_label, this falls back to short names like `L4` and reintroduces the scheduling bug this PR is fixing. `self.aliases[0]` is the safer default here.

Suggested change:

```diff
  self.gpu_type = data["gpu_type"]
  self.aliases = data.get("aliases", [data["gpu_type"]])  # Default to primary name
- self.node_selector_label = data.get("node_selector_label", data["gpu_type"])
+ self.node_selector_label = data.get("node_selector_label", self.aliases[0])
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/neuralnav/knowledge_base/model_catalog.py` around lines 56 - 57, The node_selector_label fallback uses data["gpu_type"] which can be short names and reintroduce the scheduling bug; change the fallback to use the first alias instead. Update the initialization in model_catalog.ModelEntry (or wherever self.aliases and self.node_selector_label are set) so self.aliases = data.get("aliases", [data["gpu_type"]]) remains, but set self.node_selector_label = data.get("node_selector_label", self.aliases[0]) to prefer the first alias when node_selector_label is omitted. Ensure this uses the already-computed self.aliases value so the safer alias default is applied.
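As a quick illustration of the fallback behavior discussed above, here is a minimal self-contained sketch; the `GPUEntry` class name and the sample catalog entry are hypothetical stand-ins for the real parser in model_catalog.py:

```python
# Hypothetical stand-in for the real parser in model_catalog.py; only the
# fallback logic from the review comment is reproduced here.
class GPUEntry:
    def __init__(self, data: dict):
        self.gpu_type = data["gpu_type"]
        # Default aliases to the primary name when none are given.
        self.aliases = data.get("aliases", [data["gpu_type"]])
        # Prefer the first alias (typically the full node label) over the
        # short gpu_type when node_selector_label is omitted.
        self.node_selector_label = data.get("node_selector_label", self.aliases[0])


entry = GPUEntry({"gpu_type": "L4", "aliases": ["NVIDIA-L4", "L4"]})
print(entry.node_selector_label)  # → NVIDIA-L4 (not the short "L4")
```

With the old `data["gpu_type"]` fallback, the same entry would yield the short `L4` label, which is exactly the unschedulable selector this PR removes.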
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/neuralnav/configuration/generator.py`:
- Around line 123-126: The GPU catalog lookup currently silently defaults when
missing; change the behavior in the generator where
self._catalog.get_gpu_type(gpu_config.gpu_type) is called so that if gpu_info is
None you raise a clear exception (e.g., ValueError or RuntimeError) including
the missing gpu_type string instead of assigning defaults to gpu_hourly_rate or
gpu_node_selector_label; update the code that sets gpu_hourly_rate and
gpu_node_selector_label (the gpu_info, gpu_hourly_rate, gpu_node_selector_label
logic) to fail fast with a descriptive error message so downstream YAML is not
emitted for unknown GPUs.
In `@src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2`:
- Around line 98-99: The template hardcodes NVIDIA keys (nvidia.com/gpu and
nvidia.com/gpu.product) which prevents AMD GPUs; add a vendor identifier to the
template context (e.g., vendor_prefix) and replace the three hardcoded
occurrences with dynamic keys: use nodeSelector key "{{ vendor_prefix
}}.com/gpu.product" instead of nvidia.com/gpu.product, replace resource
request/limit keys "nvidia.com/gpu" with "{{ vendor_prefix }}.com/gpu", and
replace the toleration key "nvidia.com/gpu" with "{{ vendor_prefix }}.com/gpu";
update the code that renders this Jinja template to pass vendor_prefix (derived
from the catalog GPU type) alongside gpu_node_selector_label so all GPU
vendor-specific keys are consistent.
---
Nitpick comments:
In `@src/neuralnav/knowledge_base/model_catalog.py`:
- Around line 56-57: The node_selector_label fallback uses data["gpu_type"]
which can be short names and reintroduce the scheduling bug; change the fallback
to use the first alias instead. Update the initialization in
model_catalog.ModelEntry (or wherever self.aliases and self.node_selector_label
are set) so self.aliases = data.get("aliases", [data["gpu_type"]]) remains, but
set self.node_selector_label = data.get("node_selector_label", self.aliases[0])
to prefer the first alias when node_selector_label is omitted. Ensure this uses
the already-computed self.aliases value so the safer alias default is applied.
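The fail-fast lookup that the first inline comment asks for can be sketched as follows; `GPUCatalog`, `resolve_gpu`, and the dictionary layout are illustrative assumptions, not the project's actual API:

```python
# Illustrative sketch of the fail-fast lookup described above; GPUCatalog,
# resolve_gpu, and the dictionary layout are assumptions, not project code.
class GPUCatalog:
    def __init__(self, entries: dict):
        self._entries = entries  # maps gpu_type -> info dict

    def get_gpu_type(self, gpu_type: str):
        return self._entries.get(gpu_type)


def resolve_gpu(catalog: GPUCatalog, gpu_type: str) -> dict:
    gpu_info = catalog.get_gpu_type(gpu_type)
    if gpu_info is None:
        # Fail fast instead of emitting YAML built from guessed defaults.
        raise ValueError(
            f"Unknown GPU type '{gpu_type}'. "
            "Add it to the GPU catalog in data/configuration/model_catalog.json."
        )
    return gpu_info


catalog = GPUCatalog({"L4": {"node_selector_label": "NVIDIA-L4"}})
print(resolve_gpu(catalog, "L4")["node_selector_label"])  # → NVIDIA-L4
```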
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2eb6d700-e5cc-4655-8681-a09881c3b7af
📒 Files selected for processing (4)
- data/configuration/model_catalog.json
- src/neuralnav/configuration/generator.py
- src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2
- src/neuralnav/knowledge_base/model_catalog.py
```diff
  nodeSelector:
-   nvidia.com/gpu.product: {{ gpu_type }}
+   nvidia.com/gpu.product: {{ gpu_node_selector_label }}
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== GPU catalog entries with selector labels ==\n'
rg -n '"gpu_type":|"node_selector_label":' data/configuration/model_catalog.json
printf '\n== Hardcoded accelerator selector/resource keys ==\n'
rg -n 'nvidia\.com/gpu(\.product)?' src/neuralnav/configuration src/neuralnav/knowledge_base
Repository: redhat-et/neuralnav
Length of output: 1580
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== Full kserve-inferenceservice.yaml.j2 template ==\n'
cat -n src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2 | head -120
printf '\n== Context around lines 98-99 (node selector) ==\n'
sed -n '90,110p' src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2 | cat -n
printf '\n== Check template variables passed to this template ==\n'
rg -B5 -A5 'kserve-inferenceservice.yaml.j2' src/neuralnav/configuration
Repository: redhat-et/neuralnav
Length of output: 6173
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== Template context preparation function ==\n'
sed -n '/_prepare_template_context/,/^[[:space:]]*def /p' src/neuralnav/configuration/generator.py | head -80
printf '\n== Check for any gpu_selector_key usage ==\n'
rg -n 'gpu_selector_key' src/neuralnav/configuration
Repository: redhat-et/neuralnav
Length of output: 3349
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== Full context dict in _prepare_template_context ==\n'
sed -n '/_prepare_template_context/,/^[[:space:]]*def /p' src/neuralnav/configuration/generator.py | tail -50
printf '\n== Check ModelCatalog GPU info structure ==\n'
rg -B10 -A10 'class.*GPU.*Info' src/neuralnav/configuration
Repository: redhat-et/neuralnav
Length of output: 2075
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== Find ModelCatalog and GPUInfo definitions ==\n'
fd -e py | xargs grep -l 'class.*ModelCatalog\|class.*GPUInfo' | head -5
printf '\n== Check model_catalog.json structure (GPU entries) ==\n'
sed -n '1275,1385p' data/configuration/model_catalog.json | head -40
printf '\n== Full context dict (end of function) ==\n'
sed -n '/_prepare_template_context/,/return context/p' src/neuralnav/configuration/generator.py | tail -30
Repository: redhat-et/neuralnav
Length of output: 2768
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== Complete _prepare_template_context function ==\n'
python3 << 'PYEOF'
with open('src/neuralnav/configuration/generator.py', 'r') as f:
content = f.read()
start = content.find('def _prepare_template_context(')
if start == -1:
print("Function not found")
else:
# Find the return statement
remaining = content[start:]
# Look for "return context" at the beginning of a line
import re
match = re.search(r'\n\s*return context\n', remaining)
if match:
end = start + match.end()
func = content[start:end]
print(func)
else:
print("Return statement not found")
PYEOF
printf '\n== Check if gpu_selector_key is used anywhere ==\n'
rg -n 'gpu_selector_key' src/
Repository: redhat-et/neuralnav
Length of output: 5645
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== Check model_catalog.py for GPU info structure ==\n'
sed -n '/class.*GPU/,/^class /p' src/neuralnav/knowledge_base/model_catalog.py | head -40
printf '\n== Check if AMD entries need different resource keys ==\n'
rg -i 'amd\.com|AMD.*gpu' src/neuralnav data/ || echo "No AMD resource keys found"
Repository: redhat-et/neuralnav
Length of output: 1903
Vendor-specific selector and resource keys prevent AMD GPU deployments.

The node selector fix addresses the key name on line 99, but the template has three vendor-specific hardcodes that block AMD catalog entries like MI300X:

- Line 99: nvidia.com/gpu.product (node selector key)
- Lines 64, 68: nvidia.com/gpu (resource requests/limits)
- Line 101: nvidia.com/gpu (toleration key)

AMD GPUs require different key names. Extract the vendor prefix from the catalog GPU type and use it consistently across the template. This requires adding a vendor identifier to the template context (not just the label value).
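A sketch of the vendor-prefix idea, purely for illustration; `gpu_keys` and the MI-prefix heuristic are assumptions, and the maintainer reply below explains why a prefix swap alone is too simplistic for real AMD support:

```python
# Purely illustrative: deriving vendor-specific keys from the catalog GPU
# type. The MI-prefix heuristic and gpu_keys() are assumptions; real AMD
# operator support needs more than a prefix substitution.
def gpu_keys(gpu_type: str) -> dict:
    vendor = "amd" if gpu_type.upper().startswith("MI") else "nvidia"
    return {
        "node_selector_key": f"{vendor}.com/gpu.product",
        "resource_key": f"{vendor}.com/gpu",
        "toleration_key": f"{vendor}.com/gpu",
    }


print(gpu_keys("MI300X")["resource_key"])   # → amd.com/gpu
print(gpu_keys("L4")["node_selector_key"])  # → nvidia.com/gpu.product
```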
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2` around
lines 98 - 99, The template hardcodes NVIDIA keys (nvidia.com/gpu and
nvidia.com/gpu.product) which prevents AMD GPUs; add a vendor identifier to the
template context (e.g., vendor_prefix) and replace the three hardcoded
occurrences with dynamic keys: use nodeSelector key "{{ vendor_prefix
}}.com/gpu.product" instead of nvidia.com/gpu.product, replace resource
request/limit keys "nvidia.com/gpu" with "{{ vendor_prefix }}.com/gpu", and
replace the toleration key "nvidia.com/gpu" with "{{ vendor_prefix }}.com/gpu";
update the code that renders this Jinja template to pass vendor_prefix (derived
from the catalog GPU type) alongside gpu_node_selector_label so all GPU
vendor-specific keys are consistent.
Skipping this for now. The NVIDIA and AMD GPU operators use fundamentally different resource keys and conventions — it's not as simple as swapping a vendor prefix string. For example, AMD uses amd.com/gpu with different toleration and node selector patterns. Adding a vendor_prefix variable would give a false sense of AMD support without actually working end-to-end. We'll address this properly when AMD deployments are needed.
@anfredette, understood — that's a fair point. A superficial vendor_prefix substitution would be misleading since AMD's device plugin uses different toleration patterns, node selector keys, and resource naming conventions than the NVIDIA GPU operator. Deferring until AMD deployments are concretely needed is the right call.
Force-pushed 379ad55 to 1f21b47
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/neuralnav/configuration/generator.py`:
- Around line 125-129: Replace the bare ValueError raised when gpu_info is None
with a domain-specific typed exception so the API can map it to a 4xx error; for
example, define or reuse an exception like InvalidGenerationInputError or
InvalidGPUTypeError (subclassing a common GenerationError or BadRequestError)
and raise that instead of ValueError in the block that checks gpu_info
(referencing gpu_config.gpu_type and the gpu_info lookup in generator.py);
ensure the new exception is imported/defined in the module so callers or global
handlers can translate it to HTTP 400/422.
- Line 24: The VLLM_VERSION constant is currently set to "latest" which makes
manifests non-reproducible; change the VLLM_VERSION value in generator.py to a
specific release tag or image digest (e.g., "vX.Y.Z" or "sha256:<digest>")
instead of "latest", and update any code that composes the image string to use
this pinned value so generated YAML references the fixed tag/digest; ensure the
chosen tag/digest corresponds to a tested vLLM release before committing.
In `@src/neuralnav/knowledge_base/model_catalog.py`:
- Around line 54-58: The constructor currently silently defaults
node_selector_label from aliases which can reintroduce broken scheduling; update
the __init__ in the class that parses the GPU entry (the __init__ that sets
gpu_type, aliases, node_selector_label, memory_gb) to require
data["node_selector_label"] and raise a clear exception (e.g., ValueError) if
it's missing instead of using data.get(..., self.aliases[0]); do not fallback to
aliases and ensure the error message names the missing node_selector_label and
the gpu_type (or data) to aid debugging.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3871ce84-3717-47a7-9e0e-2584478a73d8
📒 Files selected for processing (4)
- data/configuration/model_catalog.json
- src/neuralnav/configuration/generator.py
- src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2
- src/neuralnav/knowledge_base/model_catalog.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2
```diff
  # vLLM version to use
- VLLM_VERSION = "v0.6.2"
+ VLLM_VERSION = "latest"
```
🧩 Analysis chain
🌐 Web query:
What is the current recommended way to pin vLLM container images for production deployments: a specific release tag, an immutable digest, or latest? Please check the official vLLM image/release documentation.
💡 Result:
For production, pin the vLLM image to a specific released version tag (e.g., vllm/vllm-openai:vX.Y.Z) and, if you need strict immutability/reproducibility, pin that tag’s digest (e.g., vllm/vllm-openai:vX.Y.Z@sha256:...). Avoid relying on latest.
- The official Docker docs use `:latest` in run examples, but when showing how to extend the image they use a versioned tag and explicitly warn to match the vLLM version to the base image (implying version pinning for correctness/reproducibility). [1]
- vLLM maintainers note that Docker Hub images/tags can be delayed/missing, and that the Docker Hub `latest` tag is not always promptly updated to the newest GitHub release, creating reproducibility issues. [2]
Sources: official “Using Docker” docs [1]; vLLM forum maintainer response on Docker Hub tagging/latest lag [2].
[1] https://docs.vllm.ai/en/stable/deployment/docker/
[2] https://discuss.vllm.ai/t/vllm-openai-dockerhub-missing-0-16-tags/2399
Pin the vLLM image to a specific release version tag instead of latest.
Using latest makes generated manifests non-reproducible and creates supply-chain risk. The Docker Hub latest tag is not always promptly updated to the newest vLLM release, and pods can pull different images on restart or scale-out without code changes. For production YAML, pin to a tested release tag (e.g., vX.Y.Z) or, for strict immutability, include the image digest.
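A minimal sketch of composing a pinned image reference; `vllm/vllm-openai` is the image named in the web query result and `v0.6.2` is the tag this PR's diff replaced, so treat both as placeholders to be verified against a tested release:

```python
# Sketch of composing a pinned image reference. The repository name and the
# v0.6.2 tag come from this review thread; verify them against a tested
# vLLM release before use.
VLLM_IMAGE = "vllm/vllm-openai"
VLLM_VERSION = "v0.6.2"  # pin a specific release tag, never "latest"
VLLM_DIGEST = ""         # optionally "sha256:<digest>" for strict immutability


def vllm_image_ref() -> str:
    if VLLM_DIGEST:
        # Tag plus digest: human-readable and immutable.
        return f"{VLLM_IMAGE}:{VLLM_VERSION}@{VLLM_DIGEST}"
    return f"{VLLM_IMAGE}:{VLLM_VERSION}"


print(vllm_image_ref())  # → vllm/vllm-openai:v0.6.2
```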
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/neuralnav/configuration/generator.py` at line 24, The VLLM_VERSION
constant is currently set to "latest" which makes manifests non-reproducible;
change the VLLM_VERSION value in generator.py to a specific release tag or image
digest (e.g., "vX.Y.Z" or "sha256:<digest>") instead of "latest", and update any
code that composes the image string to use this pinned value so generated YAML
references the fixed tag/digest; ensure the chosen tag/digest corresponds to a
tested vLLM release before committing.
```python
if gpu_info is None:
    raise ValueError(
        f"Unknown GPU type '{gpu_config.gpu_type}'. "
        f"Add it to the GPU catalog in data/configuration/model_catalog.json."
    )
```
Use a typed generation error that the API can map to 4xx.
Fail-fast is the right behavior, but raising a bare ValueError here now bubbles into the configuration routes’ generic Exception handlers and returns HTTP 500 for what is effectively invalid input/catalog drift. Please raise a domain-specific exception here, or ensure callers translate this path to 400/422 instead of 500.
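One way the suggested domain exception could look; the class names follow the review prompt's examples (`GenerationError`, `InvalidGPUTypeError`) and do not exist in the project yet:

```python
# Sketch of a domain-specific exception hierarchy; the names follow the
# review prompt's suggestions and are not existing project code.
class GenerationError(Exception):
    """Base class for errors raised while generating deployment configs."""


class InvalidGPUTypeError(GenerationError):
    def __init__(self, gpu_type: str):
        self.gpu_type = gpu_type
        super().__init__(f"Unknown GPU type '{gpu_type}'")


def lookup_gpu(gpu_type: str, catalog: dict) -> dict:
    info = catalog.get(gpu_type)
    if info is None:
        raise InvalidGPUTypeError(gpu_type)
    return info


# A global handler can now map this to HTTP 422 instead of a generic 500.
try:
    lookup_gpu("B200", {"L4": {}})
except InvalidGPUTypeError as exc:
    status = 422
    print(status, exc)  # → 422 Unknown GPU type 'B200'
```

Because the exception carries the offending `gpu_type`, an API-layer handler can translate it to a 400/422 response with a precise message rather than a generic 500.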
🧰 Tools
🪛 Ruff (0.15.5)
[warning] 126-129: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/neuralnav/configuration/generator.py` around lines 125 - 129, Replace the
bare ValueError raised when gpu_info is None with a domain-specific typed
exception so the API can map it to a 4xx error; for example, define or reuse an
exception like InvalidGenerationInputError or InvalidGPUTypeError (subclassing a
common GenerationError or BadRequestError) and raise that instead of ValueError
in the block that checks gpu_info (referencing gpu_config.gpu_type and the
gpu_info lookup in generator.py); ensure the new exception is imported/defined
in the module so callers or global handlers can translate it to HTTP 400/422.
```python
def __init__(self, data: dict):
    self.gpu_type = data["gpu_type"]
    self.aliases = data.get("aliases", [data["gpu_type"]])  # Default to primary name
    self.node_selector_label = data.get("node_selector_label", self.aliases[0])
    self.memory_gb = data["memory_gb"]
```
Require node_selector_label instead of silently defaulting it.
This fallback can reintroduce the exact broken scheduling path this PR is fixing: if a future GPU entry omits node_selector_label, YAML generation will quietly fall back to an alias and may produce an unschedulable nvidia.com/gpu.product selector. Please fail fast when the field is missing.
Suggested change:

```diff
  def __init__(self, data: dict):
      self.gpu_type = data["gpu_type"]
      self.aliases = data.get("aliases", [data["gpu_type"]])  # Default to primary name
-     self.node_selector_label = data.get("node_selector_label", self.aliases[0])
+     try:
+         self.node_selector_label = data["node_selector_label"]
+     except KeyError as e:
+         raise ValueError(
+             f"GPU type '{self.gpu_type}' is missing required node_selector_label"
+         ) from e
      self.memory_gb = data["memory_gb"]
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def __init__(self, data: dict):
    self.gpu_type = data["gpu_type"]
    self.aliases = data.get("aliases", [data["gpu_type"]])  # Default to primary name
    try:
        self.node_selector_label = data["node_selector_label"]
    except KeyError as e:
        raise ValueError(
            f"GPU type '{self.gpu_type}' is missing required node_selector_label"
        ) from e
    self.memory_gb = data["memory_gb"]
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/neuralnav/knowledge_base/model_catalog.py` around lines 54 - 58, The
constructor currently silently defaults node_selector_label from aliases which
can reintroduce broken scheduling; update the __init__ in the class that parses
the GPU entry (the __init__ that sets gpu_type, aliases, node_selector_label,
memory_gb) to require data["node_selector_label"] and raise a clear exception
(e.g., ValueError) if it's missing instead of using data.get(...,
self.aliases[0]); do not fallback to aliases and ensure the error message names
the missing node_selector_label and the gpu_type (or data) to aid debugging.
Re: nitpick on the node_selector_label fallback: fixed. Updated the fallback to use `self.aliases[0]`.
fix: use correct GPU node selector labels and improve prod probe timeouts

The generated YAML used short GPU names (e.g., "L4") for the nvidia.com/gpu.product node selector, but K8s nodes use labels like "NVIDIA-L4". Add node_selector_label field to GPU catalog and use it in YAML generation.

Also increase production liveness probe delay from 120s to 600s to allow time for model loading, and update vLLM image to latest.

Signed-off-by: Andre Fredette <afredette@redhat.com>
Force-pushed 1f21b47 to 80e0b5d
🧹 Nitpick comments (1)

src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2 (1)

Lines 81-85: Liveness probe delay increased significantly; verify recovery time is acceptable.

The 600-second initial delay allows large models to load, but after startup, an unhealthy pod still takes up to 150 seconds (5 failures × 30s period) to be restarted. This seems reasonable for LLM workloads where brief transient issues shouldn't trigger restarts.

However, consider whether startupProbe would be more appropriate here: it would allow a long startup window while keeping faster liveness detection after the container becomes ready.

Alternative: Use startupProbe for model loading

```yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 30
  failureThreshold: 20  # 30 + 20*30 = 630s max startup time
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 30
  timeoutSeconds: 10
  failureThreshold: 3
```

This separates startup concerns from runtime health, allowing faster restart of genuinely unhealthy pods post-startup.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2` around lines 81 - 85, The liveness probe currently uses a very large initialDelaySeconds (600) which delays restarts; replace that long initialDelaySeconds on the livenessProbe with a separate startupProbe so model load time is handled by startup checks while liveness maintains faster failure detection. Specifically, add a startupProbe block (httpGet path/port matching existing probe) with initialDelaySeconds ~30, periodSeconds ~30 and a larger failureThreshold (e.g., 20) to cover long loads, then set the livenessProbe (the existing periodSeconds/timeoutSeconds/failureThreshold fields) to shorter failureThreshold (e.g., 3) and remove the 600 initialDelaySeconds from livenessProbe to restore quicker restarts after readiness; update the YAML template keys initialDelaySeconds, startupProbe and livenessProbe accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2`:
- Around line 81-85: The liveness probe currently uses a very large
initialDelaySeconds (600) which delays restarts; replace that long
initialDelaySeconds on the livenessProbe with a separate startupProbe so model
load time is handled by startup checks while liveness maintains faster failure
detection. Specifically, add a startupProbe block (httpGet path/port matching
existing probe) with initialDelaySeconds ~30, periodSeconds ~30 and a larger
failureThreshold (e.g., 20) to cover long loads, then set the livenessProbe (the
existing periodSeconds/timeoutSeconds/failureThreshold fields) to shorter
failureThreshold (e.g., 3) and remove the 600 initialDelaySeconds from
livenessProbe to restore quicker restarts after readiness; update the YAML
template keys initialDelaySeconds, startupProbe and livenessProbe accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f408d394-33fa-426a-a45e-5f77429ba528
📒 Files selected for processing (4)
- data/configuration/model_catalog.json
- src/neuralnav/configuration/generator.py
- src/neuralnav/configuration/templates/kserve-inferenceservice.yaml.j2
- src/neuralnav/knowledge_base/model_catalog.py
🚧 Files skipped from review as they are similar to previous changes (1)
- data/configuration/model_catalog.json
Signed-off-by: Andre Fredette <afredette@redhat.com>
Signed-off-by: Andre Fredette <afredette@redhat.com>
```diff
@@ -257,14 +259,21 @@ def _run_inference_test(deployment_id: str, prompt: str, max_tokens: int, temper
     start_time = time.time()

     curl_cmd = [
```
Summary by CodeRabbit

- Chores
- Configuration
- Bug Fixes
- Style