fix(health-url): fix health url #578

AWarno · 2025-12-18T13:29:40Z

Fix the Slurm health URL.
Add NIM examples that require these health endpoints.

Summary by CodeRabbit

New Features
- Added example configurations for deploying NIM evaluators locally and on SLURM clusters, including pre-configured evaluation tasks (ifeval, gsm8k) and environment setup.
Refactor
- Improved health check endpoint configuration for consistent resolution across different deployment scenarios.


Signed-off-by: Anna Warno <[email protected]>

marta-sd · 2025-12-18T13:54:56Z

packages/nemo-evaluator-launcher/examples/local_nim.yaml

+      NGC_API_KEY: ${oc.env:NGC_API_KEY}  # Required for NIM container authentication
+  mounts:
+    deployment:
+      /path/to/nim/cache: /opt/nim/.cache  # Mount NIM cache directory


Please add a note to the instructions above saying that this path needs to be set.

marta-sd · 2025-12-18T13:55:41Z

packages/nemo-evaluator-launcher/examples/slurm_nim.yaml

+  - _self_
+
+# SLURM execution configuration
+execution:


Don't we use the NIM cache for SLURM?

AWarno · 2025-12-18T15:42:26Z

@CodeRabbit review

coderabbitai · 2025-12-18T15:42:32Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2025-12-18T15:46:07Z

Walkthrough

Two new example configurations for NIM-based evaluator deployment (local and SLURM) are added, alongside updates to the core NIM deployment configuration and the SLURM executor that standardize the health check endpoint path, changing it from /health to /v1/health/ready.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Example Configurations: `packages/nemo-evaluator-launcher/examples/local_nim.yaml`, `packages/nemo-evaluator-launcher/examples/slurm_nim.yaml` | New YAML example files demonstrating local and SLURM-based NIM evaluator deployment, with configurations for execution, deployment, and evaluation tasks (ifeval, gsm8k). |
| Deployment Configuration: `packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/configs/deployment/nim.yaml` | Added `command` field (`/opt/nim/start_server.sh`) and updated health endpoint from `/health` to `/v1/health/ready`. |
| Executor Logic: `packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/executors/slurm/executor.py` | Updated health check path resolution across multiple methods to consistently use `cfg.deployment.endpoints.health` instead of `cfg.deployment.health_check_path`. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

- Review the two new example YAML files for correctness and consistency with existing patterns.
- Verify that the health endpoint path changes align across the deployment configuration and the executor logic.
- Confirm that the proxy configuration's health check path properly falls back to endpoints.health.

Poem

A pair of new configs hops in today,
Local and SLURM show the NIM-powered way,
Health checks now point to /v1/health/ready,
The executor's aligned, consistent and steady,
Evaluations leap forward, they're healthy and keen! 🐰✨

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ❓ Inconclusive | The title 'fix(health-url): fix health url' is vague and repetitive, using non-descriptive language that doesn't clearly convey the specific changes made. | Improve the title to be more specific and descriptive, such as 'fix(health-url): update health check endpoints in deployment and executor' to better reflect the actual changes. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 359b97d and 1b68ecf.

📒 Files selected for processing (4)

packages/nemo-evaluator-launcher/examples/local_nim.yaml (1 hunks)
packages/nemo-evaluator-launcher/examples/slurm_nim.yaml (1 hunks)
packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/configs/deployment/nim.yaml (1 hunks)
packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/executors/slurm/executor.py (3 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-12-10T21:34:56.965Z

Learnt from: marta-sd
Repo: NVIDIA-NeMo/Evaluator PR: 523
File: packages/nemo-evaluator-launcher/examples/local_reasoning.yaml:55-55
Timestamp: 2025-12-10T21:34:56.965Z
Learning: When editing YAML configuration in the Nemo Evaluator launcher examples, align mapping.toml section names with the actual container/image naming convention used by simple-evals (hyphenated, e.g., 'simple-evals'). This ensures consistency between the container naming and config sections. Apply this guideline to all YAML files under packages/nemo-evaluator-launcher/examples that define mapping.toml sections, and adjust section keys to match the container/framework naming convention used by the published image.

Applied to files:

packages/nemo-evaluator-launcher/examples/slurm_nim.yaml
packages/nemo-evaluator-launcher/examples/local_nim.yaml

🔇 Additional comments (7)

packages/nemo-evaluator-launcher/examples/local_nim.yaml (1)

16-27: LGTM! Clear usage instructions and warnings.

The usage instructions are well-structured and include appropriate warnings about test-only configurations. The step-by-step guide will help users configure the example correctly.


packages/nemo-evaluator-launcher/examples/slurm_nim.yaml (1)

16-30: LGTM! Comprehensive usage instructions.

The usage instructions effectively guide users through the configuration process, including CLI argument examples for quick testing. The warnings about full evaluations are appropriate and clear.


packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/executors/slurm/executor.py (3)

646-646: LGTM! Consistent health endpoint resolution.

The health check path resolution has been properly updated to use cfg.deployment.endpoints.get("health", "/health"), which aligns with the standardized endpoint configuration in nim.yaml.
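
A minimal sketch of the lookup described above, assuming the deployment config behaves like a nested mapping. Plain dicts stand in for the launcher's real config object, and `resolve_health_path` is a hypothetical helper name used only for illustration, not a function from the executor.

```python
from typing import Any, Mapping

DEFAULT_HEALTH_PATH = "/health"


def resolve_health_path(deployment: Mapping[str, Any]) -> str:
    """Return the configured health path, or "/health" when none is set."""
    # A missing or empty "endpoints" section is treated as "no overrides".
    endpoints = deployment.get("endpoints") or {}
    return endpoints.get("health", DEFAULT_HEALTH_PATH)


# NIM-style config: the health endpoint is set explicitly.
nim_deployment = {"endpoints": {"health": "/v1/health/ready"}}
# Config without a health entry: the default applies.
bare_deployment = {"endpoints": {}}

print(resolve_health_path(nim_deployment))   # /v1/health/ready
print(resolve_health_path(bare_deployment))  # /health
```

Keeping the default in one place means a deployment type that does not declare a health endpoint still gets a usable path.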


1266-1270: LGTM! Proper fallback chain for HAProxy health checks.

The health check path resolution correctly prioritizes the proxy-specific configuration first, then falls back to the deployment endpoints configuration. The comment clearly explains the fallback behavior.
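
The fallback order can be sketched as follows. This is an illustration under assumptions: the `health_check_path` key on the proxy config is a hypothetical placeholder for wherever the proxy-specific setting actually lives, and plain dicts stand in for the real config object.

```python
from typing import Any, Mapping, Optional


def resolve_proxy_health_path(
    proxy_cfg: Optional[Mapping[str, Any]],
    deployment_cfg: Mapping[str, Any],
) -> str:
    """Pick the health path used for proxy health checks.

    Order: proxy-specific setting, then deployment endpoints, then "/health".
    """
    # Hypothetical key name for the proxy-specific override.
    override = (proxy_cfg or {}).get("health_check_path")
    if override:
        return override
    endpoints = deployment_cfg.get("endpoints") or {}
    return endpoints.get("health", "/health")


deployment = {"endpoints": {"health": "/v1/health/ready"}}

print(resolve_proxy_health_path({"health_check_path": "/custom"}, deployment))  # /custom
print(resolve_proxy_health_path({}, deployment))  # /v1/health/ready
print(resolve_proxy_health_path(None, {}))  # /health
```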


1305-1305: LGTM! Consistent endpoint usage in HAProxy config generation.

The health check path consistently uses cfg.deployment.endpoints.get("health", "/health"), maintaining the same pattern across all health check resolution points in the file.


packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/configs/deployment/nim.yaml (2)

29-29: Health endpoint /v1/health/ready is correct for NIM containers.

The /v1/health/ready endpoint is the standard health check endpoint for NVIDIA NIM, and this configuration aligns with official documentation across all NIM variants.
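
For illustration, a readiness wait against this endpoint might look like the sketch below. It is not the launcher's implementation: the function names, the retry parameters, and the choice to treat HTTP 200 as "ready" are all assumptions, and the probe is injectable so the loop can be exercised without a running server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True when the URL answers with HTTP 200 (assumed to mean ready)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP error statuses count as not ready.
        return False


def wait_until_ready(
    base_url: str,
    health_path: str = "/v1/health/ready",
    attempts: int = 30,
    delay_s: float = 2.0,
    probe: Callable[[str], bool] = http_probe,
) -> bool:
    """Poll base_url + health_path until the probe succeeds or attempts run out."""
    url = base_url.rstrip("/") + health_path
    for attempt in range(attempts):
        if probe(url):
            return True
        # Sleep between attempts, but not after the final one.
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

A caller would use something like `wait_until_ready("http://localhost:8000")`, where the host and port are placeholders for wherever the container is serving.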

21-24: Clarify the command override contradiction.

Line 21 adds an explicit command: /opt/nim/start_server.sh, but the comment on line 23 states "NIM containers use default entrypoint - no custom command needed." In Kubernetes, specifying a command field overrides the container's default entrypoint, making these statements contradictory. Either the command field is required and the comment should be updated, or the command field is unnecessary and should be removed to use the container's default entrypoint.


AWarno added 2 commits December 18, 2025 13:07

fix(health-url): fix health url

461b65a

Signed-off-by: Anna Warno <[email protected]>

feat(nim): add nim

1b68ecf

Signed-off-by: Anna Warno <[email protected]>

AWarno requested review from a team as code owners December 18, 2025 13:29

github-actions bot added the nemo-evaluator-launcher label Dec 18, 2025

copy-pr-bot bot temporarily deployed to test December 18, 2025 13:31 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci December 18, 2025 13:31 Inactive

AWarno changed the title ~~Awarno/fix health url~~ fix(health-url): fix health url Dec 18, 2025

copy-pr-bot bot temporarily deployed to nemo-ci December 18, 2025 13:33 Inactive

marta-sd reviewed Dec 18, 2025
