diff --git a/docs/index.md b/docs/index.md
index 72a8b446..710d5140 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -67,13 +67,14 @@ End-to-end applications: RAG agents, ML agents, and multi-agent systems.
## Training Pipeline
-The Nemotron training pipeline follows a three-stage approach with full artifact lineage tracking:
+The Nemotron training pipeline follows a four-stage approach with full artifact lineage tracking:
| Stage | Name | Description |
|-------|------|-------------|
| 0 | [Pretraining](train/nano3/pretrain.md) | Base model training on large text corpus |
| 1 | [SFT](train/nano3/sft.md) | Supervised fine-tuning for instruction following |
| 2 | [RL](train/nano3/rl.md) | Reinforcement learning for alignment |
+| 3 | [Evaluation](train/nano3/eval.md) | Benchmark testing with NeMo Evaluator |
## Why Nemotron?
@@ -133,6 +134,7 @@ train/artifacts.md
train/nano3/pretrain.md
train/nano3/sft.md
train/nano3/rl.md
+train/nano3/eval.md
train/nano3/import.md
```
@@ -147,4 +149,5 @@ train/omegaconf.md
train/wandb.md
train/cli.md
train/data-prep.md
+train/evaluator.md
```
diff --git a/docs/train/cli.md b/docs/train/cli.md
index b7d40ec4..292ad25a 100644
--- a/docs/train/cli.md
+++ b/docs/train/cli.md
@@ -487,4 +487,5 @@ uv run nemotron myrecipe train -c tiny --run MY-CLUSTER
- [Data Preparation](./data-prep.md) — Data preparation module
- [Artifact Lineage](./artifacts.md) — W&B artifact system and lineage tracking
- [W&B Integration](./wandb.md) — Credentials and configuration
+- [Evaluation Framework](./evaluator.md) — Model evaluation with NeMo Evaluator
- [Nano3 Recipe](./nano3/README.md) — Complete training recipe example
diff --git a/docs/train/evaluator.md b/docs/train/evaluator.md
new file mode 100644
index 00000000..34c2f420
--- /dev/null
+++ b/docs/train/evaluator.md
@@ -0,0 +1,393 @@
+# Evaluation Framework
+
+The Nemotron evaluation framework provides model evaluation capabilities using [NeMo Evaluator](https://github.com/NVIDIA/nemo-evaluator-launcher), enabling benchmark testing of trained models on standard NLP tasks.
+
+```console
+$ uv run nemotron evaluate -c nemotron-3-nano-nemo-ray --run MY-CLUSTER
+Compiled Configuration
+╭──────────────────────────────────── run ─────────────────────────────────────╮
+│ wandb:                                                                       │
+│   project: nemotron                                                          │
+│   entity: my-team                                                            │
+╰──────────────────────────────────────────────────────────────────────────────╯
+
+[info] Detected W&B login, setting WANDB_API_KEY
+
+Starting evaluation...
+✓ Evaluation submitted: 480d3c89bfe4a55c
+Check status: nemo-evaluator-launcher status 480d3c89bfe4a55c
+```
+
+## Overview
+
+The evaluation framework enables:
+
+- **Benchmark Testing** — Run standard benchmarks (MMLU, ARC, HellaSwag, etc.) on your models
+- **W&B Integration** — Auto-export results to Weights & Biases for tracking
+- **Slurm Execution** — Submit evaluation jobs to HPC clusters
+- **Auto-Squash** — Automatically converts Docker images to squashfs for Slurm clusters
+- **Credential Auto-Propagation** — Automatically passes W&B tokens to remote jobs
+
+The evaluator uses the same `env.toml` execution profiles as training recipes, providing a unified experience across all stages.
+
+## Quick Start
+
+```bash
+# Run evaluation on a cluster
+uv run nemotron evaluate -c nemotron-3-nano-nemo-ray --run MY-CLUSTER
+
+# Preview config without executing
+uv run nemotron evaluate -c nemotron-3-nano-nemo-ray --dry-run
+
+# Filter to specific tasks
+uv run nemotron evaluate -c nemotron-3-nano-nemo-ray --run MY-CLUSTER -t adlr_mmlu
+
+# Override checkpoint path
+uv run nemotron evaluate -c nemotron-3-nano-nemo-ray --run MY-CLUSTER \
+ deployment.checkpoint_path=/path/to/your/checkpoint
+```
+
+## CLI Options
+
+| Option | Short | Description |
+|--------|-------|-------------|
+| `--config` | `-c` | Config name or path |
+| `--run` | `-r` | Submit to cluster (attached, streams logs) |
+| `--batch` | `-b` | Submit to cluster (detached, exits immediately) |
+| `--dry-run` | `-d` | Preview config without executing |
+| `--task` | `-t` | Filter to specific task(s), can be repeated |
+| `--force-squash` | | Force re-squash even if cached |
+
+### Task Filtering
+
+Run specific benchmarks using the `-t` flag:
+
+```bash
+# Single task
+uv run nemotron evaluate -c config --run MY-CLUSTER -t adlr_mmlu
+
+# Multiple tasks
+uv run nemotron evaluate -c config --run MY-CLUSTER -t adlr_mmlu -t hellaswag
+```
+
+### Available Tasks
+
+Common evaluation tasks include:
+
+| Task | Description |
+|------|-------------|
+| `adlr_mmlu` | Massive Multitask Language Understanding |
+| `adlr_arc_challenge_llama_25_shot` | AI2 Reasoning Challenge |
+| `adlr_winogrande_5_shot` | Winograd Schema Challenge |
+| `hellaswag` | Commonsense reasoning |
+| `openbookqa` | Open-domain question answering |
+
+## Execution Profiles
+
+The evaluator uses the same `env.toml` profiles as training recipes. See [Execution through NeMo-Run](./nemo-run.md) for full documentation.
+
+### Basic Profile
+
+```toml
+# env.toml
+
+[wandb]
+project = "nemotron"
+entity = "my-team"
+
+[MY-CLUSTER]
+executor = "slurm"
+account = "my-account"
+partition = "batch"
+tunnel = "ssh"
+host = "cluster.example.com"
+user = "myuser"
+remote_job_dir = "/lustre/fsw/users/myuser/.nemotron"
+```
+
+### Profile with Auto-Squash
+
+Slurm clusters use [Pyxis](https://github.com/NVIDIA/pyxis) with enroot for container execution. While you can use Docker references directly, pre-squashed `.sqsh` files significantly speed up job startup by avoiding container pulls on each run.
+
+With SSH tunnel settings, the CLI can automatically create squash files from Docker references:
+
+```toml
+[MY-CLUSTER]
+executor = "slurm"
+account = "my-account"
+partition = "batch"
+
+# SSH settings (enables auto-squash)
+tunnel = "ssh"
+host = "cluster.example.com"
+user = "myuser"
+remote_job_dir = "/lustre/fsw/users/myuser/.nemotron"
+
+# Container settings - use Docker ref, auto-squashed on first run
+container_image = "nvcr.io/nvidia/nemo:25.01"
+```
+
+When you run with `--run MY-CLUSTER`, the CLI will:
+1. Detect that `deployment.image` is a Docker reference (not a `.sqsh` path)
+2. SSH to the cluster and run `enroot import` on a compute node
+3. Cache the `.sqsh` file in `${remote_job_dir}/containers/` for reuse
+4. Update the config to use the squashed path
+
+Subsequent runs reuse the cached squash file, eliminating container pull overhead.
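+
+To bypass auto-squash entirely, point `container_image` at a pre-squashed file; paths ending in `.sqsh` are used as-is (the path below is illustrative):
+
+```toml
+[MY-CLUSTER]
+executor = "slurm"
+# ...same SSH settings as above...
+
+# Pre-squashed file: used directly, no enroot import
+container_image = "/lustre/fsw/users/myuser/.nemotron/containers/nemo_25.01.sqsh"
+```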
+
+## Configuration
+
+Evaluation configs define how to deploy your model and which benchmarks to run.
+
+### Example Config
+
+```yaml
+# Execution (Slurm settings)
+execution:
+  type: slurm
+  hostname: ${run.env.host}
+  account: ${run.env.account}
+  partition: ${run.env.partition}
+  num_nodes: 1
+  gres: gpu:8
+
+  # Auto-export to W&B after evaluation
+  auto_export:
+    enabled: true
+    destinations:
+      - wandb
+
+# Deployment (Model serving)
+deployment:
+  type: generic
+  image: ${run.env.container}  # Docker image or .sqsh path
+  checkpoint_path: /path/to/checkpoint
+  command: >-
+    python deploy_ray_inframework.py
+    --megatron_checkpoint /checkpoint/
+    --num_gpus 8
+
+# Evaluation (Tasks to run)
+evaluation:
+  tasks:
+    - name: adlr_mmlu
+    - name: hellaswag
+    - name: openbookqa
+
+# Export (W&B settings)
+export:
+  wandb:
+    entity: ${run.wandb.entity}
+    project: ${run.wandb.project}
+```
+
+### Key Sections
+
+| Section | Purpose |
+|---------|---------|
+| `run.env` | Environment settings from env.toml (cluster, container) |
+| `run.wandb` | W&B settings from env.toml `[wandb]` section |
+| `execution` | Slurm executor configuration (nodes, GPUs, account) |
+| `deployment` | Model deployment (container, checkpoint, command) |
+| `evaluation` | Tasks and evaluation parameters |
+| `export` | Result export destinations (W&B) |
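+
+For example, with the Basic Profile shown earlier, the `${run.env.*}` interpolations resolve against the active `env.toml` profile (resolved values shown in comments):
+
+```yaml
+execution:
+  hostname: ${run.env.host}        # → "cluster.example.com" from [MY-CLUSTER]
+  account: ${run.env.account}      # → "my-account"
+  partition: ${run.env.partition}  # → "batch"
+```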
+
+## Auto-Squash
+
+For Slurm clusters that require squashfs containers, the evaluator automatically converts Docker images to `.sqsh` files—the same behavior as training recipes.
+
+### How It Works
+
+1. **Detection** — CLI checks if `deployment.image` is a Docker reference (not already `.sqsh`)
+2. **SSH Connection** — Connects to cluster via SSH tunnel (using `host` and `user` from env.toml)
+3. **Squash** — Runs `enroot import` on a compute node to create the `.sqsh` file
+4. **Cache** — Stores the squash file in `${remote_job_dir}/containers/` for reuse
+5. **Config Update** — Rewrites `deployment.image` to use the squashed path
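+
+Concretely, the rewrite in step 5 looks roughly like this (the cached filename is illustrative; it is derived from the image reference):
+
+```yaml
+# Before auto-squash: Docker reference in the config
+deployment:
+  image: nvcr.io/nvidia/nemo:25.01
+
+# After auto-squash: rewritten to the cached squash file
+deployment:
+  image: /lustre/fsw/users/myuser/.nemotron/containers/nemo_25.01.sqsh
+```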
+
+### Usage
+
+```bash
+# Auto-squash happens automatically for Docker refs
+uv run nemotron evaluate -c config --run MY-CLUSTER
+
+# Force re-squash (ignores cache)
+uv run nemotron evaluate -c config --run MY-CLUSTER --force-squash
+
+# Already-squashed paths skip the step
+# (if deployment.image ends in .sqsh, no squashing needed)
+```
+
+### Requirements
+
+Auto-squash requires these settings in your `env.toml` profile:
+
+| Field | Required | Description |
+|-------|----------|-------------|
+| `executor` | Yes | Must be `"slurm"` |
+| `tunnel` | Yes | Must be `"ssh"` |
+| `host` | Yes | SSH hostname (e.g., `cluster.example.com`) |
+| `user` | No | SSH username (defaults to current user) |
+| `remote_job_dir` | Yes | Remote directory for job files and squash cache |
+
+## W&B Integration
+
+The evaluator automatically propagates W&B credentials when you're logged in locally—the same behavior as training recipes.
+
+### Setup
+
+1. **Login to W&B locally:**
+
+   ```bash
+   wandb login
+   ```
+
+2. **Configure env.toml** (same `[wandb]` section used by all recipes):
+
+   ```toml
+   [wandb]
+   project = "nemotron"
+   entity = "my-team"
+   ```
+
+3. **Run evaluation** — credentials are automatically passed:
+
+   ```bash
+   uv run nemotron evaluate -c config --run MY-CLUSTER
+   # [info] Detected W&B login, setting WANDB_API_KEY
+   ```
+
+### What Gets Propagated
+
+| Variable | Source | Description |
+|----------|--------|-------------|
+| `WANDB_API_KEY` | Local wandb login | Auto-detected via `wandb.api.api_key` |
+| `WANDB_PROJECT` | `env.toml [wandb]` | Project name for result tracking |
+| `WANDB_ENTITY` | `env.toml [wandb]` | Team/user entity |
+
+## Monitoring Jobs
+
+### Check Status
+
+```bash
+# Using nemo-evaluator-launcher directly
+nemo-evaluator-launcher status INVOCATION_ID
+
+# Check Slurm queue
+ssh cluster 'squeue -u $USER'
+```
+
+### Stream Logs
+
+```bash
+nemo-evaluator-launcher logs INVOCATION_ID
+```
+
+### Cancel Jobs
+
+```bash
+# Cancel via Slurm
+ssh cluster scancel JOB_ID
+
+# Or multiple jobs
+ssh cluster "scancel JOB_ID1 JOB_ID2 JOB_ID3"
+```
+
+## Creating Custom Configs
+
+### Step 1: Create Config File
+
+```yaml
+# src/nemotron/recipes/evaluator/config/my-model.yaml
+
+defaults:
+  - execution: slurm/default
+  - deployment: generic
+  - _self_
+
+run:
+  env:
+    container: nvcr.io/nvidia/nemo:25.01  # Docker ref (auto-squashed)
+    # OR: container: /path/to/container.sqsh  # Pre-squashed
+  wandb:
+    entity: null  # Populated from env.toml
+    project: null
+
+execution:
+  type: slurm
+  hostname: ${run.env.host}
+  account: ${run.env.account}
+  num_nodes: 1
+  gres: gpu:8
+
+  auto_export:
+    enabled: true
+    destinations:
+      - wandb
+
+deployment:
+  type: generic
+  image: ${run.env.container}
+  checkpoint_path: /path/to/your/model/checkpoint
+  command: >-
+    python deploy_script.py --checkpoint /checkpoint/
+
+evaluation:
+  tasks:
+    - name: adlr_mmlu
+    - name: hellaswag
+
+export:
+  wandb:
+    entity: ${run.wandb.entity}
+    project: ${run.wandb.project}
+```
+
+### Step 2: Run Evaluation
+
+```bash
+uv run nemotron evaluate -c my-model --run MY-CLUSTER
+```
+
+## Troubleshooting
+
+### "Missing key type" Error
+
+Ensure your config has all required Slurm fields:
+
+```yaml
+execution:
+  type: slurm          # Required
+  ntasks_per_node: 1   # Required
+  gres: gpu:8          # Required
+```
+
+### W&B Credentials Not Detected
+
+1. Verify you're logged in: `wandb login`
+2. Check env.toml has `[wandb]` section
+3. Look for `[info] Detected W&B login` message
+
+### Auto-Squash Not Working
+
+1. Verify `tunnel = "ssh"` in your env.toml profile
+2. Check `host` and `remote_job_dir` are set
+3. Ensure `nemo-run` is installed: `pip install nemo-run`
+
+### Jobs Stuck in PENDING
+
+Check queue status:
+```bash
+ssh cluster "squeue -p batch | head"
+```
+
+Common reasons:
+- `(Priority)` — Waiting for resources
+- `(Resources)` — Insufficient available nodes
+- `(QOSMaxJobsPerUserLimit)` — User job limit reached
+
+## Further Reading
+
+- [Execution through NeMo-Run](./nemo-run.md) — Execution profiles and env.toml
+- [W&B Integration](./wandb.md) — Credentials and artifact tracking
+- [NeMo Evaluator Documentation](https://github.com/NVIDIA/nemo-evaluator-launcher) — Launcher reference
diff --git a/docs/train/nano3/README.md b/docs/train/nano3/README.md
index 85632169..fefc59cb 100644
--- a/docs/train/nano3/README.md
+++ b/docs/train/nano3/README.md
@@ -55,6 +55,9 @@ $ uv run nemotron nano3 sft --run YOUR-CLUSTER
// Stage 2: Reinforcement Learning
$ uv run nemotron nano3 data prep rl --run YOUR-CLUSTER
$ uv run nemotron nano3 rl --run YOUR-CLUSTER
+
+// Stage 3: Evaluation
+$ uv run nemotron nano3 eval --run YOUR-CLUSTER
```
@@ -78,6 +81,7 @@ $ uv run nemotron nano3 rl --run YOUR-CLUSTER
| 0 | [Pretraining](./pretrain.md) | Base model on 25T tokens with curriculum learning | [pretrain.md](./pretrain.md) |
| 1 | [SFT](./sft.md) | Multi-domain instruction tuning with 12+ data sources | [sft.md](./sft.md) |
| 2 | [RL](./rl.md) | GRPO alignment with multi-environment rewards | [rl.md](./rl.md) |
+| 3 | [Evaluation](./eval.md) | Benchmark testing with NeMo Evaluator | [eval.md](./eval.md) |
## Model Specifications
@@ -111,6 +115,12 @@ Multi-environment RLVR training across 7 reward environments using GRPO, plus Ge
→ [RL Guide](./rl.md)
+### Stage 3: Evaluation
+
+Benchmark testing on standard NLP tasks (MMLU, HellaSwag, ARC) using NeMo Evaluator, with automatic result export to W&B.
+
+→ [Evaluation Guide](./eval.md)
+
## Execution Options
All commands support [NeMo-Run](../nemo-run.md) execution modes:
@@ -148,9 +158,15 @@ flowchart TB
cmd2 --> model2["ModelArtifact-rl
(Final Model)"]
end
+ subgraph eval["Stage 3: Evaluation"]
+ model2 --> cmd3["uv run nemotron nano3 eval"]
+ cmd3 --> results["Benchmark Results
(W&B)"]
+ end
+
style pretrain fill:#e1f5fe,stroke:#2196f3
style sft fill:#f3e5f5,stroke:#9c27b0
style rl fill:#e8f5e9,stroke:#4caf50
+ style eval fill:#fff3e0,stroke:#ff9800
```
→ [Artifact Lineage & W&B Integration](../artifacts.md)
@@ -168,9 +184,8 @@ Native integrations with NVIDIA's NeMo ecosystem:
| [NeMo Curator](https://github.com/NVIDIA-NeMo/Curator) | Scalable data curation—deduplication, quality filtering, PII removal | Planned |
| [NeMo Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner) | Synthetic data generation for instruction tuning and alignment | Planned |
| [NeMo Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy) | Model export to TensorRT-LLM and deployment | Planned |
-| [NeMo Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) | Comprehensive model evaluation and benchmarking | Planned |
-These integrations will enable end-to-end pipelines from data curation to model evaluation.
+These integrations will enable end-to-end pipelines from data curation to deployment.
## CLI Reference
@@ -191,6 +206,7 @@ Usage: nemotron nano3 [OPTIONS] COMMAND [ARGS]...
│ pretrain Run pretraining with Megatron-Bridge (stage0). │
│ sft Run supervised fine-tuning with Megatron-Bridge (stage1). │
│ rl Run reinforcement learning with NeMo-RL GRPO (stage2). │
+│ eval Run evaluation with NeMo-Evaluator (stage3). │
╰──────────────────────────────────────────────────────────────────────────╯
// View training command help (SFT example with artifact overrides)
@@ -255,6 +271,7 @@ wandb login
- [Stage 0: Pretraining](./pretrain.md)
- [Stage 1: SFT](./sft.md)
- [Stage 2: RL](./rl.md)
+- [Stage 3: Evaluation](./eval.md)
- [Importing Models & Data](./import.md)
- [Artifact Lineage](../artifacts.md)
- [Execution through NeMo-Run](../nemo-run.md)
diff --git a/docs/train/nano3/eval.md b/docs/train/nano3/eval.md
new file mode 100644
index 00000000..f92c5d7d
--- /dev/null
+++ b/docs/train/nano3/eval.md
@@ -0,0 +1,267 @@
+# Stage 3: Evaluation
+
+This stage evaluates trained models using [NeMo Evaluator](https://github.com/NVIDIA/nemo-evaluator-launcher), running standard NLP benchmarks to measure model capabilities.
+
+---
+
+## Quick Start
+
+
+```console
+// Run evaluation on the RL model (default)
+$ uv run nemotron nano3 eval --run YOUR-CLUSTER
+
+// Evaluate a specific model stage
+$ uv run nemotron nano3 eval --run YOUR-CLUSTER run.model=sft:latest
+
+// Run specific benchmarks only
+$ uv run nemotron nano3 eval --run YOUR-CLUSTER -t adlr_mmlu -t hellaswag
+
+// Preview config without executing
+$ uv run nemotron nano3 eval --dry-run
+```
+
+> **Note**: The `--run YOUR-CLUSTER` flag submits jobs via [NeMo-Run](../nemo-run.md). See [Execution through NeMo-Run](../nemo-run.md) for setup.
+
+---
+
+## CLI Command
+
+```bash
+uv run nemotron nano3 eval [options] [overrides...]
+```
+
+| Option | Short | Description |
+|--------|-------|-------------|
+| `--run PROFILE` | `-r` | Submit to cluster (attached; waits and streams logs) |
+| `--batch PROFILE` | `-b` | Submit to cluster (detached; submits and exits) |
+| `--dry-run` | `-d` | Preview config without executing |
+| `--task TASK` | `-t` | Filter to specific task(s); can be repeated |
+| `--force-squash` | | Force re-squash of container image |
+| `key=value` | | Override config values |
+
+### Task Filtering
+
+Run specific benchmarks using the `-t` flag:
+
+```bash
+# Single task
+uv run nemotron nano3 eval --run YOUR-CLUSTER -t adlr_mmlu
+
+# Multiple tasks
+uv run nemotron nano3 eval --run YOUR-CLUSTER -t adlr_mmlu -t hellaswag -t arc_challenge
+```
+
+### Model Selection
+
+By default, evaluation runs on the RL stage output (`run.model=rl:latest`). Override to evaluate other stages:
+
+```bash
+# Evaluate SFT model
+uv run nemotron nano3 eval --run YOUR-CLUSTER run.model=sft:latest
+
+# Evaluate pretrained model
+uv run nemotron nano3 eval --run YOUR-CLUSTER run.model=pretrain:latest
+
+# Evaluate specific version
+uv run nemotron nano3 eval --run YOUR-CLUSTER run.model=rl:v2
+```
+
+---
+
+## Available Benchmarks
+
+The default configuration includes these tasks:
+
+| Task | Description |
+|------|-------------|
+| `adlr_mmlu` | Massive Multitask Language Understanding |
+| `hellaswag` | Commonsense reasoning |
+| `arc_challenge` | AI2 Reasoning Challenge |
+
+Additional tasks available in NeMo Evaluator:
+
+| Task | Description |
+|------|-------------|
+| `adlr_arc_challenge_llama_25_shot` | ARC Challenge (25-shot) |
+| `adlr_winogrande_5_shot` | Winograd Schema Challenge |
+| `openbookqa` | Open-domain question answering |
+| `truthfulqa` | Truthfulness evaluation |
+| `gsm8k` | Grade school math |
+
+See [NeMo Evaluator](https://github.com/NVIDIA/nemo-evaluator-launcher) for the full list of available tasks.
+
+---
+
+## Configuration
+
+Evaluation configs define how to deploy your model and which benchmarks to run.
+
+| File | Purpose |
+|------|---------|
+| `config/default.yaml` | Production configuration with vLLM deployment |
+
+### Key Configuration Sections
+
+```yaml
+# Model to evaluate (W&B artifact reference)
+run:
+run:
+  model: rl:latest  # Options: pretrain, sft, rl
+
+# Deployment (model serving)
+deployment:
+  type: vllm
+  tensor_parallel_size: 4
+  data_parallel_size: 1
+  extra_args: "--max-model-len 32768"
+
+# Tasks to run
+evaluation:
+  tasks:
+    - name: adlr_mmlu
+    - name: hellaswag
+    - name: arc_challenge
+
+# W&B export for results
+export:
+  wandb:
+    entity: ${run.wandb.entity}
+    project: ${run.wandb.project}
+```
+
+### Override Examples
+
+```bash
+# Different tensor parallelism
+uv run nemotron nano3 eval --run YOUR-CLUSTER deployment.tensor_parallel_size=8
+
+# Limit samples for quick testing
+uv run nemotron nano3 eval --run YOUR-CLUSTER \
+ evaluation.nemo_evaluator_config.config.params.limit_samples=10
+```
+
+---
+
+## Running with NeMo-Run
+
+The evaluator uses the same `env.toml` profiles as training stages, providing a unified experience across the pipeline.
+
+```toml
+[wandb]
+project = "nemotron"
+entity = "YOUR-TEAM"
+
+[YOUR-CLUSTER]
+executor = "slurm"
+account = "YOUR-ACCOUNT"
+partition = "batch"
+tunnel = "ssh"
+host = "cluster.example.com"
+user = "myuser"
+remote_job_dir = "/lustre/fsw/users/myuser/.nemotron"
+```
+
+See [Execution through NeMo-Run](../nemo-run.md) for complete configuration options.
+
+### W&B Integration
+
+Results are automatically exported to Weights & Biases when:
+1. You're logged in locally (`wandb login`)
+2. `[wandb]` section is configured in `env.toml`
+
+```bash
+# Verify W&B login
+wandb login
+
+# Run evaluation—results auto-export to W&B
+uv run nemotron nano3 eval --run YOUR-CLUSTER
+# [info] Detected W&B login, setting WANDB_API_KEY
+```
+
+---
+
+## Artifact Lineage
+
+Evaluation connects to the training pipeline through [W&B Artifacts](../artifacts.md):
+
+```mermaid
+%%{init: {'theme': 'base', 'themeVariables': { 'primaryBorderColor': '#333333', 'lineColor': '#333333', 'primaryTextColor': '#333333'}}}%%
+flowchart LR
+ model0["ModelArtifact-pretrain"] --> eval0["eval"]
+ model1["ModelArtifact-sft"] --> eval1["eval"]
+ model2["ModelArtifact-rl"] --> eval2["eval"]
+
+ eval0 --> results0["Benchmark Results"]
+ eval1 --> results1["Benchmark Results"]
+ eval2 --> results2["Benchmark Results"]
+
+ results0 --> wandb["W&B Dashboard"]
+ results1 --> wandb
+ results2 --> wandb
+
+ style model0 fill:#e1f5fe,stroke:#2196f3
+ style model1 fill:#f3e5f5,stroke:#9c27b0
+ style model2 fill:#e8f5e9,stroke:#4caf50
+ style wandb fill:#fff3e0,stroke:#ff9800
+```
+
+---
+
+## Monitoring Jobs
+
+### Check Status
+
+```bash
+# Using nemo-evaluator-launcher
+nemo-evaluator-launcher status INVOCATION_ID
+
+# Check Slurm queue
+ssh cluster 'squeue -u $USER'
+```
+
+### Stream Logs
+
+```bash
+nemo-evaluator-launcher logs INVOCATION_ID
+```
+
+---
+
+## Troubleshooting
+
+### W&B Credentials Not Detected
+
+1. Verify you're logged in: `wandb login`
+2. Check env.toml has `[wandb]` section
+3. Look for `[info] Detected W&B login` message
+
+### Model Artifact Not Found
+
+Verify the artifact exists in W&B:
+```bash
+# Check available artifacts
+wandb artifact ls YOUR-ENTITY/YOUR-PROJECT
+```
+
+### Evaluation Times Out
+
+Increase the timeout in your config:
+```bash
+uv run nemotron nano3 eval --run YOUR-CLUSTER \
+ evaluation.nemo_evaluator_config.config.params.request_timeout=7200
+```
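+
+The same value can be set persistently in the config instead of on the command line (the key path mirrors the override above):
+
+```yaml
+evaluation:
+  nemo_evaluator_config:
+    config:
+      params:
+        request_timeout: 7200
+```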
+
+---
+
+## Reference
+
+- [Evaluation Framework](../evaluator.md) — Full evaluator documentation
+- [NeMo Evaluator Documentation](https://github.com/NVIDIA/nemo-evaluator-launcher) — Launcher reference
+- [Artifact Lineage](../artifacts.md) — W&B artifact system
+- [Execution through NeMo-Run](../nemo-run.md) — Execution profiles
+- [W&B Integration](../wandb.md) — Credentials and configuration
+- **Recipe Source**: `src/nemotron/recipes/nano3/stage3_eval/` — Implementation details
+- [Back to Overview](./README.md)
diff --git a/pyproject.toml b/pyproject.toml
index 2d16415c..7cc2c625 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -49,6 +49,7 @@ wandb = ["wandb>=0.15.0"]
s3 = ["s3fs>=2024.0.0"]
gcs = ["gcsfs>=2024.0.0"]
sentencepiece = ["sentencepiece>=0.2.0"]
+evaluator = ["nemo-evaluator-launcher>=0.1.0"]
dev = [
"pytest>=7.0.0",
"pytest-cov>=4.0.0",
@@ -60,6 +61,7 @@ all = [
"s3fs>=2024.0.0",
"gcsfs>=2024.0.0",
"sentencepiece>=0.2.0",
+ "nemo-evaluator-launcher>=0.1.0",
]
# Note: megatron-bridge is required for training but not listed as a dependency
diff --git a/src/nemotron/cli/bin/nemotron.py b/src/nemotron/cli/bin/nemotron.py
index a595e5cc..580d7c3b 100644
--- a/src/nemotron/cli/bin/nemotron.py
+++ b/src/nemotron/cli/bin/nemotron.py
@@ -87,8 +87,23 @@ def _register_groups() -> None:
app.add_typer(kit_app, name="kit")
-# Register groups on import
+def _register_commands() -> None:
+    """Register top-level commands with the main app."""
+    from nemotron.cli.evaluate import evaluate
+
+    # Register evaluate command with same context settings as recipe commands
+    app.command(
+        name="evaluate",
+        context_settings={
+            "allow_extra_args": True,
+            "ignore_unknown_options": True,
+        },
+    )(evaluate)
+
+
+# Register groups and commands on import
_register_groups()
+_register_commands()
def main() -> None:
diff --git a/src/nemotron/cli/evaluate.py b/src/nemotron/cli/evaluate.py
new file mode 100644
index 00000000..d55725d7
--- /dev/null
+++ b/src/nemotron/cli/evaluate.py
@@ -0,0 +1,64 @@
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Top-level evaluate command.
+
+Provides a generic `nemotron evaluate` command with pre-built configs for
+common evaluation scenarios. Unlike recipe-specific commands (nano3/eval),
+this command has no default config and requires explicit config selection.
+"""
+
+from __future__ import annotations
+
+import typer
+
+from nemotron.kit.cli.evaluator import evaluator
+
+# Config directory for generic evaluator configs
+CONFIG_DIR = "src/nemotron/recipes/evaluator/config"
+
+
+@evaluator(
+    name="evaluate",
+    config_dir=CONFIG_DIR,
+    default_config="default",
+    require_explicit_config=True,
+)
+def evaluate(ctx: typer.Context) -> None:
+    """Run model evaluation with nemo-evaluator.
+
+    Generic evaluation command with pre-built configs for common models.
+    For recipe-specific evaluation with artifact resolution, use `nemotron nano3 eval`.
+
+    Available configs:
+        nemotron-3-nano-nemo-ray    NeMo Framework Ray deployment for Nemotron-3-Nano
+
+    Examples:
+        # Evaluate Nemotron-3-Nano with NeMo Ray deployment
+        nemotron evaluate -c nemotron-3-nano-nemo-ray --run MY-CLUSTER
+
+        # Override checkpoint path
+        nemotron evaluate -c nemotron-3-nano-nemo-ray --run MY-CLUSTER \\
+            deployment.checkpoint_path=/path/to/checkpoint
+
+        # Filter specific tasks
+        nemotron evaluate -c nemotron-3-nano-nemo-ray --run MY-CLUSTER -t adlr_mmlu
+
+        # Dry run (preview config)
+        nemotron evaluate -c nemotron-3-nano-nemo-ray --run MY-CLUSTER --dry-run
+
+        # Use custom config file
+        nemotron evaluate -c /path/to/custom.yaml --run MY-CLUSTER
+    """
+    ...
diff --git a/src/nemotron/cli/nano3/app.py b/src/nemotron/cli/nano3/app.py
index 9e13b79f..052c73ed 100644
--- a/src/nemotron/cli/nano3/app.py
+++ b/src/nemotron/cli/nano3/app.py
@@ -22,13 +22,13 @@
import typer
from nemotron.cli.nano3.data import data_app
-from nemotron.cli.nano3.help import RecipeCommand, make_recipe_command
+from nemotron.cli.nano3.eval import eval as eval_cmd
+from nemotron.cli.nano3.help import make_recipe_command
from nemotron.cli.nano3.model import model_app
from nemotron.cli.nano3.pretrain import pretrain
from nemotron.cli.nano3.rl import rl
from nemotron.cli.nano3.sft import sft
-
# Create nano3 app
nano3_app = typer.Typer(
name="nano3",
@@ -91,3 +91,21 @@
config_dir="src/nemotron/recipes/nano3/stage2_rl/config",
),
)(rl)
+
+# Eval has model artifact override (evaluates trained model)
+# Note: supports_stage=False because evaluator doesn't use nemo-run staging
+nano3_app.command(
+    name="eval",
+    context_settings={
+        "allow_extra_args": True,
+        "ignore_unknown_options": True,
+    },
+    rich_help_panel="Training Stages",
+    cls=make_recipe_command(
+        artifact_overrides={
+            "model": "Model checkpoint artifact to evaluate",
+        },
+        config_dir="src/nemotron/recipes/nano3/stage3_eval/config",
+        supports_stage=False,
+    ),
+)(eval_cmd)
diff --git a/src/nemotron/cli/nano3/eval.py b/src/nemotron/cli/nano3/eval.py
new file mode 100644
index 00000000..4151ac33
--- /dev/null
+++ b/src/nemotron/cli/nano3/eval.py
@@ -0,0 +1,53 @@
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Eval command implementation for nano3 recipe (stage3)."""
+
+from __future__ import annotations
+
+import typer
+
+from nemotron.kit.cli.evaluator import evaluator
+
+CONFIG_DIR = "src/nemotron/recipes/nano3/stage3_eval/config"
+
+
+@evaluator(
+    name="nano3/eval",
+    config_dir=CONFIG_DIR,
+    default_config="default",
+)
+def eval(ctx: typer.Context) -> None:
+    """Run evaluation with NeMo-Evaluator (stage3).
+
+    Evaluates the trained model using nemo-evaluator-launcher.
+    By default, evaluates the RL stage output (run.model=rl:latest).
+
+    Examples:
+        # Eval on cluster (loads env.toml profile)
+        nemotron nano3 eval --run MY-CLUSTER
+
+        # Override model artifact
+        nemotron nano3 eval --run MY-CLUSTER run.model=sft:v2
+
+        # Filter specific tasks
+        nemotron nano3 eval --run MY-CLUSTER -t adlr_mmlu -t hellaswag
+
+        # Dry run (show resolved config without executing)
+        nemotron nano3 eval --run MY-CLUSTER --dry-run
+
+        # Local execution
+        nemotron nano3 eval execution.type=local
+    """
+    ...
diff --git a/src/nemotron/cli/nano3/help.py b/src/nemotron/cli/nano3/help.py
index 2e9c5693..9e005bfa 100644
--- a/src/nemotron/cli/nano3/help.py
+++ b/src/nemotron/cli/nano3/help.py
@@ -84,10 +84,12 @@ class RecipeCommand(TyperCommand):
artifact_overrides: Dict mapping artifact names to descriptions.
Example: {"data": "Data artifact", "model": "Model checkpoint"}
config_dir: Path to config directory (relative to repo root).
+        supports_stage: Whether this command supports the --stage option.
"""
artifact_overrides: ClassVar[dict[str, str]] = {}
config_dir: ClassVar[str | None] = None
+    supports_stage: ClassVar[bool] = True
def format_help(self, ctx, formatter):
"""Format help with custom recipe options section."""
@@ -115,7 +117,8 @@ def format_help(self, ctx, formatter):
options_table.add_row("-r, --run PROFILE", "Submit to cluster (attached)")
options_table.add_row("-b, --batch PROFILE", "Submit to cluster (detached)")
options_table.add_row("-d, --dry-run", "Preview config without execution")
- options_table.add_row("--stage", "Stage files for interactive debugging")
+        if self.supports_stage:
+            options_table.add_row("--stage", "Stage files for interactive debugging")
console.print(
Panel(
@@ -227,6 +230,7 @@ def format_help(self, ctx, formatter):
def make_recipe_command(
artifact_overrides: dict[str, str] | None = None,
config_dir: str | None = None,
+    supports_stage: bool = True,
):
"""Factory function to create a RecipeCommand subclass with custom options.
@@ -234,6 +238,7 @@ def make_recipe_command(
artifact_overrides: Dict mapping artifact names to descriptions.
Example: {"data": "Data artifact", "model": "Model checkpoint"}
config_dir: Path to config directory (relative to repo root).
+        supports_stage: Whether this command supports the --stage option.
Returns:
A RecipeCommand subclass with the specified options.
@@ -244,4 +249,5 @@ class CustomRecipeCommand(RecipeCommand):
CustomRecipeCommand.artifact_overrides = artifact_overrides or {}
CustomRecipeCommand.config_dir = config_dir
+    CustomRecipeCommand.supports_stage = supports_stage
return CustomRecipeCommand
diff --git a/src/nemotron/kit/cli/__init__.py b/src/nemotron/kit/cli/__init__.py
index e9b48156..0d63756e 100644
--- a/src/nemotron/kit/cli/__init__.py
+++ b/src/nemotron/kit/cli/__init__.py
@@ -17,11 +17,13 @@
This module provides shared CLI infrastructure built on Typer + OmegaConf.
"""
+from nemotron.kit.cli.evaluator import evaluator
from nemotron.kit.cli.globals import GlobalContext, global_callback
from nemotron.kit.cli.recipe import recipe
__all__ = [
"GlobalContext",
+ "evaluator",
"global_callback",
"recipe",
]
diff --git a/src/nemotron/kit/cli/evaluator.py b/src/nemotron/kit/cli/evaluator.py
new file mode 100644
index 00000000..d43d1247
--- /dev/null
+++ b/src/nemotron/kit/cli/evaluator.py
@@ -0,0 +1,613 @@
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""@evaluator decorator for evaluation commands.
+
+Reuses ConfigBuilder from recipe infrastructure for consistent config handling,
+but executes via nemo-evaluator-launcher instead of nemo-run.
+"""
+
+from __future__ import annotations
+
+import os
+import sys
+from collections.abc import Callable
+from dataclasses import dataclass
+from functools import wraps
+from pathlib import Path
+from typing import Any
+
+import typer
+from rich.console import Console
+
+from nemotron.kit.cli.config import ConfigBuilder, generate_job_dir
+from nemotron.kit.cli.display import display_job_config, display_job_submission
+from nemotron.kit.cli.env import get_wandb_config
+from nemotron.kit.cli.globals import GlobalContext, split_unknown_args
+from nemotron.kit.cli.utils import resolve_run_interpolations
+
+console = Console()
+
+
+@dataclass
+class EvaluatorMetadata:
+ """Metadata attached to an evaluator command function.
+
+ Attributes:
+ name: Evaluator identifier (e.g., "nano3/eval")
+ config_dir: Path to config directory relative to repo root
+ default_config: Default config name (default: "default")
+ require_explicit_config: If True, requires -c/--config to be provided
+ """
+
+ name: str
+ config_dir: str
+ default_config: str = "default"
+ require_explicit_config: bool = False
+
+
+def evaluator(
+ name: str,
+ config_dir: str,
+ default_config: str = "default",
+ *,
+ require_explicit_config: bool = False,
+) -> Callable:
+ """Decorator marking a function as an evaluator command.
+
+ Similar to @recipe but executes via nemo-evaluator-launcher.
+ Supports --run/--batch for cluster execution; runs locally when no execution profile is given.
+
+ Args:
+ name: Evaluator identifier (e.g., "nano3/eval")
+ config_dir: Path to config directory
+ (e.g., "src/nemotron/recipes/nano3/stage3_eval/config")
+ default_config: Default config name (stem) or path used when -c/--config
+ is not provided (default: "default").
+ require_explicit_config: If True, requires -c/--config to be provided.
+ Used for top-level `nemotron evaluate` command.
+
+ Example:
+ @evaluator(
+ name="nano3/eval",
+ config_dir="src/nemotron/recipes/nano3/stage3_eval/config",
+ )
+ def eval(ctx: typer.Context):
+ '''Run evaluation with NeMo-Evaluator (stage3).'''
+ ...
+ """
+
+ def decorator(func: Callable) -> Callable:
+ @wraps(func)
+ def wrapper(ctx: typer.Context) -> None:
+ # Get global context
+ global_ctx: GlobalContext = ctx.obj
+ if global_ctx is None:
+ global_ctx = GlobalContext()
+
+ # Split unknown args into dotlist and passthrough
+ # Also extract any global options that appear after the subcommand
+ dotlist, passthrough, global_ctx = split_unknown_args(ctx.args or [], global_ctx)
+ global_ctx.dotlist = dotlist
+ global_ctx.passthrough = passthrough
+
+ # Validate options after split_unknown_args has extracted all global options
+ if global_ctx.run and global_ctx.batch:
+ typer.echo("Error: --run and --batch cannot both be set", err=True)
+ raise typer.Exit(1)
+
+ # --stage is not supported for evaluator
+ if global_ctx.stage:
+ typer.echo("Error: --stage is not supported for evaluator commands", err=True)
+ raise typer.Exit(1)
+
+ # Check if explicit config is required
+ if require_explicit_config and not global_ctx.config:
+ typer.echo(
+ "Error: -c/--config is required for this command.\n"
+ "Example: nemotron evaluate -c /path/to/eval.yaml --run CLUSTER",
+ err=True,
+ )
+ raise typer.Exit(1)
+
+ # Build configuration (reuses ConfigBuilder)
+ builder = ConfigBuilder(
+ recipe_name=name,
+ script_path="", # Not used for evaluator
+ config_dir=config_dir,
+ default_config=default_config,
+ ctx=global_ctx,
+ argv=sys.argv,
+ )
+
+ # Load and merge config
+ builder.load_and_merge()
+
+ # Build full job config
+ builder.build_job_config()
+
+ # Auto-inject W&B env mappings if W&B export is configured
+ # This mirrors nemo-run's behavior of auto-passing WANDB_API_KEY
+ if _needs_wandb(builder.job_config):
+ _inject_wandb_env_mappings(builder.job_config)
+
+ # Auto-squash container images for Slurm execution
+ # This mirrors nemo-run's behavior of auto-squashing Docker images
+ _maybe_auto_squash_evaluator(builder.job_config, global_ctx)
+
+ # Display compiled configuration
+ # Show resolved paths for remote execution (--run/--batch)
+ for_remote = global_ctx.mode in ("run", "batch")
+ display_job_config(builder.job_config, for_remote=for_remote)
+
+ # Handle dry-run mode
+ if global_ctx.dry_run:
+ return
+
+ # Save configs (job.yaml for provenance, eval.yaml for launcher)
+ job_path, eval_path = _save_eval_configs(builder, for_remote=for_remote)
+
+ # Display job submission summary
+ display_job_submission(job_path, eval_path, {}, global_ctx.mode)
+
+ # Execute via evaluator launcher
+ _execute_evaluator(
+ job_config=builder.job_config,
+ passthrough=passthrough,
+ )
+
+ # Attach metadata to function for introspection
+ wrapper._evaluator_metadata = EvaluatorMetadata(
+ name=name,
+ config_dir=config_dir,
+ default_config=default_config,
+ require_explicit_config=require_explicit_config,
+ )
+
+ return wrapper
+
+ return decorator
+
+
+def _save_eval_configs(
+ builder: ConfigBuilder,
+ *,
+ for_remote: bool = False,
+) -> tuple[Path, Path]:
+ """Save job and eval configs to disk.
+
+ Args:
+ builder: ConfigBuilder with loaded configuration
+ for_remote: If True, rewrite paths for remote execution
+
+ Returns:
+ Tuple of (job_yaml_path, eval_yaml_path)
+ """
+ from omegaconf import OmegaConf
+
+ from nemotron.kit.cli.utils import rewrite_paths_for_remote
+
+ job_config = builder.job_config
+ job_dir = generate_job_dir(builder.recipe_name)
+
+ # Extract eval config (everything except 'run' section); ${run.*} is resolved below
+ config_dict = OmegaConf.to_container(job_config, resolve=False)
+ run_section = config_dict.pop("run", {})
+
+ # Rewrite paths for remote execution if needed
+ if for_remote:
+ repo_root = Path.cwd()
+ config_dict = rewrite_paths_for_remote(config_dict, repo_root)
+
+ # Resolve ${run.*} interpolations (${run.env.host}, ${run.wandb.entity}, etc.)
+ config_dict = resolve_run_interpolations(config_dict, run_section)
+
+ eval_config = OmegaConf.create(config_dict)
+
+ # Save configs
+ job_dir.mkdir(parents=True, exist_ok=True)
+
+ job_path = job_dir / "job.yaml"
+ eval_path = job_dir / "eval.yaml"
+
+ OmegaConf.save(job_config, job_path)
+ OmegaConf.save(eval_config, eval_path)
+
+ return job_path, eval_path
+
+
+def _execute_evaluator(
+ job_config: Any,
+ passthrough: list[str],
+) -> None:
+ """Execute evaluation via nemo-evaluator-launcher.
+
+ 1. Ensure W&B env vars are set (needed for artifact resolution)
+ 2. Resolve artifacts (${art:model,path})
+ 3. Extract evaluator config (everything except 'run' section)
+ 4. Call run_eval() with fully resolved config
+
+ Args:
+ job_config: Full job configuration
+ passthrough: Passthrough arguments (for -t/--task flags)
+ """
+ from omegaconf import OmegaConf
+
+ from nemotron.kit.resolvers import (
+ clear_artifact_cache,
+ register_resolvers_from_config,
+ )
+
+ # Ensure W&B host env vars BEFORE artifact resolution
+ # The resolver uses WANDB_ENTITY/WANDB_PROJECT from environment to locate artifacts
+ # This loads entity/project from env.toml [wandb] section if not already set
+ _ensure_wandb_host_env()
+
+ # Resolve artifacts (${art:model,path} etc.)
+ clear_artifact_cache()
+ register_resolvers_from_config(
+ job_config,
+ artifacts_key="run",
+ mode="pre_init",
+ )
+
+ # Resolve all interpolations
+ # This resolves: ${run.env.host}, ${run.wandb.entity}, ${art:model,path}, etc.
+ resolved_config = OmegaConf.to_container(job_config, resolve=True)
+
+ # Extract evaluator-specific config (everything except 'run' section)
+ # The 'run' section was only needed for interpolation, not for the launcher
+ eval_config = {k: v for k, v in resolved_config.items() if k != "run"}
+ eval_config = OmegaConf.create(eval_config)
+
+ # Parse -t/--task flags from passthrough
+ task_list = _parse_task_flags(passthrough)
+
+ # Validate that no extra passthrough args exist (only -t/--task allowed)
+ extra_args = _get_non_task_args(passthrough)
+ if extra_args:
+ typer.echo(
+ f"Error: Unknown arguments: {' '.join(extra_args)}\n"
+ "Only -t/--task flags are supported for passthrough.",
+ err=True,
+ )
+ raise typer.Exit(1)
+
+ # Import and call evaluator launcher
+ try:
+ from nemo_evaluator_launcher.api.functional import run_eval
+ except ImportError:
+ typer.echo("Error: nemo-evaluator-launcher is required for evaluation", err=True)
+ typer.echo('Install with: pip install "nemotron[evaluator]"', err=True)
+ raise typer.Exit(1)
+
+ # Inject W&B env var mappings into eval_config if needed
+ # (env vars were already set earlier for artifact resolution)
+ if _needs_wandb(eval_config):
+ _inject_wandb_env_mappings(eval_config)
+
+ # Call the launcher
+ console.print("\n[bold blue]Starting evaluation...[/bold blue]")
+ invocation_id = run_eval(eval_config, dry_run=False, tasks=task_list)
+
+ if invocation_id:
+ console.print(f"\n[green]✓[/green] Evaluation submitted: [cyan]{invocation_id}[/cyan]")
+ console.print(
+ f"[dim]Check status: nemo-evaluator-launcher status {invocation_id}[/dim]"
+ )
+ console.print(f"[dim]Stream logs: nemo-evaluator-launcher logs {invocation_id}[/dim]")
+
+
+def _parse_task_flags(passthrough: list[str]) -> list[str] | None:
+ """Parse -t/--task flags from passthrough args.
+
+ Args:
+ passthrough: List of passthrough arguments
+
+ Returns:
+ List of task names, or None if no tasks specified
+ """
+ tasks = []
+ i = 0
+ while i < len(passthrough):
+ if passthrough[i] in ("-t", "--task") and i + 1 < len(passthrough):
+ tasks.append(passthrough[i + 1])
+ i += 2
+ else:
+ i += 1
+ return tasks if tasks else None
+
+
+def _get_non_task_args(passthrough: list[str]) -> list[str]:
+ """Get passthrough args that are not -t/--task flags.
+
+ Args:
+ passthrough: List of passthrough arguments
+
+ Returns:
+ List of non-task arguments
+ """
+ extra = []
+ i = 0
+ while i < len(passthrough):
+ if passthrough[i] in ("-t", "--task") and i + 1 < len(passthrough):
+ i += 2 # Skip -t and its value
+ else:
+ extra.append(passthrough[i])
+ i += 1
+ return extra
+
+
+# =============================================================================
+# W&B Token Auto-Propagation
+# =============================================================================
+# Similar to how nemo-run automatically passes WANDB_API_KEY when logged in,
+# these helpers ensure the evaluator launcher receives the W&B credentials.
+
+
+def _needs_wandb(cfg: Any) -> bool:
+ """Check if config requires W&B credentials.
+
+ Returns True if:
+ - execution.auto_export.destinations contains "wandb", OR
+ - export.wandb section exists
+
+ Args:
+ cfg: Job configuration (OmegaConf DictConfig or dict)
+
+ Returns:
+ True if W&B credentials are needed
+ """
+ from omegaconf import OmegaConf
+
+ # Convert to dict for easier access
+ if hasattr(cfg, "_content"):
+ cfg_dict = OmegaConf.to_container(cfg, resolve=False)
+ else:
+ cfg_dict = cfg
+
+ # Check execution.auto_export.destinations
+ try:
+ destinations = cfg_dict.get("execution", {}).get("auto_export", {}).get("destinations", [])
+ if "wandb" in destinations:
+ return True
+ except (AttributeError, TypeError):
+ pass
+
+ # Check export.wandb section
+ try:
+ if cfg_dict.get("export", {}).get("wandb") is not None:
+ return True
+ except (AttributeError, TypeError):
+ pass
+
+ return False
+
+
+def _ensure_wandb_host_env() -> None:
+ """Ensure W&B environment variables are set on the host.
+
+ Auto-detects WANDB_API_KEY from local wandb login (same as nemo-run).
+ Also sets WANDB_PROJECT/WANDB_ENTITY from env.toml [wandb] section.
+
+ This is required because nemo-evaluator-launcher checks os.getenv()
+ for env_vars mappings at submission time.
+ """
+ # Auto-detect WANDB_API_KEY from wandb login
+ if "WANDB_API_KEY" not in os.environ:
+ try:
+ import wandb
+
+ api_key = wandb.api.api_key
+ if api_key:
+ os.environ["WANDB_API_KEY"] = api_key
+ sys.stderr.write("[info] Detected W&B login, setting WANDB_API_KEY\n")
+ except Exception:
+ pass # wandb not installed or not logged in
+
+ # Load WANDB_PROJECT/WANDB_ENTITY from env.toml [wandb] section
+ wandb_config = get_wandb_config()
+ if wandb_config is not None:
+ if wandb_config.get("project") and "WANDB_PROJECT" not in os.environ:
+ os.environ["WANDB_PROJECT"] = wandb_config["project"]
+ if wandb_config.get("entity") and "WANDB_ENTITY" not in os.environ:
+ os.environ["WANDB_ENTITY"] = wandb_config["entity"]
+
+
+def _inject_wandb_env_mappings(cfg: Any) -> None:
+ """Inject W&B env var mappings into evaluator config.
+
+ The nemo-evaluator-launcher expects:
+ - evaluation.env_vars: mapping of container env var -> host env var name
+ - execution.env_vars.export: env vars for the W&B export container
+
+ This function adds the WANDB_API_KEY (and optionally PROJECT/ENTITY)
+ mappings so the launcher knows to forward these from the host environment.
+
+ Note: This only adds string mappings (e.g., "WANDB_API_KEY": "WANDB_API_KEY"),
+ not actual secrets. The launcher resolves these via os.getenv() at runtime.
+
+ Args:
+ cfg: Job configuration (OmegaConf DictConfig) - modified in place
+ """
+ from omegaconf import open_dict
+
+ # Helper to safely set nested dict value
+ def _ensure_nested(cfg_node: Any, *keys: str) -> Any:
+ """Ensure nested dict path exists, creating dicts as needed."""
+ current = cfg_node
+ for key in keys:
+ if key not in current or current[key] is None:
+ with open_dict(current):
+ current[key] = {}
+ current = current[key]
+ return current
+
+ # Inject into evaluation.env_vars (for evaluation containers)
+ try:
+ eval_env = _ensure_nested(cfg, "evaluation", "env_vars")
+ with open_dict(eval_env):
+ if "WANDB_API_KEY" not in eval_env:
+ eval_env["WANDB_API_KEY"] = "WANDB_API_KEY"
+ if "WANDB_PROJECT" not in eval_env:
+ eval_env["WANDB_PROJECT"] = "WANDB_PROJECT"
+ if "WANDB_ENTITY" not in eval_env:
+ eval_env["WANDB_ENTITY"] = "WANDB_ENTITY"
+ except Exception:
+ pass # Config structure doesn't support this
+
+ # Inject into execution.env_vars.export (for W&B export container)
+ try:
+ export_env = _ensure_nested(cfg, "execution", "env_vars", "export")
+ with open_dict(export_env):
+ if "WANDB_API_KEY" not in export_env:
+ export_env["WANDB_API_KEY"] = "WANDB_API_KEY"
+ if "WANDB_PROJECT" not in export_env:
+ export_env["WANDB_PROJECT"] = "WANDB_PROJECT"
+ if "WANDB_ENTITY" not in export_env:
+ export_env["WANDB_ENTITY"] = "WANDB_ENTITY"
+ except Exception:
+ pass # Config structure doesn't support this
+
+
+# =============================================================================
+# Container Auto-Squash for Slurm
+# =============================================================================
+# Similar to how training recipes auto-squash Docker images for Slurm,
+# these helpers ensure evaluator container images are squashed before execution.
+
+
+def _collect_evaluator_images(cfg: Any) -> list[tuple[str, str]]:
+ """Collect (dotpath, image) for all container images in eval config.
+
+ Args:
+ cfg: Evaluator configuration (OmegaConf DictConfig)
+
+ Returns:
+ List of (dotpath, image_value) tuples for images that need squashing
+ """
+ from omegaconf import OmegaConf
+
+ images = []
+
+ # Deployment image
+ dep_image = OmegaConf.select(cfg, "deployment.image")
+ if dep_image and isinstance(dep_image, str):
+ images.append(("deployment.image", dep_image))
+
+ # Proxy image (if present)
+ proxy_image = OmegaConf.select(cfg, "execution.proxy.image")
+ if proxy_image and isinstance(proxy_image, str):
+ images.append(("execution.proxy.image", proxy_image))
+
+ return images
+
+
+def _maybe_auto_squash_evaluator(
+ job_config: Any,
+ global_ctx: GlobalContext,
+) -> None:
+ """Auto-squash container images for Slurm execution.
+
+ Checks if the executor is Slurm with SSH tunnel, and if so, squashes
+ any Docker images to .sqsh files on the remote cluster. Modifies
+ job_config in-place with the squashed paths.
+
+ Args:
+ job_config: Full job configuration (OmegaConf DictConfig) - modified in place
+ global_ctx: Global CLI context with mode and force_squash flag
+ """
+ from omegaconf import OmegaConf, open_dict
+
+ from nemotron.kit.cli.squash import ensure_squashed_image, is_sqsh_image
+
+ # Only for remote slurm execution
+ if global_ctx.mode not in ("run", "batch"):
+ return
+
+ # Skip on dry-run to avoid remote side effects
+ if global_ctx.dry_run:
+ return
+
+ # Get env config
+ env_config = OmegaConf.to_container(job_config.run.env, resolve=True)
+
+ # Only for Slurm executor
+ if env_config.get("executor") != "slurm":
+ return
+
+ # Need SSH tunnel support
+ if env_config.get("tunnel") != "ssh":
+ return
+
+ # Need SSH connection info
+ host = env_config.get("host")
+ user = env_config.get("user")
+ remote_job_dir = env_config.get("remote_job_dir")
+
+ if not all([host, remote_job_dir]):
+ return
+
+ # Check for nemo-run (optional dependency for SSH tunnel)
+ try:
+ import nemo_run as run
+ except ImportError:
+ console.print(
+ "[yellow]Warning:[/yellow] nemo-run not installed, skipping auto-squash. "
+ "Install with: pip install nemo-run"
+ )
+ return
+
+ # Collect images to squash
+ images = _collect_evaluator_images(job_config)
+ if not images:
+ return
+
+ # Filter out already-squashed images
+ images_to_squash = [(dp, img) for dp, img in images if not is_sqsh_image(img)]
+ if not images_to_squash:
+ return
+
+ # Create SSH tunnel
+ tunnel = run.SSHTunnel(
+ host=host,
+ user=user or "",
+ job_dir=remote_job_dir,
+ )
+
+ try:
+ tunnel.connect()
+
+ # Squash each image and update config
+ for dotpath, image in images_to_squash:
+ console.print(f"[blue]Auto-squashing:[/blue] {image}")
+ sqsh_path = ensure_squashed_image(
+ tunnel=tunnel,
+ container_image=image,
+ remote_job_dir=remote_job_dir,
+ env_config=env_config,
+ force=global_ctx.force_squash,
+ )
+
+ # Update config with squashed path
+ with open_dict(job_config):
+ OmegaConf.update(job_config, dotpath, sqsh_path, merge=False)
+
+ finally:
+ # Cleanup tunnel if it has a disconnect method
+ if hasattr(tunnel, "disconnect"):
+ try:
+ tunnel.disconnect()
+ except Exception:
+ pass
diff --git a/src/nemotron/kit/cli/recipe.py b/src/nemotron/kit/cli/recipe.py
index 48d2a7f6..bca98fe0 100644
--- a/src/nemotron/kit/cli/recipe.py
+++ b/src/nemotron/kit/cli/recipe.py
@@ -35,6 +35,7 @@
from nemotron.kit.cli.config import ConfigBuilder
from nemotron.kit.cli.display import display_job_config, display_job_submission
from nemotron.kit.cli.globals import GlobalContext, split_unknown_args
+from nemotron.kit.cli.squash import ensure_squashed_image, get_sqsh_path, is_sqsh_image
console = Console()
@@ -622,8 +623,12 @@ def _build_executor(
if container_image and tunnel and remote_job_dir:
# Connect tunnel to check/create squashed image
tunnel.connect()
- container_image = _ensure_squashed_image(
- tunnel, container_image, remote_job_dir, env_config, force=force_squash
+ container_image = ensure_squashed_image(
+ tunnel=tunnel,
+ container_image=container_image,
+ remote_job_dir=remote_job_dir,
+ env_config=env_config,
+ force=force_squash,
)
# Select partition based on mode (--run uses run_partition, --batch uses batch_partition)
@@ -787,118 +792,6 @@ def _build_packager(
)
-def _get_squash_path(container_image: str, remote_job_dir: str) -> str:
- """Get the path to the squashed container image.
-
- Creates a deterministic filename based on the container image name.
- For example: nvcr.io/nvidian/nemo:25.11-nano-v3.rc2 -> nemo-25.11-nano-v3.rc2.sqsh
-
- Args:
- container_image: Docker container image (e.g., nvcr.io/nvidian/nemo:25.11-nano-v3.rc2)
- remote_job_dir: Remote directory for squashed images
-
- Returns:
- Full path to squashed image file
- """
- # Extract image name and tag for readable filename
- # nvcr.io/nvidian/nemo:25.11-nano-v3.rc2 -> nemo:25.11-nano-v3.rc2
- image_name = container_image.split("/")[-1]
- # nemo:25.11-nano-v3.rc2 -> nemo-25.11-nano-v3.rc2.sqsh
- sqsh_name = image_name.replace(":", "-") + ".sqsh"
-
- return f"{remote_job_dir}/{sqsh_name}"
-
-
-def _ensure_squashed_image(
- tunnel: Any,
- container_image: str,
- remote_job_dir: str,
- env_config: dict,
- *,
- force: bool = False,
-) -> str:
- """Ensure the container image is squashed on the remote cluster.
-
- Checks if a squashed version exists, and if not, creates it using enroot
- on a compute node via salloc.
-
- Args:
- tunnel: SSHTunnel instance (already connected)
- container_image: Docker container image to squash
- remote_job_dir: Remote directory for squashed images
- env_config: Environment config with slurm settings (account, partition, time)
- force: If True, re-squash even if file already exists
-
- Returns:
- Path to the squashed image file
- """
- sqsh_path = _get_squash_path(container_image, remote_job_dir)
-
- # Check if squashed image already exists (unless force is set)
- if not force:
- with console.status("[bold blue]Checking for squashed image..."):
- result = tunnel.run(f"test -f {sqsh_path} && echo exists", hide=True, warn=True)
-
- if result.ok and "exists" in result.stdout:
- console.print(
- f"[green]✓[/green] Using existing squashed image: [cyan]{sqsh_path}[/cyan]"
- )
- return sqsh_path
-
- # Need to create the squashed image
- if force:
- console.print("[yellow]![/yellow] Force re-squash requested, removing existing file...")
- tunnel.run(f"rm -f {sqsh_path}", hide=True)
- else:
- console.print("[yellow]![/yellow] Squashed image not found, creating...")
- console.print(f" [dim]Image:[/dim] {container_image}")
- console.print(f" [dim]Output:[/dim] {sqsh_path}")
- console.print()
-
- # Ensure directory exists
- tunnel.run(f"mkdir -p {remote_job_dir}", hide=True)
-
- # Build salloc command to run enroot import on a compute node
- # (login nodes don't have enough memory for enroot import)
- account = env_config.get("account")
- partition = env_config.get("run_partition") or env_config.get("partition")
- time_limit = env_config.get("time", "04:00:00")
- gpus_per_node = env_config.get("gpus_per_node")
-
- salloc_args = []
- if account:
- salloc_args.append(f"--account={account}")
- if partition:
- salloc_args.append(f"--partition={partition}")
- salloc_args.append("--nodes=1")
- salloc_args.append("--ntasks-per-node=1")
- if gpus_per_node:
- salloc_args.append(f"--gpus-per-node={gpus_per_node}")
- salloc_args.append(f"--time={time_limit}")
-
- enroot_cmd = f"enroot import --output {sqsh_path} docker://{container_image}"
- cmd = f"salloc {' '.join(salloc_args)} srun {enroot_cmd}"
-
- # Run enroot import via salloc (this can take a while)
- console.print(
- "[bold blue]Allocating compute node and importing container "
- "(this may take several minutes)...[/bold blue]"
- )
- console.print(f"[dim]$ {cmd}[/dim]")
- console.print()
- result = tunnel.run(cmd, hide=False, warn=True)
-
- if not result.ok:
- raise RuntimeError(
- f"Failed to squash container image.\n"
- f"Command: {cmd}\n"
- f"Error: {result.stderr or 'Unknown error'}"
- )
-
- console.print(f"[green]✓[/green] Created squashed image: [cyan]{sqsh_path}[/cyan]")
- return sqsh_path
-
-
def _execute_stage_only(
script_path: str,
train_path: Path,
@@ -1068,7 +961,10 @@ def _print_stage_commands(
# Get squashed container path
sqsh_path = None
if container and remote_job_dir:
- sqsh_path = _get_squash_path(container, remote_job_dir)
+ if is_sqsh_image(container):
+ sqsh_path = container
+ else:
+ sqsh_path = get_sqsh_path(container, remote_job_dir)
# Mount to /workspace for simpler commands inside container
container_mount_path = "/workspace"
diff --git a/src/nemotron/kit/cli/squash.py b/src/nemotron/kit/cli/squash.py
index a915a319..337fb437 100644
--- a/src/nemotron/kit/cli/squash.py
+++ b/src/nemotron/kit/cli/squash.py
@@ -21,8 +21,13 @@
from __future__ import annotations
import re
+import shlex
from typing import Any
+from rich.console import Console
+
+console = Console()
+
def container_to_sqsh_name(container: str) -> str:
"""Convert container image name to deterministic squash filename.
@@ -50,6 +55,32 @@ def container_to_sqsh_name(container: str) -> str:
return f"{safe_name}.sqsh"
+def is_sqsh_image(image: str) -> bool:
+ """Check if image is already a .sqsh file.
+
+ Args:
+ image: Container image reference or path
+
+ Returns:
+ True if image is already a squash file
+ """
+ return image.endswith(".sqsh") or (image.startswith("/") and ".sqsh" in image)
+
+
+def get_sqsh_path(container: str, remote_job_dir: str) -> str:
+ """Compute deterministic .sqsh path for a container.
+
+ Args:
+ container: Docker container image (e.g., "nvcr.io/nvidian/nemo:25.11")
+ remote_job_dir: Remote directory for job files
+
+ Returns:
+ Full path to squashed image file under containers/
+ """
+ sqsh_name = container_to_sqsh_name(container)
+ return f"{remote_job_dir}/containers/{sqsh_name}"
+
+
def check_sqsh_exists(tunnel: Any, remote_path: str) -> bool:
"""Check if a squash file exists on the remote cluster.
@@ -62,3 +93,99 @@ def check_sqsh_exists(tunnel: Any, remote_path: str) -> bool:
"""
result = tunnel.run(f"test -f {remote_path} && echo exists", hide=True, warn=True)
return result.ok and "exists" in result.stdout
+
+
+def ensure_squashed_image(
+ *,
+ tunnel: Any,
+ container_image: str,
+ remote_job_dir: str,
+ env_config: dict[str, Any],
+ force: bool = False,
+) -> str:
+ """Ensure container is squashed on remote cluster, return .sqsh path.
+
+ Checks if a squashed version exists, and if not, creates it using enroot
+ on a compute node via salloc.
+
+ Args:
+ tunnel: SSHTunnel instance (will be connected if not already)
+ container_image: Docker container image to squash
+ remote_job_dir: Remote directory for squashed images
+ env_config: Environment config with slurm settings (account, partition, time)
+ force: If True, re-squash even if file already exists
+
+ Returns:
+ Path to the squashed image file
+
+ Raises:
+ RuntimeError: If squashing fails
+ """
+ sqsh_path = get_sqsh_path(container_image, remote_job_dir)
+
+ # Ensure remote directory exists
+ tunnel.run(f"mkdir -p {shlex.quote(remote_job_dir)}/containers", hide=True)
+
+ # Check if squashed image already exists (unless force is set)
+ if not force:
+ with console.status("[bold blue]Checking for squashed image..."):
+ if check_sqsh_exists(tunnel, sqsh_path):
+ console.print(
+ f"[green]✓[/green] Using existing squashed image: [cyan]{sqsh_path}[/cyan]"
+ )
+ return sqsh_path
+
+ # Need to create the squashed image
+ if force:
+ console.print("[yellow]![/yellow] Force re-squash requested, removing existing file...")
+ tunnel.run(f"rm -f {shlex.quote(sqsh_path)}", hide=True)
+ else:
+ console.print("[yellow]![/yellow] Squashed image not found, creating...")
+
+ console.print(f" [dim]Image:[/dim] {container_image}")
+ console.print(f" [dim]Output:[/dim] {sqsh_path}")
+ console.print()
+
+ # Build salloc command to run enroot import on a compute node
+ # (login nodes don't have enough memory for enroot import)
+ account = env_config.get("account")
+ partition = (
+ env_config.get("run_partition")
+ or env_config.get("batch_partition")
+ or env_config.get("partition")
+ )
+ time_limit = env_config.get("time", "04:00:00")
+ gpus_per_node = env_config.get("gpus_per_node")
+
+ salloc_args = []
+ if account:
+ salloc_args.append(f"--account={shlex.quote(account)}")
+ if partition:
+ salloc_args.append(f"--partition={shlex.quote(partition)}")
+ salloc_args.append("--nodes=1")
+ salloc_args.append("--ntasks-per-node=1")
+ if gpus_per_node:
+ salloc_args.append(f"--gpus-per-node={gpus_per_node}")
+ salloc_args.append(f"--time={time_limit}")
+
+ enroot_cmd = f"enroot import --output {shlex.quote(sqsh_path)} docker://{container_image}"
+ cmd = f"salloc {' '.join(salloc_args)} srun {enroot_cmd}"
+
+ # Run enroot import via salloc (this can take a while)
+ console.print(
+ "[bold blue]Allocating compute node and importing container "
+ "(this may take several minutes)...[/bold blue]"
+ )
+ console.print(f"[dim]$ {cmd}[/dim]")
+ console.print()
+ result = tunnel.run(cmd, hide=False, warn=True)
+
+ if not result.ok:
+ raise RuntimeError(
+ f"Failed to squash container image.\n"
+ f"Command: {cmd}\n"
+ f"Error: {result.stderr or 'Unknown error'}"
+ )
+
+ console.print(f"[green]✓[/green] Created squashed image: [cyan]{sqsh_path}[/cyan]")
+ return sqsh_path
diff --git a/src/nemotron/kit/cli/utils.py b/src/nemotron/kit/cli/utils.py
index 1b15d7ab..54be99bf 100644
--- a/src/nemotron/kit/cli/utils.py
+++ b/src/nemotron/kit/cli/utils.py
@@ -6,10 +6,37 @@
+import re
from pathlib import Path
from typing import Any
+# Pattern to match ${run.*} tokens (for embedded interpolations)
+RUN_TOKEN_RE = re.compile(r"\$\{run\.([^}]+)\}")
+
+
+def _lookup_run_path(run_data: dict, dotted_path: str) -> tuple[bool, Any]:
+ """Look up a dotted path in run_data.
+
+ Args:
+ run_data: The run section dictionary
+ dotted_path: Path like "env.remote_job_dir" or "wandb.project"
+
+ Returns:
+ Tuple of (found, value). If found is False, value is None.
+ """
+ keys = dotted_path.split(".")
+ current = run_data
+ for key in keys:
+ if not isinstance(current, dict) or key not in current:
+ return (False, None)
+ current = current[key]
+ return (True, current)
+
def resolve_run_interpolations(obj: Any, run_data: dict) -> Any:
"""Recursively resolve ${run.*} interpolations in a dict/list.
+ Handles both:
+ - Exact matches: "${run.foo}" -> preserves type of resolved value
+ - Embedded: "${run.foo}/bar" -> string substitution via regex
+
Only resolves ${run.X.Y} style interpolations, preserves other
interpolations like ${art:data,path}.
@@ -24,18 +50,29 @@ def resolve_run_interpolations(obj: Any, run_data: dict) -> Any:
return {k: resolve_run_interpolations(v, run_data) for k, v in obj.items()}
elif isinstance(obj, list):
return [resolve_run_interpolations(item, run_data) for item in obj]
- elif isinstance(obj, str) and obj.startswith("${run.") and obj.endswith("}"):
- # Extract the path: ${run.wandb.project} -> wandb.project
- path = obj[6:-1] # Remove "${run." and "}"
- # Navigate run_data to get the value
- parts = path.split(".")
- value = run_data
- for part in parts:
- if isinstance(value, dict) and part in value:
- value = value[part]
- else:
- return obj # Can't resolve, keep original
- return value
+ elif isinstance(obj, str):
+ # Check for exact match first (preserves type)
+ if obj.startswith("${run.") and obj.endswith("}") and obj.count("${") == 1:
+ # Extract the path: ${run.wandb.project} -> wandb.project
+ path = obj[6:-1] # Remove "${run." and "}"
+ found, value = _lookup_run_path(run_data, path)
+ if found:
+ return value
+ return obj # Can't resolve, keep original
+
+ # Check for embedded interpolations (string substitution)
+ if "${run." in obj:
+
+ def replace_token(match: re.Match) -> str:
+ dotted_path = match.group(1)
+ found, value = _lookup_run_path(run_data, dotted_path)
+ if found:
+ return str(value)
+ return match.group(0) # Keep original if not found
+
+ return RUN_TOKEN_RE.sub(replace_token, obj)
+
+ return obj
else:
return obj
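The resolution rules above (exact matches preserve the value's type, embedded tokens fall back to string substitution, non-`${run.*}` interpolations pass through untouched) can be sketched as a standalone script. This is a simplified reimplementation for illustration; the `RUN_TOKEN_RE` pattern shown here is an assumption, since the module's actual definition is outside this hunk:

```python
import re
from typing import Any

# Assumed token pattern: ${run.<dotted.path>} (stand-in for the module's RUN_TOKEN_RE)
RUN_TOKEN_RE = re.compile(r"\$\{run\.([A-Za-z0-9_.]+)\}")


def _lookup(run_data: dict, dotted_path: str):
    """Walk a dotted path through nested dicts; return (found, value)."""
    current = run_data
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return (False, None)
        current = current[key]
    return (True, current)


def resolve(obj: Any, run_data: dict) -> Any:
    if isinstance(obj, dict):
        return {k: resolve(v, run_data) for k, v in obj.items()}
    if isinstance(obj, list):
        return [resolve(item, run_data) for item in obj]
    if isinstance(obj, str):
        # Exact match: the whole string is one token, so the resolved
        # value keeps its original type (int, dict, ...)
        if obj.startswith("${run.") and obj.endswith("}") and obj.count("${") == 1:
            found, value = _lookup(run_data, obj[6:-1])
            return value if found else obj
        # Embedded tokens: substitute each one as a string
        if "${run." in obj:
            def repl(match: re.Match) -> str:
                found, value = _lookup(run_data, match.group(1))
                return str(value) if found else match.group(0)
            return RUN_TOKEN_RE.sub(repl, obj)
    return obj


run_data = {"env": {"remote_job_dir": "/lustre/jobs", "nodes": 4}}
print(resolve("${run.env.nodes}", run_data))                       # 4 (int preserved)
print(resolve("${run.env.remote_job_dir}/evaluations", run_data))  # /lustre/jobs/evaluations
print(resolve("${art:data,path}", run_data))                       # left untouched
```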
diff --git a/src/nemotron/recipes/evaluator/config/nemotron-3-nano-nemo-ray.yaml b/src/nemotron/recipes/evaluator/config/nemotron-3-nano-nemo-ray.yaml
new file mode 100644
index 00000000..38d6f03c
--- /dev/null
+++ b/src/nemotron/recipes/evaluator/config/nemotron-3-nano-nemo-ray.yaml
@@ -0,0 +1,197 @@
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Nemotron-3-Nano-30B Evaluation with NeMo Framework Ray Deployment
+#
+# This config evaluates the NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 model
+# using NeMo Framework's Ray-based in-framework deployment.
+#
+# Usage:
+# nemotron evaluate -c nemotron-3-nano-nemo-ray --run MY-CLUSTER
+#
+# Override checkpoint:
+# nemotron evaluate -c nemotron-3-nano-nemo-ray --run MY-CLUSTER \
+# deployment.checkpoint_path=/path/to/checkpoint
+#
+# Filter tasks:
+# nemotron evaluate -c nemotron-3-nano-nemo-ray --run MY-CLUSTER -t adlr_mmlu
+
+# =============================================================================
+# Defaults - Use slurm executor and generic deployment
+# =============================================================================
+defaults:
+ - execution: slurm/default
+ - deployment: generic
+ - _self_
+
+# =============================================================================
+# Nemotron run section (env.toml injection)
+# This section is used for interpolation and stripped before calling the launcher
+# =============================================================================
+run:
+ # Environment config - populated from env.toml profile via --run
+ env:
+ # Default container for NeMo Framework Ray deployment (squash file for Slurm)
+ container: /lustre/fsw/portfolios/coreai/users/athittenaman/nvidia+nemo+25.11.nemotron_3_nano.sqsh
+ executor: slurm
+ host: ${oc.env:HOSTNAME,localhost}
+ user: ${oc.env:USER}
+ account: null
+ partition: batch
+ remote_job_dir: ${oc.env:PWD}/.nemotron
+ time: "04:00:00"
+
+ # W&B config - populated from env.toml [wandb] section
+ wandb:
+ entity: null
+ project: null
+
+# =============================================================================
+# Execution Configuration
+# =============================================================================
+execution:
+ type: slurm
+ hostname: ${run.env.host}
+ username: ${run.env.user}
+ account: ${run.env.account}
+ output_dir: ${run.env.remote_job_dir}/evaluations
+ walltime: ${run.env.time}
+ partition: ${run.env.partition}
+
+ # Slurm resource configuration
+ num_nodes: 1
+ ntasks_per_node: 1
+ gres: gpu:8
+ subproject: nemo-evaluator-launcher
+ sbatch_comment: null
+
+ deployment:
+ n_tasks: ${execution.num_nodes}
+
+ # HAProxy for load balancing across Ray workers
+ proxy:
+ type: haproxy
+ image: gitlab-master.nvidia.com/dl/joc/competitive_evaluation/nvidia-core-evals/haproxy-container/haproxy:2025-10-03T17-19-2679aefe0800
+ config:
+ haproxy_port: 5009
+ health_check_path: /v1/health
+ health_check_status: 200
+
+ # Auto-export results after evaluation completes
+ auto_export:
+ enabled: true
+ destinations:
+ - wandb
+
+ # Environment variables for deployment and evaluation containers
+ # NOTE: HF_TOKEN must be set in your environment if using HuggingFace gated models
+ # NOTE: WANDB_API_KEY is auto-detected from local wandb login (like nemo-run)
+ env_vars:
+ deployment:
+ HF_HOME: /cache/huggingface
+ NIM_CACHE_PATH: /cache/nim
+ VLLM_CACHE_ROOT: /cache/vllm
+ evaluation:
+ HF_HOME: /cache/huggingface
+ # W&B export env vars (auto-injected by CLI if logged in locally)
+ # These map host env var names -> container env var names
+ export:
+ WANDB_API_KEY: WANDB_API_KEY
+ WANDB_PROJECT: WANDB_PROJECT
+ WANDB_ENTITY: WANDB_ENTITY
+
+ # Mounts for deployment and evaluation containers
+ mounts:
+ deployment:
+ /lustre: /lustre
+ /lustre/fsw/portfolios/coreai/users/athittenaman/Export-Deploy: /opt/Export-Deploy
+ evaluation:
+ /lustre: /lustre
+ mount_home: false
+
+# =============================================================================
+# Deployment Configuration - NeMo Framework Ray
+# =============================================================================
+deployment:
+ type: generic
+ multiple_instances: true
+ image: ${run.env.container}
+ health_check_path: /v1/health
+ port: 1235 # Port used by Ray deployment
+ served_model_name: nemo-model
+ # Hardcoded checkpoint path - override via CLI: deployment.checkpoint_path=/your/path
+ checkpoint_path: /lustre/fsw/portfolios/coreai/users/athittenaman/checkpoints/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16/iter_0000000
+
+ # NeMo Framework Ray deployment command
+ # Parallelism settings for Nano3 (30B MoE model): TP=2, EP=8
+ command: >-
+ bash -c 'export TRITON_CACHE_DIR=/tmp/triton_cache_$$SLURM_NODEID;
+ python /opt/Export-Deploy/scripts/deploy/nlp/deploy_ray_inframework.py
+ --megatron_checkpoint /checkpoint/
+ --num_gpus 8
+ --tensor_model_parallel_size 2
+ --expert_model_parallel_size 8
+ --port 1235
+ --num_replicas 1'
+
+ # Health check endpoints
+ endpoints:
+ chat: /v1/chat/completions/
+ completions: /v1/completions/
+ health: /v1/health
+
+# =============================================================================
+# Evaluation Configuration
+# =============================================================================
+evaluation:
+ nemo_evaluator_config:
+ config:
+ params:
+ max_retries: 5
+ parallelism: 4
+ request_timeout: 6000
+ extra:
+ tokenizer: ${deployment.checkpoint_path}/tokenizer
+ tokenizer_backend: huggingface
+ target:
+ api_endpoint:
+ adapter_config:
+ output_dir: /results
+ use_progress_tracking: false
+ use_caching: true
+ caching_dir: /results/cache
+ use_response_logging: true
+ max_logged_responses: 10
+ use_request_logging: true
+ max_logged_requests: 10
+
+ # Tasks to run (can be filtered with -t flag)
+ tasks:
+ - name: adlr_mmlu
+ nemo_evaluator_config:
+ config:
+ params:
+ top_p: 0.0
+ - name: adlr_arc_challenge_llama_25_shot
+ - name: adlr_winogrande_5_shot
+ - name: hellaswag
+ - name: openbookqa
+
+# =============================================================================
+# Export Configuration - W&B
+# =============================================================================
+export:
+ wandb:
+ entity: ${run.wandb.entity}
+ project: ${run.wandb.project}
diff --git a/src/nemotron/recipes/nano3/README.md b/src/nemotron/recipes/nano3/README.md
index dae4c0a6..871a7611 100644
--- a/src/nemotron/recipes/nano3/README.md
+++ b/src/nemotron/recipes/nano3/README.md
@@ -71,6 +71,7 @@ flowchart TB
| [Stage 0: Pretrain](./stage0_pretrain/) | Train on large text corpus | Megatron-Bridge | Base model checkpoint |
| [Stage 1: SFT](./stage1_sft/) | Instruction tuning | Megatron-Bridge | Instruction-following model |
| [Stage 2: RL](./stage2_rl/) | Alignment with GRPO | NeMo-RL | Final aligned model |
+| [Stage 3: Eval](./stage3_eval/) | Model evaluation | NeMo-Evaluator | Benchmark results |
## Prerequisites
@@ -121,6 +122,9 @@ uv run nemotron nano3 sft --run YOUR-CLUSTER
# Stage 2: Data prep + RL
uv run nemotron nano3 data prep rl --run YOUR-CLUSTER
uv run nemotron nano3 rl --run YOUR-CLUSTER
+
+# Stage 3: Evaluation
+uv run nemotron nano3 eval --run YOUR-CLUSTER
```
### Testing with Tiny Config
uv run nemotron nano3 sft [--run PROFILE] [-c CONFIG] [overrides...]
uv run nemotron nano3 rl [--run PROFILE] [-c CONFIG] [overrides...]
```
+### Evaluation
+
+```bash
+# Evaluate the trained model (defaults to RL output: run.model=rl:latest)
+uv run nemotron nano3 eval [--run PROFILE] [-c CONFIG] [-t TASK ...] [overrides...]
+
+# Evaluate a specific model artifact
+uv run nemotron nano3 eval --run YOUR-CLUSTER run.model=sft:v2
+
+# Filter specific tasks
+uv run nemotron nano3 eval --run YOUR-CLUSTER -t adlr_mmlu -t hellaswag
+
+# Dry run (preview resolved config)
+uv run nemotron nano3 eval --run YOUR-CLUSTER --dry-run
+```
+
+> **Note**: Evaluation requires the `nemo-evaluator-launcher` package. Install with: `pip install "nemotron[evaluator]"`
+
### Execution Options
| Option | Description |
@@ -261,6 +283,7 @@ torchrun --nproc_per_node=8 train.py --config config/tiny.yaml
- [Stage 0: Pretraining](./stage0_pretrain/README.md) - Pretrain on large text corpus
- [Stage 1: SFT](./stage1_sft/README.md) - Supervised fine-tuning for instruction following
- [Stage 2: RL](./stage2_rl/README.md) - Reinforcement learning for alignment
+- [Stage 3: Eval](./stage3_eval/README.md) - Model evaluation with NeMo-Evaluator
## Further Reading
diff --git a/src/nemotron/recipes/nano3/stage3_eval/README.md b/src/nemotron/recipes/nano3/stage3_eval/README.md
new file mode 100644
index 00000000..cbe54bc2
--- /dev/null
+++ b/src/nemotron/recipes/nano3/stage3_eval/README.md
@@ -0,0 +1,178 @@
+# Stage 3: Evaluation
+
+Evaluate trained models using NeMo-Evaluator, with support for multiple benchmark tasks and automatic export of results.
+
+## Overview
+
+The evaluation stage integrates with `nemo-evaluator-launcher` to:
+- Deploy your trained model using vLLM
+- Run standardized benchmark tasks (MMLU, HellaSwag, etc.)
+- Export results to W&B for tracking
+
+## Prerequisites
+
+Install the evaluator dependency:
+
+```bash
+pip install "nemotron[evaluator]"
+```
+
+## Quick Start
+
+```bash
+# Evaluate the RL model (default)
+uv run nemotron nano3 eval --run YOUR-CLUSTER
+
+# Evaluate a specific model
+uv run nemotron nano3 eval --run YOUR-CLUSTER run.model=sft:v2
+
+# Run specific tasks only
+uv run nemotron nano3 eval --run YOUR-CLUSTER -t adlr_mmlu -t hellaswag
+
+# Preview config without running
+uv run nemotron nano3 eval --run YOUR-CLUSTER --dry-run
+```
+
+## Configuration
+
+### Default Config
+
+The default configuration (`config/default.yaml`) includes:
+
+- **Model**: `run.model=rl:latest` (last RL checkpoint)
+- **Deployment**: vLLM with TP=4 for Nano3 (30B MoE)
+- **Tasks**: MMLU, HellaSwag, ARC-Challenge
+
+### Config Structure
+
+```yaml
+# Nemotron artifact resolution
+run:
+ model: rl:latest # Model artifact to evaluate
+ env: {...} # Populated from env.toml
+ wandb: {...} # Populated from env.toml [wandb]
+
+# Evaluator launcher config
+execution:
+ type: slurm # Execution backend (local/slurm)
+ hostname: ... # From ${run.env.host}
+
+deployment:
+ type: vllm
+ checkpoint_path: ${art:model,path} # Resolved from artifact
+ tensor_parallel_size: 4
+
+evaluation:
+ tasks:
+ - name: adlr_mmlu
+ - name: hellaswag
+
+export:
+ wandb:
+ entity: ${run.wandb.entity}
+ project: ${run.wandb.project}
+```
+
+### Task Filtering
+
+Use `-t/--task` to run specific tasks:
+
+```bash
+# Single task
+uv run nemotron nano3 eval --run CLUSTER -t adlr_mmlu
+
+# Multiple tasks
+uv run nemotron nano3 eval --run CLUSTER -t adlr_mmlu -t hellaswag -t arc_challenge
+```
+
+## env.toml Integration
+
+Evaluation uses the same `env.toml` profile as training:
+
+```toml
+[YOUR-CLUSTER]
+executor = "slurm"
+host = "login.cluster.com"
+user = "myuser"
+account = "my-account"
+partition = "batch"
+remote_job_dir = "/lustre/jobs"
+time = "04:00:00"
+
+[wandb]
+entity = "my-org"
+project = "nano3-evals"
+```
+
+The env.toml fields map to evaluator config:
+- `host` → `execution.hostname`
+- `user` → `execution.username`
+- `account` → `execution.account`
+- `partition` → `execution.partition`
+- `remote_job_dir` → `execution.output_dir` (base directory; `/evaluations` is appended)
+- `time` → `execution.walltime`
+- `[wandb]` → `export.wandb.*`
+
+## Artifacts
+
+### Input Artifact
+
+By default, evaluates the RL stage output. Override with:
+
+```bash
+# Evaluate SFT checkpoint
+uv run nemotron nano3 eval --run CLUSTER run.model=sft:latest
+
+# Evaluate specific version
+uv run nemotron nano3 eval --run CLUSTER run.model=sft:v2
+```
+
+### Output
+
+Results are exported to W&B as specified in `export.wandb`. Check status:
+
+```bash
+nemo-evaluator-launcher status
+nemo-evaluator-launcher logs
+```
+
+## Local Execution
+
+For local testing without Slurm:
+
+```bash
+# Set execution type to local
+uv run nemotron nano3 eval execution.type=local
+```
+
+## Generic Evaluate Command
+
+For custom evaluation configs not tied to nano3:
+
+```bash
+# Requires explicit config path
+uv run nemotron evaluate -c /path/to/eval.yaml --run YOUR-CLUSTER
+```
+
+## Troubleshooting
+
+### Missing Evaluator Package
+
+```
+Error: nemo-evaluator-launcher is required for evaluation
+Install with: pip install "nemotron[evaluator]"
+```
+
+### Task Not Found
+
+```
+Error: Requested task(s) not found in config: ['missing_task']
+Available tasks: ['adlr_mmlu', 'hellaswag', 'arc_challenge']
+```
+
+Check available tasks in your config or use `nemo-evaluator-launcher tasks` to list all available tasks.
+
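A minimal sketch of the task-filtering check behind this error (a hypothetical helper for illustration; the real CLI's implementation may differ):

```python
def filter_tasks(configured: list, requested: list) -> list:
    """Keep only the requested tasks; fail if any name is not configured."""
    available = [t["name"] for t in configured]
    missing = [name for name in requested if name not in available]
    if missing:
        raise ValueError(
            f"Requested task(s) not found in config: {missing}\n"
            f"Available tasks: {available}"
        )
    return [t for t in configured if t["name"] in requested]


tasks = [{"name": "adlr_mmlu"}, {"name": "hellaswag"}, {"name": "arc_challenge"}]
print(filter_tasks(tasks, ["adlr_mmlu"]))  # [{'name': 'adlr_mmlu'}]
```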
+## Further Reading
+
+- [NeMo-Evaluator Documentation](https://github.com/NVIDIA-NeMo/Evaluator)
+- [env.toml Configuration](../../../../docs/train/nemo-run.md)
diff --git a/src/nemotron/recipes/nano3/stage3_eval/config/default.yaml b/src/nemotron/recipes/nano3/stage3_eval/config/default.yaml
new file mode 100644
index 00000000..99a47937
--- /dev/null
+++ b/src/nemotron/recipes/nano3/stage3_eval/config/default.yaml
@@ -0,0 +1,120 @@
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Stage 3: Evaluation Configuration for Nemotron Nano3
+#
+# This config integrates with nemo-evaluator-launcher. The 'run' section is
+# used for Nemotron's artifact resolution and env.toml profile injection,
+# then stripped before passing to the evaluator launcher.
+#
+# Usage:
+# nemotron nano3 eval --run MY-CLUSTER
+# nemotron nano3 eval --run MY-CLUSTER run.model=sft:v2
+# nemotron nano3 eval --run MY-CLUSTER -t adlr_mmlu -t hellaswag
+# nemotron nano3 eval --dry-run
+
+# =============================================================================
+# Defaults - Use local executor and vLLM deployment
+# For slurm, override with: execution.type=slurm
+# =============================================================================
+defaults:
+ - execution: local
+ - deployment: vllm
+ - _self_
+
+# =============================================================================
+# Nemotron run section (artifact resolution + env.toml injection)
+# This section is used for interpolation and stripped before calling the launcher
+# =============================================================================
+run:
+ # Model artifact to evaluate (default: RL stage output)
+ model: rl:latest
+
+ # Environment config - populated from env.toml profile via --run
+ # These defaults allow local execution without env.toml
+ env:
+ executor: local
+ container: nvcr.io/nvidia/nemo-evaluator:latest
+ host: ${oc.env:HOSTNAME,localhost}
+ user: ${oc.env:USER}
+ account: null
+ partition: null
+ remote_job_dir: ${oc.env:PWD}/.nemotron
+ time: "04:00:00"
+
+ # W&B config - populated from env.toml [wandb] section
+ wandb:
+ entity: null
+ project: null
+
+# =============================================================================
+# NeMo-Evaluator Launcher Configuration
+# Everything below is passed directly to nemo-evaluator-launcher
+# =============================================================================
+
+# Execution configuration
+# Maps env.toml profile fields to evaluator's execution section
+execution:
+ type: ${run.env.executor}
+ hostname: ${run.env.host}
+ username: ${run.env.user}
+ account: ${run.env.account}
+ partition: ${run.env.partition}
+ output_dir: ${run.env.remote_job_dir}/evaluations
+ walltime: ${run.env.time}
+ # Auto-export results to W&B after evaluation completes
+ auto_export:
+ enabled: true
+ destinations:
+ - wandb
+
+# Deployment configuration
+# Specifies how to serve the model for evaluation
+deployment:
+ type: vllm
+ image: ${run.env.container}
+ checkpoint_path: ${art:model,path}
+ # Parallelism settings for Nano3 (30B MoE model)
+ tensor_parallel_size: 4
+ data_parallel_size: 1
+ extra_args: "--max-model-len 32768"
+
+# Evaluation configuration
+# Defines tasks and evaluation parameters
+evaluation:
+ # Environment variables for evaluation tasks
+ env_vars:
+ HF_TOKEN: HF_TOKEN
+
+ # Global config settings that apply to all tasks
+ nemo_evaluator_config:
+ config:
+ params:
+ request_timeout: 3600
+ parallelism: 8
+ # Uncomment for quick testing:
+ # limit_samples: 10
+
+ # Tasks to run (can be filtered with -t flag)
+ tasks:
+ - name: adlr_mmlu
+ - name: hellaswag
+ - name: arc_challenge
+
+# Export configuration
+# W&B export for results logging - populated from env.toml [wandb] section
+export:
+ wandb:
+ entity: ${run.wandb.entity}
+ project: ${run.wandb.project}
diff --git a/uv.lock b/uv.lock
index edb4618d..88704e28 100644
--- a/uv.lock
+++ b/uv.lock
@@ -252,6 +252,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/a1/ee/48ca1a7c89ffec8b6a0c5d02b89c305671d5ffd8d3c94acf8b8c408575bb/anyio-4.9.0-py3-none-any.whl", hash = "sha256:9f76d541cad6e36af7beb62e978876f3b41e3e04f2c1fbf0884604c0a9c4d93c", size = 100916, upload-time = "2025-03-17T00:02:52.713Z" },
]
+[[package]]
+name = "argcomplete"
+version = "3.6.3"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/38/61/0b9ae6399dd4a58d8c1b1dc5a27d6f2808023d0b5dd3104bb99f45a33ff6/argcomplete-3.6.3.tar.gz", hash = "sha256:62e8ed4fd6a45864acc8235409461b72c9a28ee785a2011cc5eb78318786c89c", size = 73754, upload-time = "2025-10-20T03:33:34.741Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/74/f5/9373290775639cb67a2fce7f629a1c240dce9f12fe927bc32b2736e16dfc/argcomplete-3.6.3-py3-none-any.whl", hash = "sha256:f5007b3a600ccac5d25bbce33089211dfd49eab4a7718da3f10e3082525a92ce", size = 43846, upload-time = "2025-10-20T03:33:33.021Z" },
+]
+
[[package]]
name = "astroid"
version = "3.3.11"
@@ -374,6 +383,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/1a/39/47f9197bdd44df24d67ac8893641e16f386c984a0619ef2ee4c51fbbc019/beautifulsoup4-4.14.3-py3-none-any.whl", hash = "sha256:0918bfe44902e6ad8d57732ba310582e98da931428d231a5ecb9e7c703a735bb", size = 107721, upload-time = "2025-11-30T15:08:24.087Z" },
]
+[[package]]
+name = "blinker"
+version = "1.9.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/21/28/9b3f50ce0e048515135495f198351908d99540d69bfdc8c1d15b73dc55ce/blinker-1.9.0.tar.gz", hash = "sha256:b4ce2265a7abece45e7cc896e98dbebe6cead56bcf805a3d23136d145f5445bf", size = 22460, upload-time = "2024-11-08T17:25:47.436Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/10/cb/f2ad4230dc2eb1a74edf38f1a38b9b52277f75bef262d8908e60d957e13c/blinker-1.9.0-py3-none-any.whl", hash = "sha256:ba0efaa9080b619ff2f3459d1d500c57bddea4a6b424b60a91141db6fd2f08bc", size = 8458, upload-time = "2024-11-08T17:25:46.184Z" },
+]
+
[[package]]
name = "botocore"
version = "1.41.5"
@@ -921,6 +939,23 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/76/91/7216b27286936c16f5b4d0c530087e4a54eead683e6b0b73dd0c64844af6/filelock-3.20.0-py3-none-any.whl", hash = "sha256:339b4732ffda5cd79b13f4e2711a31b0365ce445d95d243bb996273d072546a2", size = 16054, upload-time = "2025-10-08T18:03:48.35Z" },
]
+[[package]]
+name = "flask"
+version = "3.1.2"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "blinker" },
+ { name = "click" },
+ { name = "itsdangerous" },
+ { name = "jinja2" },
+ { name = "markupsafe" },
+ { name = "werkzeug" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/dc/6d/cfe3c0fcc5e477df242b98bfe186a4c34357b4847e87ecaef04507332dab/flask-3.1.2.tar.gz", hash = "sha256:bf656c15c80190ed628ad08cdfd3aaa35beb087855e2f494910aa3774cc4fd87", size = 720160, upload-time = "2025-08-19T21:03:21.205Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/ec/f9/7f9263c5695f4bd0023734af91bedb2ff8209e8de6ead162f35d8dc762fd/flask-3.1.2-py3-none-any.whl", hash = "sha256:ca1d8112ec8a6158cc29ea4858963350011b5c846a414cdb7a954aa9e967d03c", size = 103308, upload-time = "2025-08-19T21:03:19.499Z" },
+]
+
[[package]]
name = "frozenlist"
version = "1.8.0"
@@ -1352,6 +1387,20 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/cb/bd/1a875e0d592d447cbc02805fd3fe0f497714d6a2583f59d14fa9ebad96eb/huggingface_hub-0.36.0-py3-none-any.whl", hash = "sha256:7bcc9ad17d5b3f07b57c78e79d527102d08313caa278a641993acddcb894548d", size = 566094, upload-time = "2025-10-23T12:11:59.557Z" },
]
+[[package]]
+name = "hydra-core"
+version = "1.3.2"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "antlr4-python3-runtime" },
+ { name = "omegaconf" },
+ { name = "packaging" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/6d/8e/07e42bc434a847154083b315779b0a81d567154504624e181caf2c71cd98/hydra-core-1.3.2.tar.gz", hash = "sha256:8a878ed67216997c3e9d88a8e72e7b4767e81af37afb4ea3334b269a4390a824", size = 3263494, upload-time = "2023-02-23T18:33:43.03Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/c6/50/e0edd38dcd63fb26a8547f13d28f7a008bc4a3fd4eb4ff030673f22ad41a/hydra_core-1.3.2-py3-none-any.whl", hash = "sha256:fa0238a9e31df3373b35b0bfb672c34cc92718d21f81311d8996a16de1141d8b", size = 154547, upload-time = "2023-02-23T18:33:40.801Z" },
+]
+
[[package]]
name = "hyperframe"
version = "6.1.0"
@@ -1422,6 +1471,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/32/4b/b99e37f88336009971405cbb7630610322ed6fbfa31e1d7ab3fbf3049a2d/invoke-2.2.1-py3-none-any.whl", hash = "sha256:2413bc441b376e5cd3f55bb5d364f973ad8bdd7bf87e53c79de3c11bf3feecc8", size = 160287, upload-time = "2025-10-11T00:36:33.703Z" },
]
+[[package]]
+name = "itsdangerous"
+version = "2.2.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/9c/cb/8ac0172223afbccb63986cc25049b154ecfb5e85932587206f42317be31d/itsdangerous-2.2.0.tar.gz", hash = "sha256:e0050c0b7da1eea53ffaf149c0cfbb5c6e2e2b69c4bef22c81fa6eb73e5f6173", size = 54410, upload-time = "2024-04-16T21:28:15.614Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/04/96/92447566d16df59b2a776c0fb82dbc4d9e07cd95062562af01e408583fc4/itsdangerous-2.2.0-py3-none-any.whl", hash = "sha256:c6242fc49e35958c8b15141343aa660db5fc54d4f13a1db01a3f5891b98700ef", size = 16234, upload-time = "2024-04-16T21:28:14.499Z" },
+]
+
[[package]]
name = "jinja2"
version = "3.1.6"
@@ -2093,6 +2151,49 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/5f/df/76d0321c3797b54b60fef9ec3bd6f4cfd124b9e422182156a1dd418722cf/myst_parser-4.0.1-py3-none-any.whl", hash = "sha256:9134e88959ec3b5780aedf8a99680ea242869d012e8821db3126d427edc9c95d", size = 84579, upload-time = "2025-02-12T10:53:02.078Z" },
]
+[[package]]
+name = "nemo-evaluator"
+version = "0.1.69"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "flask" },
+ { name = "jinja2" },
+ { name = "psutil" },
+ { name = "pydantic" },
+ { name = "pydantic-core" },
+ { name = "pyyaml" },
+ { name = "requests" },
+ { name = "structlog" },
+ { name = "typing-extensions" },
+ { name = "werkzeug" },
+ { name = "yq" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/2e/aa/2d0dc08fbe404987159c292497d055639d236408ba82472c93aa34cc9be8/nemo_evaluator-0.1.69.tar.gz", hash = "sha256:8b386fb4a0882661d4863bdb96487e40cb7c89c90a04b09f0db9bf6115e29681", size = 107078, upload-time = "2026-01-22T01:35:48.067Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/5a/42/c9fe6aded43f6174e1bd72805f2621d580e0b039f754cb76ffa7800cc715/nemo_evaluator-0.1.69-py3-none-any.whl", hash = "sha256:329ac3aadf22028515aab03b0cd902c95b7d547d90e6a58620295e98c5ab7a16", size = 141078, upload-time = "2026-01-22T01:35:46.971Z" },
+]
+
+[[package]]
+name = "nemo-evaluator-launcher"
+version = "0.1.71"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "hydra-core" },
+ { name = "jinja2" },
+ { name = "leptonai" },
+ { name = "nemo-evaluator" },
+ { name = "pyyaml" },
+ { name = "requests" },
+ { name = "simple-parsing" },
+ { name = "structlog" },
+ { name = "tabulate" },
+ { name = "tomli", marker = "python_full_version < '3.11'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/1e/30/a2c5ccf527a2c0d70531ffab1c1875006cdd07823d4f8fbdc460e72f92aa/nemo_evaluator_launcher-0.1.71.tar.gz", hash = "sha256:5ccbca7b315278dc1aa87c5a5012213439884e3308c2ccd26ed10ad2a0edd007", size = 176684, upload-time = "2026-01-22T01:35:56.947Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/c6/7f/27f7d7937cd599b3b4b3bad0d0e1e41bad573aae5dc1011fee85b3e9e6bc/nemo_evaluator_launcher-0.1.71-py3-none-any.whl", hash = "sha256:80a78d896f2972cfaba999eea844961d44452d5225fb45d8360bffbccc25ab3f", size = 221172, upload-time = "2026-01-22T01:35:55.674Z" },
+]
+
[[package]]
name = "nemo-run"
version = "0.7.0"
@@ -2150,6 +2251,7 @@ dependencies = [
[package.optional-dependencies]
all = [
{ name = "gcsfs" },
+ { name = "nemo-evaluator-launcher" },
{ name = "s3fs" },
{ name = "sentencepiece" },
{ name = "wandb" },
@@ -2160,6 +2262,9 @@ dev = [
{ name = "pytest-cov" },
{ name = "ruff" },
]
+evaluator = [
+ { name = "nemo-evaluator-launcher" },
+]
gcs = [
{ name = "gcsfs" },
]
@@ -2203,6 +2308,8 @@ requires-dist = [
{ name = "huggingface-hub", specifier = ">=0.20.0" },
{ name = "jinja2", specifier = ">=3.0.0" },
{ name = "mypy", marker = "extra == 'dev'", specifier = ">=1.0.0" },
+ { name = "nemo-evaluator-launcher", marker = "extra == 'all'", specifier = ">=0.1.0" },
+ { name = "nemo-evaluator-launcher", marker = "extra == 'evaluator'", specifier = ">=0.1.0" },
{ name = "nemo-run", specifier = ">=0.4.0" },
{ name = "numpy", specifier = ">=1.24.0" },
{ name = "omegaconf", specifier = ">=2.3.0" },
@@ -2228,7 +2335,7 @@ requires-dist = [
{ name = "wandb", marker = "extra == 'wandb'", specifier = ">=0.15.0" },
{ name = "xxhash", specifier = ">=3.4.0" },
]
-provides-extras = ["wandb", "s3", "gcs", "sentencepiece", "dev", "all"]
+provides-extras = ["wandb", "s3", "gcs", "sentencepiece", "evaluator", "dev", "all"]
[package.metadata.requires-dev]
dev = [{ name = "pytest", specifier = ">=9.0.2" }]
@@ -2863,6 +2970,34 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/0e/15/4f02896cc3df04fc465010a4c6a0cd89810f54617a32a70ef531ed75d61c/protobuf-6.33.2-py3-none-any.whl", hash = "sha256:7636aad9bb01768870266de5dc009de2d1b936771b38a793f73cbbf279c91c5c", size = 170501, upload-time = "2025-12-06T00:17:52.211Z" },
]
+[[package]]
+name = "psutil"
+version = "7.2.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/73/cb/09e5184fb5fc0358d110fc3ca7f6b1d033800734d34cac10f4136cfac10e/psutil-7.2.1.tar.gz", hash = "sha256:f7583aec590485b43ca601dd9cea0dcd65bd7bb21d30ef4ddbf4ea6b5ed1bdd3", size = 490253, upload-time = "2025-12-29T08:26:00.169Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/77/8e/f0c242053a368c2aa89584ecd1b054a18683f13d6e5a318fc9ec36582c94/psutil-7.2.1-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:ba9f33bb525b14c3ea563b2fd521a84d2fa214ec59e3e6a2858f78d0844dd60d", size = 129624, upload-time = "2025-12-29T08:26:04.255Z" },
+ { url = "https://files.pythonhosted.org/packages/26/97/a58a4968f8990617decee234258a2b4fc7cd9e35668387646c1963e69f26/psutil-7.2.1-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:81442dac7abfc2f4f4385ea9e12ddf5a796721c0f6133260687fec5c3780fa49", size = 130132, upload-time = "2025-12-29T08:26:06.228Z" },
+ { url = "https://files.pythonhosted.org/packages/db/6d/ed44901e830739af5f72a85fa7ec5ff1edea7f81bfbf4875e409007149bd/psutil-7.2.1-cp313-cp313t-manylinux2010_x86_64.manylinux_2_12_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ea46c0d060491051d39f0d2cff4f98d5c72b288289f57a21556cc7d504db37fc", size = 180612, upload-time = "2025-12-29T08:26:08.276Z" },
+ { url = "https://files.pythonhosted.org/packages/c7/65/b628f8459bca4efbfae50d4bf3feaab803de9a160b9d5f3bd9295a33f0c2/psutil-7.2.1-cp313-cp313t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:35630d5af80d5d0d49cfc4d64c1c13838baf6717a13effb35869a5919b854cdf", size = 183201, upload-time = "2025-12-29T08:26:10.622Z" },
+ { url = "https://files.pythonhosted.org/packages/fb/23/851cadc9764edcc18f0effe7d0bf69f727d4cf2442deb4a9f78d4e4f30f2/psutil-7.2.1-cp313-cp313t-win_amd64.whl", hash = "sha256:923f8653416604e356073e6e0bccbe7c09990acef442def2f5640dd0faa9689f", size = 139081, upload-time = "2025-12-29T08:26:12.483Z" },
+ { url = "https://files.pythonhosted.org/packages/59/82/d63e8494ec5758029f31c6cb06d7d161175d8281e91d011a4a441c8a43b5/psutil-7.2.1-cp313-cp313t-win_arm64.whl", hash = "sha256:cfbe6b40ca48019a51827f20d830887b3107a74a79b01ceb8cc8de4ccb17b672", size = 134767, upload-time = "2025-12-29T08:26:14.528Z" },
+ { url = "https://files.pythonhosted.org/packages/05/c2/5fb764bd61e40e1fe756a44bd4c21827228394c17414ade348e28f83cd79/psutil-7.2.1-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:494c513ccc53225ae23eec7fe6e1482f1b8a44674241b54561f755a898650679", size = 129716, upload-time = "2025-12-29T08:26:16.017Z" },
+ { url = "https://files.pythonhosted.org/packages/c9/d2/935039c20e06f615d9ca6ca0ab756cf8408a19d298ffaa08666bc18dc805/psutil-7.2.1-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:3fce5f92c22b00cdefd1645aa58ab4877a01679e901555067b1bd77039aa589f", size = 130133, upload-time = "2025-12-29T08:26:18.009Z" },
+ { url = "https://files.pythonhosted.org/packages/77/69/19f1eb0e01d24c2b3eacbc2f78d3b5add8a89bf0bb69465bc8d563cc33de/psutil-7.2.1-cp314-cp314t-manylinux2010_x86_64.manylinux_2_12_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:93f3f7b0bb07711b49626e7940d6fe52aa9940ad86e8f7e74842e73189712129", size = 181518, upload-time = "2025-12-29T08:26:20.241Z" },
+ { url = "https://files.pythonhosted.org/packages/e1/6d/7e18b1b4fa13ad370787626c95887b027656ad4829c156bb6569d02f3262/psutil-7.2.1-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d34d2ca888208eea2b5c68186841336a7f5e0b990edec929be909353a202768a", size = 184348, upload-time = "2025-12-29T08:26:22.215Z" },
+ { url = "https://files.pythonhosted.org/packages/98/60/1672114392dd879586d60dd97896325df47d9a130ac7401318005aab28ec/psutil-7.2.1-cp314-cp314t-win_amd64.whl", hash = "sha256:2ceae842a78d1603753561132d5ad1b2f8a7979cb0c283f5b52fb4e6e14b1a79", size = 140400, upload-time = "2025-12-29T08:26:23.993Z" },
+ { url = "https://files.pythonhosted.org/packages/fb/7b/d0e9d4513c46e46897b46bcfc410d51fc65735837ea57a25170f298326e6/psutil-7.2.1-cp314-cp314t-win_arm64.whl", hash = "sha256:08a2f175e48a898c8eb8eace45ce01777f4785bc744c90aa2cc7f2fa5462a266", size = 135430, upload-time = "2025-12-29T08:26:25.999Z" },
+ { url = "https://files.pythonhosted.org/packages/c5/cf/5180eb8c8bdf6a503c6919f1da28328bd1e6b3b1b5b9d5b01ae64f019616/psutil-7.2.1-cp36-abi3-macosx_10_9_x86_64.whl", hash = "sha256:b2e953fcfaedcfbc952b44744f22d16575d3aa78eb4f51ae74165b4e96e55f42", size = 128137, upload-time = "2025-12-29T08:26:27.759Z" },
+ { url = "https://files.pythonhosted.org/packages/c5/2c/78e4a789306a92ade5000da4f5de3255202c534acdadc3aac7b5458fadef/psutil-7.2.1-cp36-abi3-macosx_11_0_arm64.whl", hash = "sha256:05cc68dbb8c174828624062e73078e7e35406f4ca2d0866c272c2410d8ef06d1", size = 128947, upload-time = "2025-12-29T08:26:29.548Z" },
+ { url = "https://files.pythonhosted.org/packages/29/f8/40e01c350ad9a2b3cb4e6adbcc8a83b17ee50dd5792102b6142385937db5/psutil-7.2.1-cp36-abi3-manylinux2010_x86_64.manylinux_2_12_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:5e38404ca2bb30ed7267a46c02f06ff842e92da3bb8c5bfdadbd35a5722314d8", size = 154694, upload-time = "2025-12-29T08:26:32.147Z" },
+ { url = "https://files.pythonhosted.org/packages/06/e4/b751cdf839c011a9714a783f120e6a86b7494eb70044d7d81a25a5cd295f/psutil-7.2.1-cp36-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ab2b98c9fc19f13f59628d94df5cc4cc4844bc572467d113a8b517d634e362c6", size = 156136, upload-time = "2025-12-29T08:26:34.079Z" },
+ { url = "https://files.pythonhosted.org/packages/44/ad/bbf6595a8134ee1e94a4487af3f132cef7fce43aef4a93b49912a48c3af7/psutil-7.2.1-cp36-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:f78baafb38436d5a128f837fab2d92c276dfb48af01a240b861ae02b2413ada8", size = 148108, upload-time = "2025-12-29T08:26:36.225Z" },
+ { url = "https://files.pythonhosted.org/packages/1c/15/dd6fd869753ce82ff64dcbc18356093471a5a5adf4f77ed1f805d473d859/psutil-7.2.1-cp36-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:99a4cd17a5fdd1f3d014396502daa70b5ec21bf4ffe38393e152f8e449757d67", size = 147402, upload-time = "2025-12-29T08:26:39.21Z" },
+ { url = "https://files.pythonhosted.org/packages/34/68/d9317542e3f2b180c4306e3f45d3c922d7e86d8ce39f941bb9e2e9d8599e/psutil-7.2.1-cp37-abi3-win_amd64.whl", hash = "sha256:b1b0671619343aa71c20ff9767eced0483e4fc9e1f489d50923738caf6a03c17", size = 136938, upload-time = "2025-12-29T08:26:41.036Z" },
+ { url = "https://files.pythonhosted.org/packages/3e/73/2ce007f4198c80fcf2cb24c169884f833fe93fbc03d55d302627b094ee91/psutil-7.2.1-cp37-abi3-win_arm64.whl", hash = "sha256:0d67c1822c355aa6f7314d92018fb4268a76668a536f133599b91edd48759442", size = 133836, upload-time = "2025-12-29T08:26:43.086Z" },
+]
+
[[package]]
name = "pyarrow"
version = "22.0.0"
@@ -3838,6 +3973,19 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/e0/f9/0595336914c5619e5f28a1fb793285925a8cd4b432c9da0a987836c7f822/shellingham-1.5.4-py2.py3-none-any.whl", hash = "sha256:7ecfff8f2fd72616f7481040475a65b2bf8af90a56c89140852d1120324e8686", size = 9755, upload-time = "2023-10-24T04:13:38.866Z" },
]

+[[package]]
+name = "simple-parsing"
+version = "0.1.8"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "docstring-parser" },
+ { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/be/67/e3e5b89f1c81ca574a157104b0ecebfc3096933cbf58f644c9cb0a56c94f/simple_parsing-0.1.8.tar.gz", hash = "sha256:19c2a9002ebd7ad281fce579f9b2a0aa0c4d67e1688cee0e8cdf6d8e98ec2c18", size = 255933, upload-time = "2026-01-20T23:29:05.258Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/70/46/eab9fe2a4a2f6665a7c79b2007121a00ba95502fef50c1537d8147b4f91c/simple_parsing-0.1.8-py3-none-any.whl", hash = "sha256:4d1ef136a28674b3ebb9760cacda4d6f01de32de0b280a869df977d182f12947", size = 113438, upload-time = "2026-01-20T23:29:04.17Z" },
+]
+
[[package]]
name = "six"
version = "1.17.0"
@@ -4122,6 +4270,18 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/d9/52/1064f510b141bd54025f9b55105e26d1fa970b9be67ad766380a3c9b74b0/starlette-0.50.0-py3-none-any.whl", hash = "sha256:9e5391843ec9b6e472eed1365a78c8098cfceb7a74bfd4d6b1c0c0095efb3bca", size = 74033, upload-time = "2025-11-01T15:25:25.461Z" },
]

+[[package]]
+name = "structlog"
+version = "25.5.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "typing-extensions", marker = "python_full_version < '3.11'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/ef/52/9ba0f43b686e7f3ddfeaa78ac3af750292662284b3661e91ad5494f21dbc/structlog-25.5.0.tar.gz", hash = "sha256:098522a3bebed9153d4570c6d0288abf80a031dfdb2048d59a49e9dc2190fc98", size = 1460830, upload-time = "2025-10-27T08:28:23.028Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/a8/45/a132b9074aa18e799b891b91ad72133c98d8042c70f6240e4c5f9dabee2f/structlog-25.5.0-py3-none-any.whl", hash = "sha256:a8453e9b9e636ec59bd9e79bbd4a72f025981b3ba0f5837aebf48f02f37a7f9f", size = 72510, upload-time = "2025-10-27T08:28:21.535Z" },
+]
+
[[package]]
name = "tabulate"
version = "0.9.0"
@@ -4584,6 +4744,18 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/fa/a8/5b41e0da817d64113292ab1f8247140aac61cbf6cfd085d6a0fa77f4984f/websockets-15.0.1-py3-none-any.whl", hash = "sha256:f7a866fbc1e97b5c617ee4116daaa09b722101d4a3c170c787450ba409f9736f", size = 169743, upload-time = "2025-03-05T20:03:39.41Z" },
]

+[[package]]
+name = "werkzeug"
+version = "3.1.5"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "markupsafe" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/5a/70/1469ef1d3542ae7c2c7b72bd5e3a4e6ee69d7978fa8a3af05a38eca5becf/werkzeug-3.1.5.tar.gz", hash = "sha256:6a548b0e88955dd07ccb25539d7d0cc97417ee9e179677d22c7041c8f078ce67", size = 864754, upload-time = "2026-01-08T17:49:23.247Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/ad/e4/8d97cca767bcc1be76d16fb76951608305561c6e056811587f36cb1316a8/werkzeug-3.1.5-py3-none-any.whl", hash = "sha256:5111e36e91086ece91f93268bb39b4a35c1e6f1feac762c9c822ded0a4e322dc", size = 225025, upload-time = "2026-01-08T17:49:21.859Z" },
+]
+
[[package]]
name = "win32-setctime"
version = "1.2.0"
@@ -4662,6 +4834,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/1f/f6/a933bd70f98e9cf3e08167fc5cd7aaaca49147e48411c0bd5ae701bb2194/wrapt-1.17.3-py3-none-any.whl", hash = "sha256:7171ae35d2c33d326ac19dd8facb1e82e5fd04ef8c6c0e394d7af55a55051c22", size = 23591, upload-time = "2025-08-12T05:53:20.674Z" },
]

+[[package]]
+name = "xmltodict"
+version = "1.0.2"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/6a/aa/917ceeed4dbb80d2f04dbd0c784b7ee7bba8ae5a54837ef0e5e062cd3cfb/xmltodict-1.0.2.tar.gz", hash = "sha256:54306780b7c2175a3967cad1db92f218207e5bc1aba697d887807c0fb68b7649", size = 25725, upload-time = "2025-09-17T21:59:26.459Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/c0/20/69a0e6058bc5ea74892d089d64dfc3a62ba78917ec5e2cfa70f7c92ba3a5/xmltodict-1.0.2-py3-none-any.whl", hash = "sha256:62d0fddb0dcbc9f642745d8bbf4d81fd17d6dfaec5a15b5c1876300aad92af0d", size = 13893, upload-time = "2025-09-17T21:59:24.859Z" },
+]
+
[[package]]
name = "xxhash"
version = "3.6.0"
@@ -4906,6 +5087,21 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/73/ae/b48f95715333080afb75a4504487cbe142cae1268afc482d06692d605ae6/yarl-1.22.0-py3-none-any.whl", hash = "sha256:1380560bdba02b6b6c90de54133c81c9f2a453dee9912fe58c1dcced1edb7cff", size = 46814, upload-time = "2025-10-06T14:12:53.872Z" },
]

+[[package]]
+name = "yq"
+version = "3.4.3"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "argcomplete" },
+ { name = "pyyaml" },
+ { name = "tomlkit" },
+ { name = "xmltodict" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/38/6a/eb9721ed0929d0f55d167c2222d288b529723afbef0a07ed7aa6cca72380/yq-3.4.3.tar.gz", hash = "sha256:ba586a1a6f30cf705b2f92206712df2281cd320280210e7b7b80adcb8f256e3b", size = 33214, upload-time = "2024-04-27T15:39:43.29Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/f2/ba/d1b21f3e57469030bd6536b91bb28fedd2511d4e68b5a575f2bdb3a3dbb6/yq-3.4.3-py3-none-any.whl", hash = "sha256:547e34bc3caacce83665fd3429bf7c85f8e8b6b9aaee3f953db1ad716ff3434d", size = 18812, upload-time = "2024-04-27T15:39:41.652Z" },
+]
+
[[package]]
name = "zipp"
version = "3.23.0"