@gwarmstrong
Collaborator

@gwarmstrong gwarmstrong commented Nov 19, 2025

Overview

  • Introduced prepare_cluster_config_for_test to slurm tests, a shared helper responsible for:
    • Loading a cluster config, resolving workspace mounts, and forcing job_dir into {workspace}/nemo-run-experiments.
      • This makes it easier to inspect the generated sbatch files for a specific test
    • Creating snapshots (cluster_config.yaml + nemo_skills_commit.json) at the workspace root so every test run records its launcher configuration and NeMo-Skills git metadata.
  • Centralized shared CLI parameters across tests with add_common_args, removing duplicated argparse boilerplate for --workspace, --cluster, --expname_prefix, --wandb_project, and --cluster_config_mode.
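
For illustration, a minimal sketch of what a shared argument helper along these lines could look like. The flag names come from the description above and the include_wandb parameter from the review below; the required/default settings and help strings are assumptions, not the PR's actual code.

```python
import argparse


def add_common_args(parser: argparse.ArgumentParser, include_wandb: bool = True) -> None:
    """Register CLI flags shared by the Slurm test entrypoints (illustrative only)."""
    # Which flags are required is an assumption made for this sketch.
    parser.add_argument("--workspace", required=True, help="Workspace directory for this test")
    parser.add_argument("--cluster", required=True, help="Cluster config name or path")
    parser.add_argument("--expname_prefix", required=True, help="Prefix for experiment names")
    if include_wandb:
        parser.add_argument("--wandb_project", default=None, help="W&B project name")
    parser.add_argument(
        "--cluster_config_mode",
        choices=["assert", "overwrite", "reuse"],
        default="assert",
        help="How to treat an existing cluster_config.yaml snapshot",
    )


parser = argparse.ArgumentParser()
add_common_args(parser)
args = parser.parse_args(
    ["--workspace", "/tmp/ws", "--cluster", "slurm", "--expname_prefix", "demo"]
)
print(args.cluster_config_mode)
```

Each run_test.py would then call the helper once instead of repeating the argparse definitions.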

Snapshot Layout per Test Workspace

<workspace>/                             # e.g., /lustre/.../gpt_oss_python_aime25
├─ cluster_config.yaml                   # Full config snapshot used for this test run
├─ nemo_skills_commit.json               # Git metadata of the launcher checkout
├─ nemo-run-experiments/                 # Standard nemo-run job_dir root
│  └─ ...                                # Server/eval/check job folders
├─ check-results-logs/                   # Per-test log dir (existing behavior)
├─ other test-specific outputs …         # Eval outputs, cached assets, etc.

Summary by CodeRabbit

  • Tests
    • Refactored test infrastructure for improved consistency and maintainability across evaluation test suites.
    • Introduced centralized cluster configuration handling for test environments.
    • Added shared test utilities to standardize argument parsing and configuration management.
    • Enhanced test orchestration with improved setup sequencing and cluster validation.


@coderabbitai
Contributor

coderabbitai bot commented Dec 4, 2025

Walkthrough

The PR refactors Slurm test infrastructure to centralize cluster configuration handling. New utility functions (prepare_cluster_config_for_test, add_common_args) are added to tests/slurm-tests/utils.py to validate and prepare cluster configurations. Multiple test scripts are updated to use these utilities and pass prepared cluster objects instead of raw cluster strings. A --cluster_config_mode flag is added to run_all.sh for broader control.

Changes

Cohort / File(s) | Summary
Cluster Configuration Utilities (tests/slurm-tests/utils.py) | Added prepare_cluster_config_for_test() to validate/snapshot cluster configs and resolve job directories; added add_common_args() for centralized CLI argument registration; introduced internal helpers for remote YAML/JSON I/O, SSH tunneling, snapshot management, and soft-assert infrastructure.
Test Script Refactoring (tests/slurm-tests/gpt_oss_python_aime25/run_test.py, tests/slurm-tests/omr_simple_recipe/run_test.py, tests/slurm-tests/qwen3_4b_evals/run_test.py, tests/slurm-tests/qwen3coder_30b_swebench/run_test.py, tests/slurm-tests/super_49b_evals/run_test.py) | Replaced explicit argument definitions with add_common_args(); added path setup for utility imports; compute cluster config object via prepare_cluster_config_for_test(); updated all evaluation and command invocations to pass prepared cluster object instead of raw cluster string.
Test Orchestration (tests/slurm-tests/run_all.sh) | Added parsing of optional --cluster_config_mode flag (default: 'assert'); propagated flag to all test invocations.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • tests/slurm-tests/utils.py — New utility functions introduce substantial logic: SSH tunneling, remote YAML/JSON handling, snapshot management, path resolution, and soft-assert infrastructure requiring careful validation of I/O operations and error handling.
  • Test scripts refactoring — Although the pattern is repetitive across multiple files, each applies the refactoring consistently; verify that all cluster object computations and usage sites are correctly aligned.
  • run_all.sh — Flag propagation is straightforward but verify all test invocations correctly receive and handle the new --cluster_config_mode parameter.

Possibly related PRs

Suggested labels

run GPU tests

Suggested reviewers

  • Kipok

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'Improve Slurm Testing Reproducibility' accurately summarizes the main change: adding cluster config snapshots and centralizing parameters to enhance test reproducibility.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
tests/slurm-tests/run_all.sh (1)

3-4: Remove redundant initial assignments.

Lines 3-4 assign CLUSTER and RUN_NAME from positional parameters, but these are immediately reassigned at lines 25-26 after the argument parsing loop. The initial assignments are never used.

-CLUSTER=$1
-RUN_NAME=${2:-$(date +%Y-%m-%d)}
 # Parse --cluster_config_mode flag with default 'assert'
 CLUSTER_CONFIG_MODE="assert"
 POSITIONAL_ARGS=()
tests/slurm-tests/utils.py (3)

184-192: Clarify redundant job_dir assignment.

Line 191 is only partly needed, and its intent is unclear. In the if branch (lines 184-186), only ssh_tunnel.job_dir is set, so cluster_config["job_dir"] might not exist. In the else branch (lines 187-189), cluster_config["job_dir"] is explicitly set. Line 191 then uses get(..., test_job_dir), which is a no-op after the else branch and only takes effect after the if branch, when no top-level job_dir is present.

Consider making that intent explicit:

     if "ssh_tunnel" in cluster_config:
         cluster_config["ssh_tunnel"]["job_dir"] = test_job_dir
         job_dir = cluster_config["ssh_tunnel"]["job_dir"]
     else:
         cluster_config["job_dir"] = test_job_dir
         job_dir = cluster_config["job_dir"]
 
-    cluster_config["job_dir"] = cluster_config.get("job_dir", test_job_dir)
+    # Ensure top-level job_dir is always set for consistency
+    if "job_dir" not in cluster_config:
+        cluster_config["job_dir"] = test_job_dir
     _resolve_container_image_paths(cluster_config)
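
To make the behavior of the quoted lines concrete, here is a self-contained reduction of them. The function name set_job_dir and the sample paths are hypothetical; only the branch logic is taken from the snippet above.

```python
def set_job_dir(cluster_config: dict, test_job_dir: str) -> dict:
    """Reduced copy of the quoted branch logic, for experimentation only."""
    if "ssh_tunnel" in cluster_config:
        cluster_config["ssh_tunnel"]["job_dir"] = test_job_dir
    else:
        cluster_config["job_dir"] = test_job_dir
    # Same effect as:
    #     if "job_dir" not in cluster_config:
    #         cluster_config["job_dir"] = test_job_dir
    cluster_config["job_dir"] = cluster_config.get("job_dir", test_job_dir)
    return cluster_config


target = "/ws/nemo-run-experiments"
with_tunnel = set_job_dir({"ssh_tunnel": {"host": "example"}}, target)
without_tunnel = set_job_dir({}, target)
preexisting = set_job_dir({"ssh_tunnel": {}, "job_dir": "/old"}, target)

print(with_tunnel["job_dir"])    # top-level key added by the get() line
print(preexisting["job_dir"])    # an existing top-level value is left as-is
```

In this reduction, the final assignment matters only when ssh_tunnel is present, and a top-level job_dir that already exists alongside ssh_tunnel is not redirected to the test workspace.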

228-238: Consider narrowing the exception type.

The broad except Exception clause (line 234) is intentional for graceful fallback, but it could mask unexpected errors. Consider catching more specific exceptions or, at minimum, logging the exception for debugging.

     try:
         tunnel = get_tunnel(cluster_config)
         result = tunnel.run(f"readlink -f {shlex.quote(path)}", hide=True, warn=True)
         resolved_remote = result.stdout.strip() if result.exited == 0 else ""
         return resolved_remote or local_resolved
-    except Exception:
+    except (OSError, RuntimeError) as exc:
+        # Log for debugging but continue with local fallback
+        print(f"Remote readlink failed, using local path: {exc!r}")
         return local_resolved
     finally:
         if tunnel is not None:
             tunnel.cleanup()
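
A standalone sketch of the same pattern, with narrowed exceptions, a logged fallback, and the success path in an else block (which would also address the TRY300 hint listed further down). The remote_resolver callable is a hypothetical stand-in for the tunnel call, and the choice of exception types is an assumption about what that call can raise.

```python
import logging
import os

LOG = logging.getLogger(__name__)


def resolve_path(path: str, remote_resolver=None) -> str:
    """Resolve path remotely when possible, otherwise fall back to local resolution."""
    local_resolved = os.path.realpath(path)
    if remote_resolver is None:
        return local_resolved
    try:
        resolved_remote = remote_resolver(path)
    except (OSError, RuntimeError) as exc:
        # Record why the remote lookup failed instead of silently swallowing it.
        LOG.warning("Remote resolution of %s failed, using local path: %s", path, exc)
        return local_resolved
    else:
        # An empty remote answer also falls back to the local result.
        return resolved_remote or local_resolved


def broken_resolver(path: str) -> str:
    """Simulates a tunnel that cannot be reached."""
    raise RuntimeError("tunnel unavailable")


print(resolve_path("/tmp", broken_resolver))
```

Any exception outside the listed types would propagate, which is the trade-off of narrowing the catch.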

365-390: Consider using non-deprecated datetime API.

datetime.utcnow() is deprecated since Python 3.12 in favor of timezone-aware alternatives.

+from datetime import datetime, timezone
+
 def _collect_repo_metadata() -> dict:
     """Gather information about the current NeMo-Skills checkout."""
     repo_root = _get_repo_root()
     metadata = {
         "repo_root": str(repo_root),
-        "timestamp_utc": datetime.utcnow().isoformat(timespec="seconds") + "Z",
+        "timestamp_utc": datetime.now(timezone.utc).isoformat(timespec="seconds").replace("+00:00", "Z"),
     }
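
As a quick check, the timezone-aware expression from the suggestion can be run in isolation. The helper name utc_timestamp is hypothetical; the expression itself is the one shown in the diff.

```python
from datetime import datetime, timezone


def utc_timestamp() -> str:
    """Return the current UTC time as an ISO 8601 string with a trailing 'Z'."""
    # isoformat() on an aware UTC datetime ends in "+00:00"; swap that for "Z".
    return datetime.now(timezone.utc).isoformat(timespec="seconds").replace("+00:00", "Z")


stamp = utc_timestamp()
print(stamp)
```

The resulting string has the same shape as the utcnow()-based version, so existing snapshot consumers should not need changes.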
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5cc1dcc and be77de0.

📒 Files selected for processing (7)
  • tests/slurm-tests/gpt_oss_python_aime25/run_test.py (3 hunks)
  • tests/slurm-tests/omr_simple_recipe/run_test.py (3 hunks)
  • tests/slurm-tests/qwen3_4b_evals/run_test.py (3 hunks)
  • tests/slurm-tests/qwen3coder_30b_swebench/run_test.py (4 hunks)
  • tests/slurm-tests/run_all.sh (1 hunks)
  • tests/slurm-tests/super_49b_evals/run_test.py (3 hunks)
  • tests/slurm-tests/utils.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
tests/slurm-tests/gpt_oss_python_aime25/run_test.py (2)
tests/slurm-tests/utils.py (2)
  • add_common_args (287-319)
  • prepare_cluster_config_for_test (118-201)
nemo_skills/pipeline/cli.py (1)
  • wrap_arguments (43-52)
tests/slurm-tests/qwen3_4b_evals/run_test.py (2)
tests/slurm-tests/utils.py (2)
  • add_common_args (287-319)
  • prepare_cluster_config_for_test (118-201)
nemo_skills/pipeline/cli.py (1)
  • wrap_arguments (43-52)
tests/slurm-tests/qwen3coder_30b_swebench/run_test.py (1)
tests/slurm-tests/utils.py (2)
  • add_common_args (287-319)
  • prepare_cluster_config_for_test (118-201)
tests/slurm-tests/omr_simple_recipe/run_test.py (2)
tests/slurm-tests/utils.py (2)
  • add_common_args (287-319)
  • prepare_cluster_config_for_test (118-201)
nemo_skills/pipeline/cli.py (1)
  • wrap_arguments (43-52)
tests/slurm-tests/super_49b_evals/run_test.py (2)
tests/slurm-tests/utils.py (2)
  • add_common_args (287-319)
  • prepare_cluster_config_for_test (118-201)
nemo_skills/pipeline/cli.py (1)
  • wrap_arguments (43-52)
tests/slurm-tests/utils.py (3)
nemo_skills/pipeline/utils/cluster.py (5)
  • cluster_download_file (480-482)
  • cluster_path_exists (485-488)
  • cluster_upload (565-579)
  • get_cluster_config (314-368)
  • get_tunnel (451-456)
nemo_skills/pipeline/utils/mounts.py (1)
  • get_mounts_from_config (399-455)
nemo_skills/pipeline/utils/exp.py (1)
  • stdout (124-125)
🪛 Ruff (0.14.7)
tests/slurm-tests/utils.py

61-61: Starting a process with a partial executable path

(S607)


153-156: Avoid specifying long messages outside the exception class

(TRY003)


233-233: Consider moving this statement to an else block

(TRY300)


234-234: Do not catch blind exception: Exception

(BLE001)


258-260: Avoid specifying long messages outside the exception class

(TRY003)


263-263: Avoid specifying long messages outside the exception class

(TRY003)


275-278: Avoid specifying long messages outside the exception class

(TRY003)


348-348: Avoid specifying long messages outside the exception class

(TRY003)


356-359: Avoid specifying long messages outside the exception class

(TRY003)


374-374: subprocess call: check for execution of untrusted input

(S603)


375-375: Starting a process with a partial executable path

(S607)

🔇 Additional comments (13)
tests/slurm-tests/run_all.sh (1)

9-39: LGTM!

The argument parsing loop correctly handles the optional --cluster_config_mode flag while preserving positional parameters, and the flag is consistently propagated to all test invocations.

tests/slurm-tests/omr_simple_recipe/run_test.py (2)

16-21: LGTM on the import structure.

The sys.path manipulation to import from the parent directory is consistent with other test scripts in this PR.


39-59: Cluster config inconsistency: subprocess bypasses job_dir modifications

The cluster object prepared via prepare_cluster_config_for_test (which sets job_dir to {workspace}/nemo-run-experiments) is only applied to the run_cmd call on lines 36-43, while the simplified_recipe subprocess invocation on line 30 passes args.cluster (the raw cluster name/path). As a result, all experiments launched by simplified_recipe will use the default cluster config with the original job_dir, so their artifacts are stored outside the test workspace. Only the check_results job will use the modified config.

Either pass the prepared cluster config to simplified_recipe (e.g., via a saved config file or new command-line argument) or clarify if this inconsistency is intentional.

tests/slurm-tests/super_49b_evals/run_test.py (2)

16-21: LGTM on centralized imports.

The import pattern is consistent with other test files in this PR.


319-361: LGTM on cluster object propagation.

The prepared cluster object is correctly passed through to setup(), eval_reasoning_on(), eval_reasoning_off(), and the final run_cmd() call, ensuring consistent use of the modified cluster configuration throughout the test.

tests/slurm-tests/qwen3_4b_evals/run_test.py (1)

142-188: LGTM!

The cluster configuration is correctly prepared and consistently passed to all evaluation functions (eval_qwen3_bfcl, eval_qwen3_online_genselect, eval_qwen3_offline_genselect) and the final run_cmd call.

tests/slurm-tests/qwen3coder_30b_swebench/run_test.py (1)

51-96: LGTM!

The cluster object is correctly prepared and passed to both eval_qwen3coder and run_cmd within the loop. The local workspace variable shadowing args.workspace at line 72 is intentional to create agent-specific workspace paths.

tests/slurm-tests/gpt_oss_python_aime25/run_test.py (1)

57-88: LGTM!

The refactoring correctly uses add_common_args, prepares the cluster configuration, and consistently passes the cluster object to eval_gpt_oss_python and run_cmd.

tests/slurm-tests/utils.py (5)

15-49: LGTM on imports and constants.

The imports are appropriate for the functionality, and the constants clearly define the supported modes and provide helpful error messaging for uncommitted changes.


56-72: LGTM on repository root detection.

The lru_cache is appropriate since the repo root won't change during execution. Using git without a full path is acceptable as it's expected to be in PATH on systems running these tests.
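
A minimal sketch of cached repository-root detection in this style. The exact git command used in the PR is not shown in this thread, so git rev-parse --show-toplevel and the fallback to the current directory are assumptions.

```python
import subprocess
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=1)
def get_repo_root() -> Path:
    """Return the git checkout root, computed once per process."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            check=True,
            capture_output=True,
            text=True,
        ).stdout.strip()
        return Path(out)
    except (OSError, subprocess.CalledProcessError):
        # Assumed fallback for this sketch: git missing or not inside a checkout.
        return Path.cwd()


first = get_repo_root()
second = get_repo_root()  # served from the cache, no second subprocess call
print(first, get_repo_root.cache_info().hits)
```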


241-284: LGTM on snapshot synchronization logic.

The three-mode handling (reuse, assert, overwrite) is implemented correctly:

  • reuse downloads and validates existing config
  • assert with existing config verifies equality
  • assert without existing (or overwrite) uploads the new config

The implicit creation of a new snapshot when none exists in assert mode is sensible behavior.
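
The three modes described above can be sketched against a local file. This is an illustration under stated assumptions: sync_snapshot is a hypothetical name, JSON on the local filesystem stands in for the remote YAML snapshot, and raising when reuse finds no snapshot is a guess at the intended behavior.

```python
import json
import tempfile
from pathlib import Path


def sync_snapshot(snapshot_path: Path, new_config: dict, mode: str) -> dict:
    """Apply reuse/assert/overwrite semantics to a config snapshot file."""
    if mode not in {"assert", "overwrite", "reuse"}:
        raise ValueError(f"Unsupported mode: {mode}")
    exists = snapshot_path.exists()
    if mode == "reuse":
        if not exists:
            raise FileNotFoundError(f"No snapshot to reuse at {snapshot_path}")
        return json.loads(snapshot_path.read_text(encoding="utf-8"))
    if mode == "assert" and exists:
        existing = json.loads(snapshot_path.read_text(encoding="utf-8"))
        if existing != new_config:
            raise AssertionError("Cluster config differs from the stored snapshot")
        return new_config
    # mode == "overwrite", or "assert" with no snapshot yet: write a fresh one.
    snapshot_path.write_text(json.dumps(new_config, indent=2), encoding="utf-8")
    return new_config


with tempfile.TemporaryDirectory() as tmp_dir:
    snapshot = Path(tmp_dir) / "cluster_config.json"
    original = {"job_dir": "/ws/nemo-run-experiments"}
    created = sync_snapshot(snapshot, original, "assert")
    reused = sync_snapshot(snapshot, {"job_dir": "/elsewhere"}, "reuse")
    try:
        sync_snapshot(snapshot, {"job_dir": "/elsewhere"}, "assert")
        mismatch_detected = False
    except AssertionError:
        mismatch_detected = True
    overwritten = sync_snapshot(snapshot, {"job_dir": "/elsewhere"}, "overwrite")
```

In the demo, the first assert call creates the snapshot, reuse returns the stored config rather than the new one, and a second assert with a different config fails.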


287-319: LGTM on common argument registration.

The helper cleanly consolidates shared CLI arguments across test entrypoints. The optional include_wandb parameter provides flexibility for tests that don't use W&B.


393-432: LGTM on remote file operations.

The temp file handling with try/finally cleanup is correct. Using yaml.safe_load/safe_dump is the right choice for security.

Collaborator

@activatedgeek activatedgeek left a comment

Thanks! Added some comments for consideration, and some simplifications of potentially redundant args.



def _upload_json(cluster_config: dict, data: dict, remote_path: str):
    with tempfile.NamedTemporaryFile(mode="wt", encoding="utf-8", delete=False) as tmp:
Collaborator

We don't really need the explicit encoding argument since it is always default in Python 3?
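
One point that may bear on this question: on Python versions before 3.15 with UTF-8 mode off, the default text encoding follows the platform locale, so it is not guaranteed to be UTF-8. A small sketch to check this on a given machine; write_json_utf8 is a hypothetical helper, not code from the PR.

```python
import json
import locale
import os
import tempfile


def write_json_utf8(data: dict) -> str:
    """Write data as UTF-8 JSON to a temp file and return the file path."""
    with tempfile.NamedTemporaryFile(
        mode="wt", encoding="utf-8", suffix=".json", delete=False
    ) as tmp:
        json.dump(data, tmp, ensure_ascii=False)
    return tmp.name


# Encoding used for text files when none is passed explicitly;
# this depends on the platform and locale settings.
print("default text encoding:", locale.getpreferredencoding(False))

path = write_json_utf8({"note": "naïve"})
try:
    with open(path, "rb") as f:
        raw = f.read()
finally:
    os.unlink(path)
print(raw)
```

With the explicit argument, the bytes on disk are UTF-8 regardless of the locale the launcher runs under.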



def _download_remote_yaml(cluster_config: dict, remote_path: str) -> dict:
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
Collaborator

I assume that the reason for operating on the tempfile outside the context is to allow separate open calls. If that is the case, tempfile.mkstemp would be the more appropriate and less laborious option.
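
A sketch of the mkstemp pattern suggested here. The helper name roundtrip_via_tempfile is hypothetical, and JSON is used in place of YAML so the example needs only the standard library.

```python
import json
import os
import tempfile


def roundtrip_via_tempfile(data: dict) -> dict:
    """Write data to a temp path, then reopen that path with a separate open() call."""
    fd, tmp_path = tempfile.mkstemp(suffix=".json")
    try:
        # mkstemp returns an already-open OS-level descriptor; wrap it so it gets closed.
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
        # Independent open on the same path, as a download or upload helper would do.
        with open(tmp_path, encoding="utf-8") as f:
            return json.load(f)
    finally:
        # mkstemp never deletes the file itself, so clean up explicitly.
        os.unlink(tmp_path)


print(roundtrip_via_tempfile({"job_dir": "/ws/nemo-run-experiments"}))
```

Compared with NamedTemporaryFile(delete=False), this makes it explicit that the caller owns both the descriptor and the cleanup.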

from pathlib import Path

# Add parent directory to path to import utils
sys.path.insert(0, str(Path(__file__).parents[1]))
Collaborator

For consideration: If there are common utilities, then perhaps we can move them under nemo_skills itself, and avoid the ugly sys.path.

@greptile-apps
Contributor

greptile-apps bot commented Jan 6, 2026

Greptile Summary

Refactored Slurm test infrastructure to improve reproducibility by introducing centralized cluster configuration management.

  • Created tests/slurm-tests/utils.py with shared utilities: prepare_cluster_config_for_test (handles cluster config loading, workspace mount resolution, and job_dir override) and add_common_args (centralizes CLI argument parsing)
  • Each test workspace now receives snapshots (cluster_config.yaml + nemo_skills_commit.json) at the workspace root to record launcher configuration and git metadata
  • All test scripts refactored to use shared utilities, eliminating duplicated argparse boilerplate
  • Added --cluster_config_mode flag (assert/overwrite/reuse) to control snapshot behavior
  • Forces job_dir into {workspace}/nemo-run-experiments for easier correlation between test runs and generated sbatch files

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The refactoring follows a consistent pattern across all test files, eliminates code duplication, and improves maintainability. The new utilities are well-documented with clear docstrings. The only minor issue is a potentially redundant line in utils.py (line 191) that doesn't affect functionality. All test files have been updated consistently, and the run_all.sh script properly propagates the new flag.
  • No files require special attention

Important Files Changed

Filename | Overview
tests/slurm-tests/utils.py | Adds shared test utilities for cluster config management and snapshots. Contains one potential redundancy in job_dir handling (line 191).
tests/slurm-tests/gpt_oss_python_aime25/run_test.py | Refactored to use shared test utilities. Clean integration with prepare_cluster_config_for_test and add_common_args.
tests/slurm-tests/qwen3_4b_evals/run_test.py | Refactored to use shared test utilities. Clean integration with proper cluster config handling.
tests/slurm-tests/omr_simple_recipe/run_test.py | Refactored to use shared utilities. Correctly uses args.cluster string for subprocess and cluster dict for nemo_skills APIs.
tests/slurm-tests/run_all.sh | Added --cluster_config_mode flag support with proper argument parsing and propagation to all test scripts.

Sequence Diagram

sequenceDiagram
    participant Test as Test Script (run_test.py)
    participant Utils as utils.prepare_cluster_config_for_test
    participant Cluster as Cluster/Remote
    participant NemoSkills as nemo_skills.pipeline

    Test->>Utils: prepare_cluster_config_for_test(cluster, workspace, mode)
    Utils->>NemoSkills: get_cluster_config(cluster)
    NemoSkills-->>Utils: cluster_config dict
    Utils->>Utils: Resolve workspace mount path to source
    Utils->>Utils: Override job_dir = {workspace}/nemo-run-experiments
    Utils->>Utils: Resolve container image symlinks
    Utils->>Utils: Collect git metadata (commit, status)
    
    alt mode == "reuse"
        Utils->>Cluster: Check for existing cluster_config.yaml
        Cluster-->>Utils: Download existing config
        Utils-->>Test: Return persisted config
    else mode == "assert"
        Utils->>Cluster: Check for existing cluster_config.yaml
        alt exists
            Cluster-->>Utils: Download existing config
            Utils->>Utils: Compare existing vs new config
            alt configs match
                Utils-->>Test: Return config
            else configs don't match
                Utils-->>Test: Raise AssertionError
            end
        else not exists
            Utils->>Cluster: Upload cluster_config.yaml + nemo_skills_commit.json
            Utils-->>Test: Return new config
        end
    else mode == "overwrite"
        Utils->>Cluster: Upload cluster_config.yaml + nemo_skills_commit.json
        Utils-->>Test: Return new config
    end
    
    Test->>NemoSkills: eval/run_cmd with modified cluster config
    NemoSkills->>Cluster: Submit jobs to {workspace}/nemo-run-experiments

Contributor

@greptile-apps greptile-apps bot left a comment

7 files reviewed, 1 comment


cluster_config["job_dir"] = test_job_dir
job_dir = cluster_config["job_dir"]

cluster_config["job_dir"] = cluster_config.get("job_dir", test_job_dir)
Contributor

style: Redundant line - job_dir was already set in lines 184-189. When ssh_tunnel exists, this adds job_dir at root level even though it was only set in ssh_tunnel. Consider removing this line.

Suggested change
-cluster_config["job_dir"] = cluster_config.get("job_dir", test_job_dir)
+# job_dir already set above, no need to set again
