v1.2.0

@inureyes released this 27 Oct 14:06

v1.2.0 - Exit Code Strategy & MPI Compatibility

This release introduces a breaking change to exit code handling that aligns bssh with industry-standard MPI tools like mpirun, srun, and mpiexec.

BREAKING CHANGES

Exit code behavior changed - The default exit code strategy now returns the main rank's actual exit code instead of a binary success/failure indicator.

Version             Behavior                                    Use Case
v1.0-v1.1           Returns 0 if all succeed, 1 if any fails    Health checks
v1.2.0+ (default)   Returns main rank's actual exit code        MPI workloads, CI/CD

Migration Guide:

  • MPI Workloads - ✅ No changes needed (improved behavior)
  • Health Checks - Add --require-all-success flag to preserve v1.0-v1.1 behavior

Why this change?

  • Aligns with HPC and distributed computing best practices
  • Preserves actual exit codes (139 = SIGSEGV, 137 = SIGKILL, typically the OOM killer, 124 = timeout) for better diagnostics
  • Enables sophisticated error handling in shell scripts and CI/CD pipelines
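As a rough illustration of the diagnostics that preserved exit codes make possible, the sketch below maps a status to a short label. The function name `describe_exit_code` is hypothetical and not part of bssh; it relies only on the common shell convention that a status above 128 means "terminated by signal (status - 128)", and on 124 being the status the `timeout` utility reports when a command times out.

```shell
# describe_exit_code CODE
# Hypothetical helper (not shipped with bssh): prints a short label for an
# exit status, using the codes called out in these release notes.
describe_exit_code() {
    code=$1
    case "$code" in
        0)   echo "success" ;;
        124) echo "timeout" ;;
        137) echo "killed by SIGKILL (possibly OOM)" ;;
        139) echo "segmentation fault (SIGSEGV)" ;;
        *)
            # Statuses above 128 conventionally encode 128 + signal number.
            if [ "$code" -gt 128 ]; then
                echo "terminated by signal $((code - 128))"
            else
                echo "failed with exit code $code"
            fi
            ;;
    esac
}
```

In a script or CI job, calling `describe_exit_code "$?"` immediately after a bssh run would turn the main rank's status into a readable message. Under the v1.0-v1.1 behavior the same call could only ever see 0 or 1.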

New Features

  • Exit Code Strategy Options

    • Default: Returns main rank's exit code (matches the convention of MPI launchers such as mpirun)
    • --require-all-success: Returns 0 only if all nodes succeed (v1.0-v1.1 behavior)
    • --check-all-nodes: Hybrid mode - returns the main rank's exit code when it is non-zero; returns 1 when the main rank succeeds but any other node fails
    • Automatic main rank detection via BACKENDAI_CLUSTER_ROLE environment variable
  • Example Scripts

    • examples/mpi_exit_code.sh: Demonstrates MPI exit code handling
    • examples/health_check.sh: Shows health check pattern with --require-all-success
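The three strategies above can be modeled in a few lines of shell. This is only a sketch of the documented behavior, not bssh's implementation: `resolve_exit_code` and the strategy label `main-rank` are hypothetical names, while `require-all-success` and `check-all-nodes` mirror the flag names. The function takes the main rank's exit code followed by the other nodes' codes.

```shell
# resolve_exit_code STRATEGY MAIN_CODE [OTHER_CODE...]
# Hypothetical model of the documented strategies; prints the resulting status.
resolve_exit_code() {
    strategy=$1
    main_code=$2
    shift 2

    # Record whether any non-main node reported a non-zero status.
    others_failed=0
    for c in "$@"; do
        if [ "$c" -ne 0 ]; then
            others_failed=1
        fi
    done

    case "$strategy" in
        main-rank)
            # Default in v1.2.0+: pass the main rank's code through unchanged.
            echo "$main_code"
            ;;
        require-all-success)
            # v1.0-v1.1 behavior: 0 only if every node succeeded, else 1.
            if [ "$main_code" -eq 0 ] && [ "$others_failed" -eq 0 ]; then
                echo 0
            else
                echo 1
            fi
            ;;
        check-all-nodes)
            # Hybrid: the main rank's failure wins; otherwise 1 if others failed.
            if [ "$main_code" -ne 0 ]; then
                echo "$main_code"
            elif [ "$others_failed" -ne 0 ]; then
                echo 1
            else
                echo 0
            fi
            ;;
        *)
            echo "unknown strategy: $strategy" >&2
            return 2
            ;;
    esac
}
```

For example, with a main rank that exits 0 and one other node that exits 137, this model yields 0 for `main-rank` and 1 for both `require-all-success` and `check-all-nodes`. With a main rank that exits 139, `main-rank` and `check-all-nodes` both yield 139, while `require-all-success` collapses it to 1.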

Improvements

  • Exit code behavior now matches industry-standard MPI tools (mpirun, srun, mpiexec)
  • Better error diagnostics with preserved exit codes
  • Enhanced CI/CD integration capabilities
  • Comprehensive documentation updates in README, CHANGELOG, and ARCHITECTURE

Bug Fixes

  • Fixed security-framework dependency version issue (downgraded from 3.5.1 to 2.12.1 for compatibility)
  • Fixed cargo clippy warnings in test code
  • Fixed environment variable handling in tests

CI/CD Improvements

  • Added serial test runner for environment-dependent tests using serial_test crate
  • Improved test isolation for concurrent test execution

Technical Details

  • Test Coverage: 86 comprehensive test cases covering all exit code strategies
  • Files Modified: 11 files with extensive documentation updates
  • Breaking Change Issue: #62
  • Main Rank Detection Priority:
    1. BACKENDAI_CLUSTER_ROLE=main environment variable
    2. BACKENDAI_CLUSTER_HOST matching an entry in the node list
    3. First node in the list (fallback)
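The priority order above can be sketched as a simple fall-through check. These notes do not spell out how each rule maps to a concrete node or how host matching is done, so the helper below (the hypothetical `main_rank_rule`) only reports which rule would fire, and it assumes that "matching" means exact string equality with a node-list entry.

```shell
# main_rank_rule NODE...
# Hypothetical sketch of the documented priority order. Prints which rule
# would select the main rank; bssh's real matching logic may differ.
main_rank_rule() {
    # 1. An explicit role marker takes precedence.
    if [ "${BACKENDAI_CLUSTER_ROLE:-}" = "main" ]; then
        echo "role"
        return 0
    fi

    # 2. Otherwise look for the cluster host in the node list
    #    (assumption: exact string comparison).
    if [ -n "${BACKENDAI_CLUSTER_HOST:-}" ]; then
        for node in "$@"; do
            if [ "$node" = "$BACKENDAI_CLUSTER_HOST" ]; then
                echo "host-match:$node"
                return 0
            fi
        done
    fi

    # 3. Fallback: the first node in the list.
    echo "first-node:${1:-}"
}
```

Outside a Backend.AI session, where neither variable is set, every call falls through to the first-node rule, which is why the order of the node list matters there.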

Dependencies

  • Pinned security-framework to 2.12.1 (downgraded from 3.5.1 for compatibility)
  • Added serial_test 3.2 for thread-safe environment variable testing

Known Issues

None

Full Changelog

v1.1.0...v1.2.0

What's Changed

  • feat: Return main rank exit code by default (v1.2.0 BREAKING CHANGE) by @inureyes in #63
