v1.2.0 - Exit Code Strategy & MPI Compatibility
This release introduces a breaking change to exit code handling that aligns bssh with industry-standard MPI tools like mpirun, srun, and mpiexec.
BREAKING CHANGES
Exit code behavior changed - The default exit code strategy now returns the main rank's actual exit code instead of a binary success/failure indicator.
| Version | Behavior | Use Case |
|---|---|---|
| v1.0-v1.1 | Returns 0 if all succeed, 1 if any fails | Health checks |
| v1.2.0+ (default) | Returns main rank's actual exit code | MPI workloads, CI/CD |
Migration Guide:
- MPI Workloads - ✅ No changes needed (improved behavior)
- Health Checks - Add the `--require-all-success` flag to preserve v1.0-v1.1 behavior
Why this change?
- Aligns with HPC and distributed computing best practices
- Preserves actual exit codes (139 = SIGSEGV, 137 = SIGKILL/OOM kill, 124 = timeout) for better diagnostics
- Enables sophisticated error handling in shell scripts and CI/CD pipelines
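The exit codes cited above follow common Unix conventions: shells report a process killed by signal N as 128 + N, and GNU `timeout` exits with 124 when the time limit is hit. A minimal sketch of how a wrapper could decode a preserved exit code is shown below. The names `ExitKind`, `classify_exit_code`, and `describe` are hypothetical and are not part of bssh.

```rust
/// Rough classification of an exit code using common shell conventions.
/// Hypothetical helper for illustration; not part of bssh.
#[derive(Debug, PartialEq)]
enum ExitKind {
    Success,
    /// GNU `timeout` exits with 124 when the command times out.
    Timeout,
    /// Shells report "killed by signal N" as 128 + N.
    Signal(i32),
    /// Any other nonzero code is application-defined.
    Failure(i32),
}

fn classify_exit_code(code: i32) -> ExitKind {
    match code {
        0 => ExitKind::Success,
        124 => ExitKind::Timeout,
        // Assumption: signal numbers 1..=64, as on Linux.
        c if c > 128 && c <= 128 + 64 => ExitKind::Signal(c - 128),
        c => ExitKind::Failure(c),
    }
}

fn describe(code: i32) -> String {
    match classify_exit_code(code) {
        ExitKind::Success => "success".to_string(),
        ExitKind::Timeout => "timed out (exit code 124)".to_string(),
        ExitKind::Signal(11) => "killed by SIGSEGV (segmentation fault)".to_string(),
        ExitKind::Signal(9) => "killed by SIGKILL (often the OOM killer)".to_string(),
        ExitKind::Signal(n) => format!("killed by signal {n}"),
        ExitKind::Failure(c) => format!("failed with exit code {c}"),
    }
}
```

With the v1.0-v1.1 behavior, every one of these failures would have collapsed into exit code 1, so this kind of decoding was not possible.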
New Features
- Exit Code Strategy Options
  - Default: Returns main rank's exit code (matches MPI standard)
  - `--require-all-success`: Returns 0 only if all nodes succeed (v1.0-v1.1 behavior)
  - `--check-all-nodes`: Hybrid mode - returns main rank code, or 1 if main OK but others failed
  - Automatic main rank detection via `BACKENDAI_CLUSTER_ROLE` environment variable
- Example Scripts
  - `examples/mpi_exit_code.sh`: Demonstrates MPI exit code handling
  - `examples/health_check.sh`: Shows health check pattern with `--require-all-success`
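The three strategies above can be modeled as a small pure function over the per-node exit codes. This is an illustrative sketch of the documented behavior, not bssh's actual implementation. `ExitStrategy` and `resolve_exit_code` are hypothetical names, and returning 1 when the main rank index is out of range is an assumption made only to keep the function total.

```rust
/// Hypothetical model of the three exit code strategies; not bssh's real types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExitStrategy {
    /// Default in v1.2.0+: return the main rank's exit code.
    MainRank,
    /// `--require-all-success`: 0 only if every node succeeded, else 1.
    RequireAllSuccess,
    /// `--check-all-nodes`: main rank's code, or 1 if main succeeded but another node failed.
    CheckAllNodes,
}

/// `codes[i]` is the exit code of node `i`; `main_rank` indexes the main node.
fn resolve_exit_code(strategy: ExitStrategy, codes: &[i32], main_rank: usize) -> i32 {
    // Assumption: treat a missing main rank as a generic failure (1).
    let main_code = codes.get(main_rank).copied().unwrap_or(1);
    let any_failed = codes.iter().any(|&c| c != 0);

    match strategy {
        ExitStrategy::MainRank => main_code,
        ExitStrategy::RequireAllSuccess => {
            if any_failed { 1 } else { 0 }
        }
        ExitStrategy::CheckAllNodes => {
            if main_code != 0 {
                main_code
            } else if any_failed {
                1
            } else {
                0
            }
        }
    }
}
```

For example, with exit codes `[0, 139, 0]` and main rank 0, the default strategy yields 0 while both `--require-all-success` and `--check-all-nodes` yield 1. With `[139, 0]`, the default and hybrid strategies preserve 139, and `--require-all-success` reports 1.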
Improvements
- Exit code behavior now matches industry-standard MPI tools (mpirun, srun, mpiexec)
- Better error diagnostics with preserved exit codes
- Enhanced CI/CD integration capabilities
- Comprehensive documentation updates in README, CHANGELOG, and ARCHITECTURE
Bug Fixes
- Fixed security-framework dependency version issue (downgraded from 3.5.1 to 2.12.1 for compatibility)
- Fixed cargo clippy warnings in test code
- Fixed environment variable handling in tests
CI/CD Improvements
- Added serial test runner for environment-dependent tests using `serial_test` crate
- Improved test isolation for concurrent test execution
Technical Details
- Test Coverage: 86 comprehensive test cases covering all exit code strategies
- Files Modified: 11 files with extensive documentation updates
- Breaking Change Issue: #62
- Main Rank Detection Priority:
  1. `BACKENDAI_CLUSTER_ROLE=main` environment variable
  2. `BACKENDAI_CLUSTER_HOST` matching with node list
  3. First node in the list (fallback)
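The priority order above could be sketched as a pure function that takes the two environment values as parameters. The release notes do not spell out how each rule maps to a node index, so this sketch makes its own assumptions. It assumes `BACKENDAI_CLUSTER_HOST` names the local node, that rule 1 applies when the role is `main` and that host appears in the node list, and that rule 2 applies when only the host matches. `DetectedBy`, `detect_main_rank`, and the host names in the usage note are hypothetical.

```rust
/// Which rule selected the main rank. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
enum DetectedBy {
    RoleEnv,
    HostMatch,
    FirstNode,
}

/// Returns the index of the main rank in `nodes` and the rule that chose it,
/// or `None` if the node list is empty.
///
/// Assumptions (not confirmed by the release notes):
/// - `role` is the value of BACKENDAI_CLUSTER_ROLE, if set.
/// - `cluster_host` is the value of BACKENDAI_CLUSTER_HOST, if set.
fn detect_main_rank(
    role: Option<&str>,
    cluster_host: Option<&str>,
    nodes: &[&str],
) -> Option<(usize, DetectedBy)> {
    // Position of the local host in the node list, if it appears there.
    let host_index = cluster_host.and_then(|h| nodes.iter().position(|n| *n == h));

    match (role, host_index) {
        // 1. Role is "main" and the local host is in the list.
        (Some("main"), Some(i)) => Some((i, DetectedBy::RoleEnv)),
        // 2. Host matches a node even though the role is unset or different.
        (_, Some(i)) => Some((i, DetectedBy::HostMatch)),
        // 3. Fallback: first node in the list.
        _ if !nodes.is_empty() => Some((0, DetectedBy::FirstNode)),
        _ => None,
    }
}
```

Under these assumptions, `detect_main_rank(Some("main"), Some("main1"), &["sub1", "main1"])` selects index 1 via the role rule, and a call with neither variable set falls back to index 0. In real use the two arguments would come from `std::env::var(...).ok()`, which is also why tests that set these variables need to run serially.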
Dependencies
- Downgraded `security-framework` to 2.12.1
- Added `serial_test` 3.2 for thread-safe environment variable testing
Known Issues
None
Full Changelog
Full Changelog: v1.1.0...v1.2.0