Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR refactors the test_update.py to use pytest framework and adds CI workflow capabilities. The main change is replacing the direct test execution with a pytest-based test structure that uses subprocess to run distributed training via torchrun.
- Converted test_update.py from standalone script to pytest-based test structure
- Added pytest mark for GPU-specific tests with multi-GPU requirements
- Enhanced logging control in ParameterServer to support flexible rank-based logging
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/test_update.py | Refactored from standalone script to pytest framework with subprocess-based distributed test execution |
| checkpoint_engine/ps.py | Enhanced logging system to support configurable logger rank for different update scenarios |
| .pre-commit-config.yaml | Added ruff ignore rule for subprocess security warning (S603) |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| assert world_size >= 2, "This test requires at least 2 GPUs." | ||
|
|
||
| master_addr = "localhost" | ||
| master_port = random.randint(20000, 30000) |
There was a problem hiding this comment.
Using random port selection for distributed training can lead to port conflicts in CI environments. Consider using a more robust port selection mechanism or environment variable override.
| self._logger_rank = 0 | ||
|
|
There was a problem hiding this comment.
The _logger_rank initialization should be documented to explain when and how this value changes during execution.
| self._logger_rank = 0 | |
| # _logger_rank determines which rank is responsible for logging output. | |
| # By default, it is set to 0, meaning only rank 0 will perform logging. | |
| # If logging from other ranks is required, this value can be changed accordingly. | |
| self._logger_rank = 0 |
Refactor test_update.py to pytest style and create a workflow for main branch