misc: split ps.py file into multiple files#64
Closed
specture724 wants to merge 8 commits intoMoonshotAI:mainfrom
Closed
misc: split ps.py file into multiple files#64specture724 wants to merge 8 commits intoMoonshotAI:mainfrom
ps.py file into multiple files#64specture724 wants to merge 8 commits intoMoonshotAI:mainfrom
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR refactors the large ps.py file into multiple smaller, more focused modules to improve code organization and maintainability. The core functionality remains unchanged; the refactoring only reorganizes existing code into logical units based on their responsibilities.
Key changes:
- Extracted RDMA device handling and P2P store functionality into
p2p_store.py - Moved all Pydantic data models and type definitions into
data_types.py - Separated checkpoint loading and pin memory operations into
pin_memory.py - Isolated FastAPI endpoints and HTTP communication into
api.py - Created
__main__.pyas the CLI entry point - Updated
__init__.pyto expose all public interfaces
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
checkpoint_engine/data_types.py |
New file containing all Pydantic models and type definitions (ParameterMeta, BucketRange, H2DBucket, etc.) with validators and serializers |
checkpoint_engine/p2p_store.py |
New file with RDMA device detection/parsing functions and the P2PStore class for managing peer-to-peer memory transfers |
checkpoint_engine/pin_memory.py |
New file with checkpoint loading functions (_load_checkpoint_file, _register_checkpoint) and memory pinning operations |
checkpoint_engine/api.py |
New file containing FastAPI application initialization (_init_api) and HTTP endpoint definitions for checkpoint management |
checkpoint_engine/__main__.py |
New file with CLI entry point (run_from_cli) for starting the parameter server via uvicorn |
checkpoint_engine/__init__.py |
Updated to export all public interfaces from the new modules with explicit all list |
checkpoint_engine/ps.py |
Significantly reduced file that now imports from the new modules and focuses solely on ParameterServer class implementation |
tests/test_rdma_parser.py |
Updated imports and patch paths to reference the new checkpoint_engine.p2p_store module location |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
008a747 to
a9f3642
Compare
093f569 to
c4a1e7c
Compare
4f1728a to
b0b3f93
Compare
Collaborator
|
merged in #75 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Most of the functions and classes are in
ps.py, making it so large. We need to split it.p2p_store.pydata_types.pypin_memory.pyapi.pyrun_from_clito__main__.py__init__.py