[ENH]: add orchestrator to construct version graph for garbage collection #4463
base: feat-sysdb-batch-get-version-file-paths-method
Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving.
Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
if tracing::level_enabled!(tracing::Level::DEBUG) {
    // Render the graph as Graphviz DOT, then base64-encode it for logging.
    let dot_viz = Dot::with_config(&self.graph, &[]);
    let encoded = BASE64_STANDARD.encode(format!("{:?}", dot_viz));
    tracing::debug!(base64_encoded_dot_graph = ?encoded, "Constructed graph.");
}
The logged graph representation, once base64-decoded, can be pasted into any Graphviz-compatible viewer for debugging (e.g. https://graph.flyte.org).
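For anyone who wants to recover the DOT text from a log line, the following is a minimal sketch, not code from this PR. It assumes the field value is standard-alphabet base64 and that it arrives wrapped in double quotes, since the `?` sigil logs the string with Debug formatting. The helper names `decode_base64` and `decode_logged_dot` are hypothetical, and the decoder is hand-rolled only to keep the sketch dependency-free; the `base64` crate or a shell `base64 -d` would do the same job.

```rust
/// Decode standard-alphabet base64 (RFC 4648, `=` padding optional) into bytes.
/// Returns `None` if a character is outside the alphabet or the length is impossible.
fn decode_base64(input: &str) -> Option<Vec<u8>> {
    // Map one base64 character to its 6-bit value.
    fn sextet(c: u8) -> Option<u32> {
        match c {
            b'A'..=b'Z' => Some((c - b'A') as u32),
            b'a'..=b'z' => Some((c - b'a') as u32 + 26),
            b'0'..=b'9' => Some((c - b'0') as u32 + 52),
            b'+' => Some(62),
            b'/' => Some(63),
            _ => None,
        }
    }

    let trimmed = input.trim().trim_end_matches('=');
    // A single leftover character can never encode a whole byte.
    if trimmed.len() % 4 == 1 {
        return None;
    }

    let mut out = Vec::with_capacity(trimmed.len() * 3 / 4);
    let mut buffer: u32 = 0; // bits collected so far
    let mut bits: u32 = 0; // number of valid bits in `buffer`
    for &c in trimmed.as_bytes() {
        buffer = (buffer << 6) | sextet(c)?;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buffer >> bits) as u8);
            buffer &= (1 << bits) - 1; // keep only the bits not yet emitted
        }
    }
    Some(out)
}

/// Hypothetical helper: strip the quotes added by Debug formatting,
/// decode the payload, and return the DOT source as UTF-8 text.
fn decode_logged_dot(field: &str) -> Option<String> {
    let unquoted = field.trim().trim_matches('"');
    String::from_utf8(decode_base64(unquoted)?).ok()
}
```

Passing the value of `base64_encoded_dot_graph` to `decode_logged_dot` yields text that can be pasted into the viewer mentioned above.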
Add Orchestrator for Constructing Collection Version Graph for GC

This PR introduces a new orchestrator subsystem in the Rust garbage collector service that constructs a version graph representing versions and dependencies across all collections in a collection fork tree. This graph structure enables more robust and accurate garbage collection by tracking version histories and lineage dependencies, leveraging Petgraph for graph representation and the new fetch_version_file, fetch_lineage_file, and get_version_file_paths operators. The orchestrator coordinates asynchronous tasks to fetch version and lineage files from storage and sysdb, handling errors, missing data, and generating a directed acyclic graph suitable for GC analysis, with thorough integration and unit tests for various graph topologies.

Potential Impact:
• Functionality: Introduces a new capability to gather and represent complete collection/version dependency state for improved garbage collection logic; affects how future GC tasks understand which data is safe to delete.
• Performance: Performs multiple storage and sysdb I/O tasks; may increase job execution time for complex collection trees, but uses async patterns.
• Security: No new network or privilege escalation risks introduced, but ensures UUID validation and error propagation for malformed/missing lineage/version files.
• Scalability: Designed to build graphs for fork trees of arbitrary size; Petgraph usage should scale acceptably unless the collection graph is extremely large.

Testing Needed
• Run new and existing orchestrator and operator tests (

Code Quality Assessment
• rust/garbage_collector/src/operators/fetch_version_file.rs: Refactored for clarity and stronger typing; structured error handling and output.
• rust/garbage_collector/src/operators/fetch_lineage_file.rs: Clean, idiomatic, and easy to follow.
• rust/garbage_collector/src/operators/get_version_file_paths.rs: Straightforward, adheres to async/trait design.
• Cargo.* and TOML files: Updated dependencies accurately.
• rust/garbage_collector/src/construct_version_graph_orchestrator.rs: Well-structured, modular, with clearly documented error handling and tracing, but reviewer feedback suggests possible node reuse optimizations and comments needed for initial version semantics.

Best Practices
• Modularity, Error Handling, Testing, Async Patterns

Potential Issues
• Possible redundant node creations in Petgraph (see review), may impact very large version/lineage sets.

This summary was automatically generated by @propel-code-bot
Description of changes
Adds an orchestrator to construct the version graph for all collections in a fork tree to be used by garbage collection.
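To make the idea of a version graph over a fork tree concrete, here is a rough, dependency-free sketch. It is not the orchestrator from this PR, which uses Petgraph and fetches version and lineage files asynchronously. The sketch assumes each collection's versions form a chain, and that a fork adds an edge from the parent's version at the fork point to the child's earliest version. The `Node` alias, the `Fork` struct, and `build_version_graph` are hypothetical names chosen for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical node identity: (collection name, version number).
type Node = (String, u64);

/// Hypothetical fork record: `child` was forked from `parent` at `parent_version`.
struct Fork {
    parent: String,
    parent_version: u64,
    child: String,
}

/// Build adjacency lists mapping each node to the nodes that depend on it.
/// Assumption: edges point from an older version to the version derived from it.
fn build_version_graph(
    versions: &BTreeMap<String, Vec<u64>>,
    forks: &[Fork],
) -> BTreeMap<Node, BTreeSet<Node>> {
    let mut graph: BTreeMap<Node, BTreeSet<Node>> = BTreeMap::new();

    for (collection, vs) in versions {
        let mut sorted = vs.clone();
        sorted.sort_unstable();
        sorted.dedup();

        // Register every version as a node, even if it ends up with no outgoing edges.
        for v in &sorted {
            graph.entry((collection.clone(), *v)).or_default();
        }
        // Chain consecutive versions within one collection.
        for pair in sorted.windows(2) {
            graph
                .entry((collection.clone(), pair[0]))
                .or_default()
                .insert((collection.clone(), pair[1]));
        }
    }

    // Link each fork point to the earliest known version of the forked collection.
    for fork in forks {
        let first_child_version = versions
            .get(&fork.child)
            .and_then(|vs| vs.iter().min().copied());
        if let Some(first) = first_child_version {
            graph
                .entry((fork.parent.clone(), fork.parent_version))
                .or_default()
                .insert((fork.child.clone(), first));
        }
    }

    graph
}
```

In this toy model, a version with outgoing edges still has dependents in the tree, which is the kind of information a garbage collector needs before deciding what is safe to delete.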
Test plan
How are these changes tested?
pytest for python, yarn test for js, cargo test for rust
Added tests for new orchestrator.
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?
n/a