[ENH]: add orchestrator to construct version graph for garbage collection #4463
Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving.
- Testing, Bugs, Errors, Logs, Documentation
- System Compatibility
- Quality
    if tracing::level_enabled!(tracing::Level::DEBUG) {
        let dot_viz = Dot::with_config(&self.graph, &[]);
        let encoded = BASE64_STANDARD.encode(format!("{:?}", dot_viz));
        tracing::debug!(base64_encoded_dot_graph = ?encoded, "Constructed graph.");
The logged graph representation can be base64-decoded and pasted into any Graphviz-compatible viewer for debugging (e.g. https://graph.flyte.org).
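The round trip described in this comment (render the graph as DOT text, base64-encode it for a single-line log field, decode it again before pasting into a viewer) can be sketched with the standard library alone. The PR itself relies on what appear to be petgraph's `Dot` and the `base64` crate; the functions `to_dot`, `base64_encode`, and `base64_decode` below are hypothetical stand-ins written for illustration, not the PR's code.

```rust
// Hypothetical stand-ins for illustration; the PR uses `Dot` and `BASE64_STANDARD`.
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Render a directed graph as DOT text. Nodes are labels; edges are index pairs.
fn to_dot(nodes: &[&str], edges: &[(usize, usize)]) -> String {
    let mut out = String::from("digraph {\n");
    for (i, label) in nodes.iter().enumerate() {
        out.push_str(&format!("    {} [label=\"{}\"]\n", i, label));
    }
    for (from, to) in edges {
        out.push_str(&format!("    {} -> {}\n", from, to));
    }
    out.push_str("}\n");
    out
}

/// Standard base64 (RFC 4648 alphabet) with `=` padding.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling missing bytes.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[((n >> 18) & 63) as usize] as char);
        out.push(ALPHABET[((n >> 12) & 63) as usize] as char);
        if chunk.len() > 1 {
            out.push(ALPHABET[((n >> 6) & 63) as usize] as char);
        } else {
            out.push('=');
        }
        if chunk.len() > 2 {
            out.push(ALPHABET[(n & 63) as usize] as char);
        } else {
            out.push('=');
        }
    }
    out
}

/// Decode standard base64; returns None on a character outside the alphabet.
fn base64_decode(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut buf: u32 = 0;
    let mut bits: u32 = 0;
    for byte in input.bytes() {
        if byte == b'=' {
            break; // padding marks the end of the data
        }
        let value = ALPHABET.iter().position(|&c| c == byte)? as u32;
        buf = (buf << 6) | value;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1 << bits) - 1; // keep only the bits not yet emitted
        }
    }
    Some(out)
}
```

To use it, call `base64_encode(to_dot(&nodes, &edges).as_bytes())` when logging, then run the logged value through `base64_decode` to recover the DOT text. Encoding keeps a multi-line DOT document on one log line, at the cost of the extra decode step before viewing.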
Add Orchestrator for Building Collection Version Graphs for GC

This PR introduces a new orchestrator (ConstructVersionGraphOrchestrator) for constructing a version dependency graph across all collections in a fork tree, to support garbage collection workflows. Supporting Rust operators to fetch lineage files, fetch version files, and batch-fetch version file paths are added and integrated, alongside updates to relevant orchestrator logic and supporting changes to the garbage collector pipeline, storage API, and dependencies. Extensive test coverage for the graph construction is included, validating both simple and complex collection/version lineage cases.

Potential Impact:
- Functionality: Enables garbage collection logic to operate on entire version/fork trees rather than a single collection. Improves ability to trace dependencies and perform accurate collection/variant cleanup.
- Performance: Slight increase in orchestrator complexity; batched fetching of version files may help performance. Additional in-memory graph processing is limited by number of collections in a fork.
- Security: No new security risks introduced; new code inherits data access/authorization from existing storage and sysdb layers.
- Scalability: Graph-based approach scales to arbitrary fork trees; performance may need tuning for very large trees but core approach is scalable.

Testing Needed:
- Run all

Code Quality Assessment:
- rust/garbage_collector/src/operators/get_version_file_paths.rs: Simple, direct batch fetching and error propagation.
- rust/garbage_collector/src/garbage_collector_orchestrator.rs: Updated to match new FetchVersionFileOutput interface; residual commented-out code was removed.
- rust/storage/src/lib.rs: Added Debug implementation for Storage; otherwise minimal change.
- Cargo files: Dependency updates are precise and necessary for new features.
- rust/garbage_collector/src/construct_version_graph_orchestrator.rs: Well-structured; uses async patterns, clear error enums, and trait-based orchestrator integration. Large, so future decomposition may help maintainability.
- rust/garbage_collector/src/operators/fetch_version_file.rs: Refactored for new output types; improved error reporting. Debug implementations and API patterns follow Rust conventions.
- rust/garbage_collector/src/operators/fetch_lineage_file.rs: Clean, idiomatic, covers code and decode paths. Good use of error enums.

Potential Issues:
- If collection or version lineage is partially missing, logic may terminate with error or skip nodes; correctness under partial data should be monitored.

This summary was automatically generated by @propel-code-bot
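As a rough illustration of a version dependency graph spanning a fork tree, the sketch below builds adjacency lists over (collection, version) nodes using only the standard library. Everything in it is an assumption made for illustration: the `CollectionVersions` struct, the `build_version_graph` function, and the edge rules (each version points to its successor, and a fork point points to the first version of the forked collection) are not taken from the PR, whose orchestrator reads lineage and version files instead.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical node identity: a collection id plus a version number.
type Node = (String, u64);

/// Hypothetical input: one collection's versions and, if forked, its origin.
struct CollectionVersions {
    id: String,
    versions: Vec<u64>,
    /// (parent collection id, parent version) this collection was forked from.
    forked_from: Option<(String, u64)>,
}

/// Build adjacency lists for the whole fork tree.
/// Assumed edge rules: version -> next version within a collection,
/// and fork point -> first version of the forked collection.
fn build_version_graph(collections: &[CollectionVersions]) -> BTreeMap<Node, BTreeSet<Node>> {
    let mut graph: BTreeMap<Node, BTreeSet<Node>> = BTreeMap::new();
    for c in collections {
        let mut versions = c.versions.clone();
        versions.sort_unstable();

        // Register every version as a node, even if it has no outgoing edges.
        for v in &versions {
            graph.entry((c.id.clone(), *v)).or_default();
        }
        // Link consecutive versions of the same collection.
        for pair in versions.windows(2) {
            graph
                .entry((c.id.clone(), pair[0]))
                .or_default()
                .insert((c.id.clone(), pair[1]));
        }
        // Link the fork point in the parent to this collection's first version.
        if let (Some((parent, parent_version)), Some(first)) = (&c.forked_from, versions.first()) {
            graph
                .entry((parent.clone(), *parent_version))
                .or_default()
                .insert((c.id.clone(), *first));
        }
    }
    graph
}
```

With a graph of this shape, a garbage collector can check whether a version still has dependents in any collection of the tree before deleting it, which is the kind of cross-collection reasoning the summary describes.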
    let output = match self.ok_or_terminate(message.into_inner(), ctx).await {
        Some(output) => output,
        None => {
            return;
nit: should also tracing::error?
ok_or_terminate() will log any error
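The pattern discussed in this exchange, where a helper logs the error and the caller simply returns on `None`, can be sketched as follows. This is a simplified, synchronous illustration under stated assumptions: the real `ok_or_terminate` is async and takes a context argument, and the `Orchestrator` struct, its `logged` vector (a stand-in for `tracing` output), and the `handle` method are hypothetical.

```rust
use std::fmt::Display;

/// Hypothetical, simplified orchestrator state.
struct Orchestrator {
    terminated: bool,
    /// Stand-in for log output so the behavior can be inspected.
    logged: Vec<String>,
}

impl Orchestrator {
    /// On error: record a log line, mark the orchestrator terminated, return None.
    /// On success: pass the value through.
    fn ok_or_terminate<T, E: Display>(&mut self, result: Result<T, E>) -> Option<T> {
        match result {
            Ok(value) => Some(value),
            Err(err) => {
                self.logged.push(format!("error: {err}"));
                self.terminated = true;
                None
            }
        }
    }

    /// Caller side: no extra logging is needed on the None branch,
    /// because the helper has already recorded the error.
    fn handle(&mut self, message: Result<u32, String>) {
        let output = match self.ok_or_terminate(message) {
            Some(output) => output,
            None => return,
        };
        self.logged.push(format!("processed {output}"));
    }
}
```

Centralizing the logging in the helper keeps each handler's error branch to a bare `return` and avoids the same error being logged twice.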
…tion (chroma-core#4463)

## Description of changes

Adds an orchestrator to construct the version graph for all collections in a fork tree to be used by garbage collection.

## Test plan

_How are these changes tested?_

- [x] Tests pass locally with `pytest` for python, `yarn test` for js, `cargo test` for rust

Added tests for new orchestrator.

## Documentation Changes

_Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the [docs section](https://github.com/chroma-core/chroma/tree/main/docs/docs.trychroma.com)?_

n/a
