-
Notifications
You must be signed in to change notification settings - Fork 2k
[ENH]: sysdb changes to support moving collection hard deletes to garbage collector #4607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
|
|
| // SetDeleteMode sets the delete mode for testing | ||
| func (c *Coordinator) SetDeleteMode(mode DeleteMode) { | ||
| c.deleteMode = mode | ||
| func (s *Coordinator) BatchGetCollectionSoftDeleteStatus(ctx context.Context, req *coordinatorpb.BatchGetCollectionSoftDeleteStatusRequest) (*coordinatorpb.BatchGetCollectionSoftDeleteStatusResponse, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
review: should we instead extend the GetCollection method to allow returning soft deleted collections, and then get the statuses with multiple calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer a single call to reduce the roundtrips. GetCollection should not return soft deleted collections for its current usages.
afe5084 to
9aed3e4
Compare
c570721 to
bcbf758
Compare
9aed3e4 to
ca845a3
Compare
|
Move Collection Hard Delete Responsibility from sysdb to Garbage Collector, Always Enable Soft Delete This PR refactors the sysdb to always operate in soft-delete mode (removing the previous flag and associated configuration), and shifts the responsibility for hard deletion of collections from the sysdb's background cleaner to the garbage collector. The soft deletion remains immediate upon request, but a new explicit API ( Key Changes: Affected Areas: Potential Impact: Functionality: Removes risk of accidental hard deletion and makes the deletion lifecycle explicit and more controllable. Upgrades API surface by providing new batch query and explicit permanent deletion methods. Performance: Background cleanup logic is removed; no ongoing DB cleanups from sysdb. Deletion batch operations are more efficient than multiple single queries. Security: Explicit hard deletes may be easier to audit; potential window post-soft delete where data is still recoverable up to GC. Scalability: Decoupling hard deletes from sysdb daemon should make deletion scale better and offer more flexible cleanup strategies (GC-managed); batch APIs improve efficiency. Review Focus: Testing Needed• Verify soft delete works for all collection types and is always enabled. Code Quality Assessmentgo/pkg/sysdb/coordinator/coordinator.go: Much simplified, with soft/hard delete modes removed and API surface clarified. go/pkg/sysdb/grpc/collection_service.go: Now exposes explicit FinishCollectionDeletion endpoint; batch status methods correctly implemented. go/pkg/sysdb/metastore/db/dao/collection.go: Batch status methods could use a pre-check for empty input for DB efficiency. idl/chromadb/proto/coordinator.proto: API changes are clear; new methods/fields documented in code. go/pkg/sysdb/metastore/s3/impl.go: Improved robustness in version history checks; further guard checks for nil recommended. go/pkg/common/errors.go: New error codes for API clarity added. tests and mock files: All tests and mocks updated for new behaviors; legacy tests removed as appropriate. go/pkg/sysdb/coordinator/table_catalog.go: Well-structured, good use of error handling, but lock ordering in deletion requires careful scrutiny for deadlock. Best PracticesAPI Design: Data Consistency: Testing/Regression: Code Robustness: Potential Issues• If clients depend on immediate hard deletion or the soft/hard delete mode toggling, this is a breaking change and may require coordination. This summary was automatically generated by @propel-code-bot |
bcbf758 to
f72bd63
Compare
ca845a3 to
ccc9b82
Compare
f72bd63 to
c62bf3f
Compare
ccc9b82 to
4dcc3a4
Compare
c62bf3f to
d6bf939
Compare
4dcc3a4 to
f2adff1
Compare
d6bf939 to
e0a0f88
Compare
f2adff1 to
660013f
Compare
3cd7274 to
8f20043
Compare
716f29a to
9356e10
Compare
8f20043 to
4a3b406
Compare
9356e10 to
8253c18
Compare
4a3b406 to
e533b0c
Compare
8253c18 to
e1a6311
Compare
| return err | ||
| } | ||
| if rootCollection == nil { | ||
| return errors.New("root collection not found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think there is an error constant for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did not see one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in ForkCollection we directly used common.ErrCollectionNotFound. Maybe we should have a separate error constant for this. I'm ok with this now
| // Remove collection being deleted from the dependencies | ||
| updatedDependencies := make([]*coordinatorpb.CollectionVersionDependency, 0) | ||
| for _, dependency := range lineageFile.Dependencies { | ||
| if dependency.TargetCollectionId != deleteCollection.ID.String() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens when dependency.SourceCollectionId == deleteCollection.ID?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dependency.SourceCollectionId == deleteCollection.ID should never evaluate to true at this point because only leaf nodes in the dependency tree are eligible for hard deletion
but this would be good to add as an invariant 👍
e1a6311 to
c85ee20
Compare
e533b0c to
3baca7c
Compare
b2befca to
950e307
Compare
950e307 to
d1c4e49
Compare
Merge activity
|
…bage collector (chroma-core#4607) ## Description of changes - sysdb no longer has a flag for soft delete mode, soft delete is always enabled - sysdb no longer automatically transitions soft deleted collections to hard deleted - lineage file name is now uuidv7 - It was previously based on the new target collection ID that was just added to the file, which works when adding dependencies but not when removing dependencies. UUIDv7 is time-sortable and random. - added BatchGetCollectionSoftDeleteStatus method (is called/tested in the next PR) ## Test plan _How are these changes tested?_ - [x] Tests pass locally with `pytest` for python, `yarn test` for js, `cargo test` for rust ## Documentation Changes _Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the [docs section](https://github.com/chroma-core/chroma/tree/main/docs/docs.trychroma.com)?_ n/a

Description of changes
Test plan
How are these changes tested?
pytestfor python,yarn testfor js,cargo testfor rustDocumentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?
n/a