Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
|
Improve This PR updates rebuild observability in The change is narrowly scoped to logging behavior in the rebuild path, with no API or schema changes. However, the new result-to-collection association currently relies on This summary was automatically generated by @propel-code-bot |
There was a problem hiding this comment.
Changes suggested due to an important logic flaw where async result ordering causes incorrect collection-to-result logging associations.
Status: Changes Suggested | Risk: Medium
Issues Identified & Suggestions
- Async result-to-collection pairing is unreliable; avoid positional
zipmapping:rust/worker/src/compactor/compaction_manager.rs filter_mapandFuturesUnorderedcan reorder/misalign results; carry collection ID through futures:rust/worker/src/compactor/compaction_manager.rs
Review Details
📁 1 files reviewed | 💬 1 comments
👍 / 👎 individual comments to help improve reviews for you
| .await; | ||
|
|
||
| // Log results | ||
| for (result, collection) in results.into_iter().zip(collections.iter()) { |
There was a problem hiding this comment.
[Logic] There are actually two bugs causing the zip pairing between results and collections to be incorrect:
-
filter_mapmisalignment: If any collection has an invalidDatabaseName, it's filtered out, makingresultsshorter thancollections. The Nth result will then be logged as belonging to the wrong collection. -
FuturesUnorderedordering (even more critical):FuturesUnorderedyields results in arbitrary completion order — whichever future finishes first, not insertion order. So even if every collection has a valid database name,resultswill be in a random order that doesn't matchcollections, andzipwill mis-pair them virtually every time there are 2+ concurrent rebuilds.
The fix is to carry the collection reference through the future chain so each result is self-describing, rather than relying on positional alignment:
let results = collections
.iter()
.filter_map(|collection| {
match chroma_types::DatabaseName::new(collection.database.clone()) {
Some(database_name) => Some((
collection,
self.context.clone().compact(
collection.collection_id,
database_name,
collection.tenant.clone(),
true,
segment_scopes.clone(),
),
)),
None => {
tracing::error!(
"Invalid database name '{}' for collection {} (must be at least 3 characters)",
collection.database,
collection.collection_id
);
None
}
}
})
.map(|(collection, fut)| async move { (collection.collection_id, fut.await) })
.collect::<FuturesUnordered<_>>()
.collect::<Vec<_>>()
.await;Then log using the collection_id embedded in each tuple, removing the zip entirely.
Context for Agents
There are actually **two** bugs causing the `zip` pairing between `results` and `collections` to be incorrect:
1. **`filter_map` misalignment**: If any collection has an invalid `DatabaseName`, it's filtered out, making `results` shorter than `collections`. The Nth result will then be logged as belonging to the wrong collection.
2. **`FuturesUnordered` ordering** (even more critical): `FuturesUnordered` yields results in arbitrary completion order — whichever future finishes first, not insertion order. So even if every collection has a valid database name, `results` will be in a random order that doesn't match `collections`, and `zip` will mis-pair them virtually every time there are 2+ concurrent rebuilds.
The fix is to carry the collection reference through the future chain so each result is self-describing, rather than relying on positional alignment:
```rust
let results = collections
.iter()
.filter_map(|collection| {
match chroma_types::DatabaseName::new(collection.database.clone()) {
Some(database_name) => Some((
collection,
self.context.clone().compact(
collection.collection_id,
database_name,
collection.tenant.clone(),
true,
segment_scopes.clone(),
),
)),
None => {
tracing::error!(
"Invalid database name '{}' for collection {} (must be at least 3 characters)",
collection.database,
collection.collection_id
);
None
}
}
})
.map(|(collection, fut)| async move { (collection.collection_id, fut.await) })
.collect::<FuturesUnordered<_>>()
.collect::<Vec<_>>()
.await;
```
Then log using the `collection_id` embedded in each tuple, removing the `zip` entirely.
File: rust/worker/src/compactor/compaction_manager.rs
Line: 3049fb5102 to
bdc9fa1
Compare
46404f7 to
9372b86
Compare
9372b86 to
8ee38ba
Compare

Description of changes
Edited the rebuild flow to explicitly log upon failure.
Test plan
Local testing.
pytestfor python,yarn testfor js,cargo testfor rustMigration plan
Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?
Observability plan
What is the plan to instrument and monitor this change?
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?