[Mirror] [ENH](frontend): block functions on topology dbs (upstream #6836)#5
[Mirror] [ENH](frontend): block functions on topology dbs (upstream #6836)#5hashbender wants to merge 23 commits intomainfrom
Conversation
Add backend overrides for manifest bounds and bounded fragment scans. Use the Spanner-backed manifest manager to: - derive manifest bounds without loading the full manifest - scan fragments from the compaction offset with a position_limit index - let log-service scout and pull logs through the partial-read path - expose a phase-0 garbage reset helper for GC recovery Essentially, what we can do now because of this patch is do a bounded manifest fetch. As spanner manifests grow in size, the cost of fetching the manifest begins to dominate. This adds a proportional-to-the-fragments fetch path. Co-authored-by: AI
Introduce dedicated multi-cluster multi-region (MCMR) property tests that write records across two regions and verify cross-region replication invariants. Parameterize existing test_add tests by database name so they run against both classic and MCMR databases when multi-region is enabled. Fix dirty log detection in the repl manifest manager: - Treat intrinsic_cursor=0 as unset so newly initialized logs are not spuriously reported as dirty - Initialize manifest_regions rows with intrinsic_cursor=0 instead of omitting the column, preventing GC from misinterpreting the starting offset as an active reader position - Change the SQL threshold comparison from > to >= so logs at the exact threshold boundary are correctly included - Add decode_intrinsic_cursor helper that filters out zero values Retry Spanner-aborted flush compaction transactions: - Add is_spanner_aborted() to SysDbError to detect gRPC Aborted status - Wrap the flush_collection_compaction read-write transaction with backon retry (100ms delay, 4 attempts) on Spanner abort - Propagate gRPC status codes through SysDbError and FlushCompactionError instead of mapping all Spanner errors to Internal Improve dirty log test diagnostics by dumping in-memory and Spanner state on assertion failure. Add regression tests for single-add and three-add dirty log scenarios. Co-authored-by: AI
This reverts commit 0eebd27.
Reject attached function operations when the database name carries a topology prefix, reusing the same multi-region validation path as collection forking. Add targeted frontend endpoint tests covering attach, get, and detach against topology-prefixed databases. Co-authored-by: AI
|
Tenki Code Review - Complete Files Reviewed: 25 Reviewed the PR changes across Rust log/sysdb/frontend paths, Python property tests, and CI/workflow updates. No reachable code quality or security issues meeting the reporting threshold were identified. Files Reviewed (25 files) |
|
Tenki Code Review - Complete Files Reviewed: 25 By Severity:
I found one medium-confidence functional regression in log scouting for multi-region databases. The rest of the reviewed production changes look directionally sound, including retry/error-code propagation improvements. Files Reviewed (25 files) |
There was a problem hiding this comment.
Overall
One actionable issue was identified in production Rust code. Most other changes (retry handling, error code propagation, and MCMR read/scan plumbing) appear consistent with intended behavior.
Key Finding
- Missing-resource semantics regression in
rust/log-service/src/lib.rs: inscout_logs, the topology path treatsmanifest_bounds_and_witness() == Noneas an empty log ((1,1)) instead of returningNotFound. This can cause nonexistent collections to be reported as successful empty logs.
Security Review Notes
- Reviewed request-handling and storage-facing paths in
rust/frontend/src/server.rs,rust/log-service/src/lib.rs,rust/rust-sysdb/src/spanner.rs, andrust/sysdb/src/sysdb.rs. - No high-confidence injection, auth bypass, or secret-handling vulnerabilities were found in changed code.
| bounds_and_witness.bounds.oldest_timestamp, | ||
| bounds_and_witness.bounds.next_write_timestamp, | ||
| ), | ||
| None => (LogPosition::from_offset(1), LogPosition::from_offset(1)), |
There was a problem hiding this comment.
🟠 scout_logs hides missing multi-region collections as empty logs (bug)
When database_name includes a topology, scout_logs uses manifest_bounds_and_witness() and treats None as an empty log (offset 1,1). For the repl manifest manager, None means no manifest row exists for that collection/region, so invalid or missing collections are reported as successful empty logs instead of NotFound. This regresses behavior from the non-topology branch, which maps uninitialized/missing logs to a not-found error.
💡 Suggestion: Return Status::not_found when manifest_bounds_and_witness() is None in the topology branch, matching the non-topology semantics for missing collections.
| None => (LogPosition::from_offset(1), LogPosition::from_offset(1)), | |
| None => return Err(Status::not_found(format!("collection {collection_id} not found"))), |
📋 Prompt for AI Agents
In rust/log-service/src/lib.rs around line 2174 inside scout_logs, change the topology_name.is_some() match arm so that None from log_reader.manifest_bounds_and_witness() returns Status::not_found(format!("collection {collection_id} not found")) instead of defaulting to offsets (1,1). This preserves missing-collection semantics and avoids falsely reporting nonexistent logs as empty.
Mirrored from chroma-core#6836 — Frontend validation for multi-region + integration tests