fix: less locking in ExportStatus#7173
Conversation
WalkthroughThis PR refactors snapshot export status tracking from mutex-protected to atomic-based state with lock-free reads, extends the RPC response type with epoch tracking fields, updates the RPC handler to consume the new atomic accessors, and adds 5-minute timeout protection around encoder async operations to prevent indefinite blocking. ChangesExport status atomic refactoring and RPC expansion
Encoder write timeout resilience
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
✨ Simplify code
Comment |
becc30f to
5b6150e
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files
... and 8 files with indirect coverage changes Continue to review full report in Codecov by Harness.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/db/car/forest.rs (1)
316-365:⚠️ Potential issue | 🟠 Major | ⚡ Quick winTimeout coverage is incomplete for
Encoder::writesink I/O.
Line 333(header write) andLine 371(footer write) are still unboundedwrite_allcalls. If the sink stalls there, export can still hang indefinitely despite the new timeout guards.Suggested patch
- sink.write_all(&header_bytes).await?; + tokio::time::timeout(ASYNC_OPS_TIMEOUT, sink.write_all(&header_bytes)) + .await + .context("`sink.write_all` (header) timed out")??; @@ - sink.write_all(&footer.to_le_bytes()).await?; + tokio::time::timeout(ASYNC_OPS_TIMEOUT, sink.write_all(&footer.to_le_bytes())) + .await + .context("`sink.write_all` (footer) timed out")??;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/db/car/forest.rs` around lines 316 - 365, Wrap the unbounded sink writes with the existing ASYNC_OPS_TIMEOUT: replace the direct await on sink.write_all(&header_bytes).await with tokio::time::timeout(ASYNC_OPS_TIMEOUT, sink.write_all(&header_bytes)).await and propagate errors with a .with_context that includes offset and n_frames; likewise locate the final footer write (the write that flushes/finishes the CAR frames — e.g. any sink.write_all or final writer.write_* into &mut sink that isn't already timeboxed) and wrap it with tokio::time::timeout(ASYNC_OPS_TIMEOUT, ...).await and add a contextual message. Ensure you reference ASYNC_OPS_TIMEOUT, sink.write_all(&header_bytes), and writer.write_zstd_skip_frames_into when making these changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/ipld/util.rs`:
- Around line 35-55: The current per-field Relaxed reads in ExportStatus produce
inconsistent combinations; add a coherent snapshot method on ExportStatus (e.g.,
pub fn snapshot(&self) -> ApiExportStatus) that returns a consistent view and
use that in the RPC instead of calling initial_epoch(), epoch(), exporting(),
cancelled(), start_time() individually. Implement snapshot via a simple
versioned-read pattern: add an AtomicU64 version counter incremented by writers
when mutating fields, and in snapshot loop read version, read the atomic fields
(Relaxed), read start_time via its RwLock, then re-read version and retry if it
changed; alternatively acquire a single lock around the snapshot if you prefer.
Update the RPC code in src/rpc/methods/chain.rs to call ExportStatus::snapshot()
to guarantee coherent ApiExportStatus responses.
---
Outside diff comments:
In `@src/db/car/forest.rs`:
- Around line 316-365: Wrap the unbounded sink writes with the existing
ASYNC_OPS_TIMEOUT: replace the direct await on
sink.write_all(&header_bytes).await with tokio::time::timeout(ASYNC_OPS_TIMEOUT,
sink.write_all(&header_bytes)).await and propagate errors with a .with_context
that includes offset and n_frames; likewise locate the final footer write (the
write that flushes/finishes the CAR frames — e.g. any sink.write_all or final
writer.write_* into &mut sink that isn't already timeboxed) and wrap it with
tokio::time::timeout(ASYNC_OPS_TIMEOUT, ...).await and add a contextual message.
Ensure you reference ASYNC_OPS_TIMEOUT, sink.write_all(&header_bytes), and
writer.write_zstd_skip_frames_into when making these changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 78d2aede-2bb6-488f-93c1-49b3c11b0d74
⛔ Files ignored due to path filters (3)
src/rpc/snapshots/forest__rpc__tests__rpc__v0.snapis excluded by!**/*.snapsrc/rpc/snapshots/forest__rpc__tests__rpc__v1.snapis excluded by!**/*.snapsrc/rpc/snapshots/forest__rpc__tests__rpc__v2.snapis excluded by!**/*.snap
📒 Files selected for processing (4)
src/db/car/forest.rssrc/ipld/util.rssrc/rpc/methods/chain.rssrc/rpc/types/mod.rs
🔗 Linked repositories identified
CodeRabbit considers these linked repositories for cross-repo context during reviews:
filecoin-project/lotus(manual)
Summary of changes
Changes introduced in this pull request:
Reference issue to close (if applicable)
Closes
Other information and links
Change checklist
Outside contributions
Summary by CodeRabbit
New Features
Bug Fixes