fix: less locking in ExportStatus #7173

Merged
hanabi1224 merged 1 commit into main from hm/timeout-and-traces-for-export
Jun 13, 2026
Conversation

@hanabi1224 hanabi1224 commented Jun 12, 2026

Summary of changes

Changes introduced in this pull request:

Reference issue to close (if applicable)

Closes

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Outside contributions

  • I have read and agree to the CONTRIBUTING document.
  • I have read and agree to the AI Policy document. I understand that failure to comply with the guidelines will lead to rejection of the pull request.

Summary by CodeRabbit

  • New Features

    • Extended export status API to include current epoch and starting epoch information for better tracking of export progress.
  • Bug Fixes

    • Added timeout protection (5 minutes) to async write operations to prevent stuck behavior during exports.

coderabbitai Bot commented Jun 12, 2026

Walkthrough

This PR refactors snapshot export status tracking from mutex-protected to atomic-based state with lock-free reads, extends the RPC response type with epoch tracking fields, updates the RPC handler to consume the new atomic accessors, and adds 5-minute timeout protection around encoder async operations to prevent indefinite blocking.

Changes

Export status atomic refactoring and RPC expansion

| Layer / File(s) | Summary |
|---|---|
| ExportStatus atomic structure and accessors (src/ipld/util.rs) | ExportStatus fields (epoch, initial_epoch, exporting, cancelled) are migrated to AtomicI64 and AtomicBool, start_time is wrapped in RwLock, and public accessor methods (epoch(), initial_epoch(), exporting(), cancelled(), start_time()) enable lock-free reads. CHAIN_EXPORT_STATUS changes from LazyLock<Mutex<ExportStatus>> to LazyLock<ExportStatus>. |
| Atomic state update methods (src/ipld/util.rs) | update_epoch, start_export, end_export, and cancel_export are refactored to use atomic stores and compare-exchange instead of taking mutex guards. update_epoch initializes initial_epoch via compare_exchange on first write. |
| ApiExportStatus epoch fields (src/rpc/types/mod.rs) | ApiExportStatus is extended with start_epoch and current_epoch fields to expose epoch tracking to RPC clients. |
| RPC handler atomic state consumption (src/rpc/methods/chain.rs) | ForestChainExportStatus::handle switches to lock-free &*CHAIN_EXPORT_STATUS reads via accessors, derives progress from atomic epoch values, and populates the extended ApiExportStatus response including current_epoch, start_epoch, and status flags. |
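The shape described in the table above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the `EPOCH_UNSET` sentinel, the use of `SystemTime` for `start_time`, the choice of `Relaxed` ordering, and the `progress` helper (one plausible definition, `1 - epoch / initial_epoch`) are all assumptions made for the sketch.

```rust
use std::sync::atomic::{AtomicBool, AtomicI64, Ordering::Relaxed};
use std::sync::{LazyLock, RwLock};
use std::time::SystemTime;

/// Sentinel meaning "initial_epoch not recorded yet" (assumption of this sketch).
const EPOCH_UNSET: i64 = -1;

pub struct ExportStatus {
    epoch: AtomicI64,
    initial_epoch: AtomicI64,
    exporting: AtomicBool,
    cancelled: AtomicBool,
    // Not representable as a single atomic, so it stays behind a lock.
    start_time: RwLock<Option<SystemTime>>,
}

impl ExportStatus {
    pub fn new() -> Self {
        Self {
            epoch: AtomicI64::new(0),
            initial_epoch: AtomicI64::new(EPOCH_UNSET),
            exporting: AtomicBool::new(false),
            cancelled: AtomicBool::new(false),
            start_time: RwLock::new(None),
        }
    }

    // Lock-free accessors: each is a single atomic load.
    pub fn epoch(&self) -> i64 { self.epoch.load(Relaxed) }
    pub fn initial_epoch(&self) -> i64 { self.initial_epoch.load(Relaxed) }
    pub fn exporting(&self) -> bool { self.exporting.load(Relaxed) }
    pub fn cancelled(&self) -> bool { self.cancelled.load(Relaxed) }
    pub fn start_time(&self) -> Option<SystemTime> { *self.start_time.read().unwrap() }

    pub fn start_export(&self) {
        self.epoch.store(0, Relaxed);
        self.initial_epoch.store(EPOCH_UNSET, Relaxed);
        self.cancelled.store(false, Relaxed);
        *self.start_time.write().unwrap() = Some(SystemTime::now());
        self.exporting.store(true, Relaxed);
    }

    pub fn update_epoch(&self, epoch: i64) {
        self.epoch.store(epoch, Relaxed);
        // Only the first update after `start_export` succeeds; later calls
        // find a real epoch in place and leave it untouched.
        let _ = self.initial_epoch.compare_exchange(EPOCH_UNSET, epoch, Relaxed, Relaxed);
    }

    pub fn end_export(&self) { self.exporting.store(false, Relaxed); }

    pub fn cancel_export(&self) {
        self.cancelled.store(true, Relaxed);
        self.exporting.store(false, Relaxed);
    }

    /// Hypothetical progress in [0, 1], assuming the export walks from
    /// `initial_epoch` down towards epoch 0.
    pub fn progress(&self) -> f64 {
        let initial = self.initial_epoch();
        if initial <= 0 {
            return 0.0;
        }
        (1.0 - self.epoch() as f64 / initial as f64).clamp(0.0, 1.0)
    }
}

/// No outer `Mutex`: readers call the accessors directly on the shared value.
pub static CHAIN_EXPORT_STATUS: LazyLock<ExportStatus> = LazyLock::new(ExportStatus::new);
```

Each accessor is individually lock-free, but two accessor calls are not guaranteed to observe the same moment in time, which is the concern raised in the inline review comment on src/ipld/util.rs.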

Encoder write timeout resilience

| Layer / File(s) | Summary |
|---|---|
| Timeout wrapping in Encoder::write (src/db/car/forest.rs) | A 5-minute ASYNC_OPS_TIMEOUT constant is introduced. Frame emission loop wraps stream.try_next() and sink.write_all(&zstd_frame) in tokio::time::timeout(...) with contextual error messages including offset, n_frames, and frame length. Index-writing step (writer.write_zstd_skip_frames_into) is also wrapped with its own timeout handler. |
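The PR itself uses `tokio::time::timeout`, which yields a nested result: an outer layer for "did the deadline elapse" and an inner layer for the operation's own I/O error. The sketch below is a standard-library analogue of that shape, not the PR's code: it swaps in a helper thread plus `mpsc::Receiver::recv_timeout` for the async timeout, and `run_with_timeout` and `WriteError` are hypothetical names introduced only for illustration.

```rust
use std::fmt;
use std::io;
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Distinguishes "the deadline elapsed" from "the operation itself failed".
#[derive(Debug)]
pub enum WriteError {
    TimedOut { what: String, after: Duration },
    Io(io::Error),
}

impl fmt::Display for WriteError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::TimedOut { what, after } => write!(f, "`{what}` timed out after {after:?}"),
            Self::Io(e) => write!(f, "I/O error: {e}"),
        }
    }
}

/// Runs `op` on a helper thread and waits at most `limit` for its result.
/// `what` carries context (for example an offset or frame count) into the error.
pub fn run_with_timeout<T, F>(what: &str, limit: Duration, op: F) -> Result<T, WriteError>
where
    T: Send + 'static,
    F: FnOnce() -> io::Result<T> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that case.
        let _ = tx.send(op());
    });
    match rx.recv_timeout(limit) {
        Ok(Ok(value)) => Ok(value),
        Ok(Err(e)) => Err(WriteError::Io(e)),
        Err(RecvTimeoutError::Timeout) => Err(WriteError::TimedOut {
            what: what.to_owned(),
            after: limit,
        }),
        Err(RecvTimeoutError::Disconnected) => Err(WriteError::Io(io::Error::new(
            io::ErrorKind::Other,
            "worker exited without producing a result",
        ))),
    }
}
```

One difference worth noting: when an async timeout fires, the pending future is dropped, whereas the helper thread in this analogue keeps running until `op` returns. The analogue only illustrates the two-layer error handling, not cancellation.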

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • ChainSafe/forest#6128: Both PRs update the snapshot/chain export "status" plumbing—ExportStatus/CHAIN_EXPORT_STATUS and the ForestChainExportStatus/ApiExportStatus RPC payload—so the main PR's epoch/progress/atomic-status changes are directly tied to that PR's export-status/cancel implementation.
  • ChainSafe/forest#7160: Both PRs modify src/db/car/forest.rs around the Encoder::write/skip-frame index writing path, with the main PR adding timeout wrapping and the retrieved PR adding frame-counting/logging around the same async write steps.
  • ChainSafe/forest#6690: Both PRs modify src/db/car/forest.rs around the Encoder::write timeout and index writing path including writer.write_zstd_skip_frames_into, so the timeout/error-wrapping changes are directly related to large-index skip-frame writer changes.

Suggested labels

RPC

Suggested reviewers

  • akaladarshi
  • LesnyRumcajs
  • sudo-shashank
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix: less locking in ExportStatus' directly and accurately summarizes the main change: refactoring ExportStatus to use atomic operations and lock-free patterns instead of mutex-based synchronization. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@hanabi1224 hanabi1224 force-pushed the hm/timeout-and-traces-for-export branch from becc30f to 5b6150e Compare June 12, 2026 13:39
@hanabi1224 hanabi1224 marked this pull request as ready for review June 12, 2026 13:39
@hanabi1224 hanabi1224 requested a review from a team as a code owner June 12, 2026 13:39
@hanabi1224 hanabi1224 requested review from LesnyRumcajs, akaladarshi and sudo-shashank and removed request for a team June 12, 2026 13:39
codecov Bot commented Jun 12, 2026

Codecov Report

❌ Patch coverage is 31.66667% with 41 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.36%. Comparing base (cbdb13d) to head (5b6150e).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/ipld/util.rs | 25.71% | 26 Missing ⚠️ |
| src/rpc/methods/chain.rs | 0.00% | 10 Missing ⚠️ |
| src/db/car/forest.rs | 66.66% | 2 Missing and 3 partials ⚠️ |

Additional details and impacted files
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/rpc/types/mod.rs | 93.87% <ø> (ø) | |
| src/db/car/forest.rs | 84.07% <66.66%> (-0.03%) | ⬇️ |
| src/rpc/methods/chain.rs | 59.25% <0.00%> (-0.25%) | ⬇️ |
| src/ipld/util.rs | 50.00% <25.71%> (-2.74%) | ⬇️ |

... and 8 files with indirect coverage changes


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/db/car/forest.rs (1)

316-365: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Timeout coverage is incomplete for Encoder::write sink I/O.

Line 333 (header write) and Line 371 (footer write) are still unbounded write_all calls. If the sink stalls there, export can still hang indefinitely despite the new timeout guards.

Suggested patch
-        sink.write_all(&header_bytes).await?;
+        tokio::time::timeout(ASYNC_OPS_TIMEOUT, sink.write_all(&header_bytes))
+            .await
+            .context("`sink.write_all` (header) timed out")??;
@@
-        sink.write_all(&footer.to_le_bytes()).await?;
+        tokio::time::timeout(ASYNC_OPS_TIMEOUT, sink.write_all(&footer.to_le_bytes()))
+            .await
+            .context("`sink.write_all` (footer) timed out")??;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/db/car/forest.rs` around lines 316 - 365, Wrap the unbounded sink writes
with the existing ASYNC_OPS_TIMEOUT: replace the direct await on
sink.write_all(&header_bytes).await with tokio::time::timeout(ASYNC_OPS_TIMEOUT,
sink.write_all(&header_bytes)).await and propagate errors with a .with_context
that includes offset and n_frames; likewise locate the final footer write (the
write that flushes/finishes the CAR frames — e.g. any sink.write_all or final
writer.write_* into &mut sink that isn't already timeboxed) and wrap it with
tokio::time::timeout(ASYNC_OPS_TIMEOUT, ...).await and add a contextual message.
Ensure you reference ASYNC_OPS_TIMEOUT, sink.write_all(&header_bytes), and
writer.write_zstd_skip_frames_into when making these changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/ipld/util.rs`:
- Around line 35-55: The current per-field Relaxed reads in ExportStatus produce
inconsistent combinations; add a coherent snapshot method on ExportStatus (e.g.,
pub fn snapshot(&self) -> ApiExportStatus) that returns a consistent view and
use that in the RPC instead of calling initial_epoch(), epoch(), exporting(),
cancelled(), start_time() individually. Implement snapshot via a simple
versioned-read pattern: add an AtomicU64 version counter incremented by writers
when mutating fields, and in snapshot loop read version, read the atomic fields
(Relaxed), read start_time via its RwLock, then re-read version and retry if it
changed; alternatively acquire a single lock around the snapshot if you prefer.
Update the RPC code in src/rpc/methods/chain.rs to call ExportStatus::snapshot()
to guarantee coherent ApiExportStatus responses.
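
The versioned-read pattern suggested above can be sketched as follows. This is an illustration under stated assumptions rather than code from the PR: `VersionedStatus` and `StatusSnapshot` are hypothetical names, `start_time` is left out for brevity, writers are serialized with a small mutex, and every atomic operation uses `SeqCst` to keep the reasoning simple (weaker orderings would need explicit fences to be correct).

```rust
use std::sync::atomic::{AtomicBool, AtomicI64, AtomicU64, Ordering::SeqCst};
use std::sync::Mutex;

/// A coherent copy of all status fields taken at one point in time.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StatusSnapshot {
    pub epoch: i64,
    pub initial_epoch: i64,
    pub exporting: bool,
    pub cancelled: bool,
}

#[derive(Default)]
pub struct VersionedStatus {
    /// Odd while a write is in progress, even otherwise.
    version: AtomicU64,
    /// Serializes writers so two writes never interleave.
    writer: Mutex<()>,
    epoch: AtomicI64,
    initial_epoch: AtomicI64,
    exporting: AtomicBool,
    cancelled: AtomicBool,
}

impl VersionedStatus {
    pub fn write(&self, new: StatusSnapshot) {
        let _guard = self.writer.lock().unwrap();
        self.version.fetch_add(1, SeqCst); // becomes odd: write in progress
        self.epoch.store(new.epoch, SeqCst);
        self.initial_epoch.store(new.initial_epoch, SeqCst);
        self.exporting.store(new.exporting, SeqCst);
        self.cancelled.store(new.cancelled, SeqCst);
        self.version.fetch_add(1, SeqCst); // becomes even: write finished
    }

    /// Readers never take the mutex; they retry if a write overlapped the read.
    pub fn snapshot(&self) -> StatusSnapshot {
        loop {
            let before = self.version.load(SeqCst);
            if before % 2 == 1 {
                std::hint::spin_loop();
                continue;
            }
            let snap = StatusSnapshot {
                epoch: self.epoch.load(SeqCst),
                initial_epoch: self.initial_epoch.load(SeqCst),
                exporting: self.exporting.load(SeqCst),
                cancelled: self.cancelled.load(SeqCst),
            };
            if self.version.load(SeqCst) == before {
                return snap;
            }
            std::hint::spin_loop();
        }
    }
}
```

The trade-off against a single lock around the snapshot is that readers here never block writers, at the cost of occasional retries and a more delicate correctness argument.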

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 78d2aede-2bb6-488f-93c1-49b3c11b0d74

📥 Commits

Reviewing files that changed from the base of the PR and between cbdb13d and 5b6150e.

⛔ Files ignored due to path filters (3)
  • src/rpc/snapshots/forest__rpc__tests__rpc__v0.snap is excluded by !**/*.snap
  • src/rpc/snapshots/forest__rpc__tests__rpc__v1.snap is excluded by !**/*.snap
  • src/rpc/snapshots/forest__rpc__tests__rpc__v2.snap is excluded by !**/*.snap
📒 Files selected for processing (4)
  • src/db/car/forest.rs
  • src/ipld/util.rs
  • src/rpc/methods/chain.rs
  • src/rpc/types/mod.rs
🔗 Linked repositories identified

CodeRabbit considers these linked repositories for cross-repo context during reviews:

  • filecoin-project/lotus (manual)

Comment thread src/ipld/util.rs
@hanabi1224 hanabi1224 enabled auto-merge June 12, 2026 15:29
@hanabi1224 hanabi1224 added this pull request to the merge queue Jun 13, 2026
Merged via the queue into main with commit d77c915 Jun 13, 2026
39 of 68 checks passed
@hanabi1224 hanabi1224 deleted the hm/timeout-and-traces-for-export branch June 13, 2026 04:28
2 participants