Skip to content

feat: Reduce PendingProofs disk usage by storing Bitcoin indices #3209

Open
addnad wants to merge 3 commits into
chainwayxyz:nightlyfrom
addnad:feat/reduce-pendingproofs-disk-usage
Open

feat: Reduce PendingProofs disk usage by storing Bitcoin indices #3209
addnad wants to merge 3 commits into
chainwayxyz:nightlyfrom
addnad:feat/reduce-pendingproofs-disk-usage

Conversation

@addnad

@addnad addnad commented Apr 8, 2026

Copy link
Copy Markdown

Implement significant disk usage optimization for pending proofs by storing
only Bitcoin block height and transaction index instead of full proof data.
Proofs are fetched from Bitcoin via RPC on retry, reducing database footprint.

Changes:

  • Create BitcoinProofLocation struct to store (block_height, tx_index)
  • Update PendingProofsOutput schema to use BitcoinProofLocation instead of Proof
  • Modify store_pending_proof() signature to accept Bitcoin indices
  • Create ProofWithLocation wrapper to propagate indices through processing
  • Update extract_zk_proofs_and_sequencer_commitments() to preserve Bitcoin indices
  • Implement fetch_proof_from_bitcoin() to re-fetch proofs from Bitcoin by location
  • Update process_pending_proofs() to fetch deferred proofs from Bitcoin via RPC
  • Add ProofFetchFailed error variant for proof retrieval failures

Benefits:

  • Significantly reduces memory/disk usage for pending proofs (full proof → 16 bytes)
  • Eliminates redundant proof storage across L1 cycles
  • Leverages Bitcoin network as proof source of truth
  • Maintains logical soundness while improving infrastructure efficiency

This addresses the core optimization suggested in issue #3168.

addnad added 2 commits April 8, 2026 05:04
Problem:
The Validate PR Title job was failing on external forks because the GITHUB_TOKEN
doesn't have the necessary permissions to post status checks on PRs from forks.

Solution:
- Added explicit condition to validate job: only run for PRs from the main repository
- Updated check-changelog job to also only run for PRs from the main repository
- Removed 'needs: validate' dependency since validate might be skipped on forks
- Added pull-requests: read permission to validate job for clarity

This allows the workflow to gracefully handle PRs from external forks while still
validating PR titles for commits to the main repository.

Fixes chainwayxyz#3178
…inwayxyz#3168)

Implement significant disk usage optimization for pending proofs by storing
only Bitcoin block height and transaction index instead of full proof data.
Proofs are fetched from Bitcoin via RPC on retry, reducing database footprint.

Changes:
- Create BitcoinProofLocation struct to store (block_height, tx_index)
- Update PendingProofsOutput schema to use BitcoinProofLocation instead of Proof
- Modify store_pending_proof() signature to accept Bitcoin indices
- Create ProofWithLocation wrapper to propagate indices through processing
- Update extract_zk_proofs_and_sequencer_commitments() to preserve Bitcoin indices
- Implement fetch_proof_from_bitcoin() to re-fetch proofs from Bitcoin by location
- Update process_pending_proofs() to fetch deferred proofs from Bitcoin via RPC
- Add ProofFetchFailed error variant for proof retrieval failures

Benefits:
- Significantly reduces memory/disk usage for pending proofs (full proof → 16 bytes)
- Eliminates redundant proof storage across L1 cycles
- Leverages Bitcoin network as proof source of truth
- Maintains logical soundness while improving infrastructure efficiency

This addresses the core optimization suggested in issue chainwayxyz#3168.
@addnad addnad requested a review from a team as a code owner April 8, 2026 05:17
Copilot AI review requested due to automatic review settings April 8, 2026 05:17
@auto-assign auto-assign Bot requested a review from ercecan April 8, 2026 05:17

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes fullnode pending-proof storage by persisting only a Bitcoin “location” (block height + tx index) and lazily re-fetching the full proof from Bitcoin RPC during retries, significantly reducing DB footprint (issue #3168).

Changes:

  • Introduces BitcoinProofLocation and updates pending-proofs storage to persist (block_height, tx_index) instead of the full Proof.
  • Propagates Bitcoin location metadata through DA extraction via a new ProofWithLocation wrapper.
  • Adds retry-time proof re-fetching (fetch_proof_from_bitcoin) and a new ProofFetchFailed error variant.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
crates/sovereign-sdk/full-node/db/sov-db/src/schema/types/mod.rs Adds BitcoinProofLocation and updates the pending-proofs output type.
crates/sovereign-sdk/full-node/db/sov-db/src/ledger_db/traits.rs Updates store_pending_proof API to accept Bitcoin indices instead of a full proof.
crates/sovereign-sdk/full-node/db/sov-db/src/ledger_db/mod.rs Stores BitcoinProofLocation into the pending-proofs table.
crates/common/src/da.rs Wraps extracted proofs with Bitcoin location metadata (ProofWithLocation).
crates/fullnode/src/da_block_handler.rs Stores locations for pending proofs and re-fetches proofs from Bitcoin on retry.
crates/fullnode/src/error.rs Adds ProofFetchFailed error variant for Bitcoin proof retrieval failures.
.github/workflows/pr-check.yml Adjusts workflow permissions and conditions for same-repo PR handling.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +987 to 994
let proof_location = BitcoinProofLocation {
block_height: bitcoin_block_height,
tx_index: bitcoin_tx_index,
};
schema_batch.put::<PendingProofs>(
&(min_commitment_index, max_commitment_index),
&(proof, found_in_l1_height),
&(proof_location, found_in_l1_height),
)?;

Copilot AI Apr 8, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PendingProofs table schema still needs to be updated to store (BitcoinProofLocation, L1Height) instead of (Proof, L1Height). As-is, this put::<PendingProofs>(..., &(proof_location, found_in_l1_height)) won’t type-check against the current table definition, and even once compiled you’ll need to handle DB compatibility (e.g., drop/migrate the existing CF or add a migration that clears/rewrites old pending-proof entries).

Copilot uses AI. Check for mistakes.
Comment on lines +924 to +936

// Fetch the proof from Bitcoin using the stored location
let proof_with_location = match self.fetch_proof_from_bitcoin(proof_location, found_in_l1_height).await {
Ok(p) => p,
Err(e) => {
warn!(
"Failed to fetch proof from Bitcoin for index {min_index}-{max_index} at location block={}, tx_idx={}: {e:?}",
proof_location.block_height, proof_location.tx_index
);
break;
}
};

Copilot AI Apr 8, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On fetch_proof_from_bitcoin failure you break out of the loop but keep the (potentially permanently broken) pending-proof entry in the DB. If the failure is non-transient (e.g., tx index not found due to reorg/pruning), this can prevent all subsequent pending proofs from ever being retried/cleared. Consider distinguishing transient RPC failures vs permanent “not found” cases (e.g., remove/discard the pending entry on permanent failure, or at least continue to allow later ranges to be processed).

Suggested change
// Fetch the proof from Bitcoin using the stored location
let proof_with_location = match self.fetch_proof_from_bitcoin(proof_location, found_in_l1_height).await {
Ok(p) => p,
Err(e) => {
warn!(
"Failed to fetch proof from Bitcoin for index {min_index}-{max_index} at location block={}, tx_idx={}: {e:?}",
proof_location.block_height, proof_location.tx_index
);
break;
}
};
// Fetch the proof from Bitcoin using the stored location
let proof_with_location = match self
.fetch_proof_from_bitcoin(proof_location, found_in_l1_height)
.await
{
Ok(p) => p,
Err(e) => {
warn!(
"Failed to fetch proof from Bitcoin for index {min_index}-{max_index} at location block={}, tx_idx={}: {e:?}",
proof_location.block_height, proof_location.tx_index
);
// Keep this entry pending for a future retry, but do not let one failed
// fetch block processing of later pending proofs.
continue;
}
};

Copilot uses AI. Check for mistakes.
Comment on lines +1028 to +1032
async fn fetch_proof_from_bitcoin(
&self,
proof_location: sov_db::schema::types::BitcoinProofLocation,
l1_height: u64,
) -> Result<ProofWithLocation, ProcessingError> {

Copilot AI Apr 8, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fetch_proof_from_bitcoin takes l1_height but doesn’t use it, which makes the API misleading and can trigger unused-parameter lints in some configurations. Either remove the parameter or incorporate it into the error context/logging so callers can correlate the fetch failure with the original L1 inclusion height.

Copilot uses AI. Check for mistakes.
Comment thread .github/workflows/pr-check.yml Outdated
contents: read
pull-requests: read
runs-on: ubuntu-latest
if: github.event.pull_request.head.repo.full_name == github.repository

Copilot AI Apr 8, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate job now runs under a workflow that is also triggered by issue_comment, but the job-level condition references github.event.pull_request... without guarding on github.event_name == 'pull_request'. On issue_comment events that field may be absent, which can cause expression-evaluation errors or unexpected skipping. Consider changing the condition to include an event-name check (similar to check-changelog) to keep the workflow robust across triggers.

Suggested change
if: github.event.pull_request.head.repo.full_name == github.repository
if: github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository

Copilot uses AI. Check for mistakes.
Resolve 4 issues identified in Copilot code review:

1. Update PendingProofs table schema to use BitcoinProofLocation instead of Proof
   - Modified schema/tables.rs to define PendingProofs as (u32, u32) => (BitcoinProofLocation, L1Height)
   - Added BitcoinProofLocation import to table definitions

2. Improve error handling in process_pending_proofs()
   - Change 'break' to 'continue' when proof fetch fails
   - Allows later pending proofs to be processed even if one fetch fails
   - Transient RPC failures can be retried on next cycle
   - Added l1_height to error log context for debugging

3. Remove unused parameter from fetch_proof_from_bitcoin()
   - Removed l1_height parameter that wasn't being used
   - Parameter was confusing to callers and triggered unused variable warns

4. Fix workflow condition for validate job
   - Add event type guard 'github.event_name == pull_request'
   - Prevents expression evaluation errors when triggered by issue_comment events
   - Aligns with similar condition in check-changelog job
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants