pdpv0: re-enable DELETE in processPendingCleanup to stop RPC flood #1033
Merged
ZenGround0 merged 1 commit into filecoin-project:pdpv0 from Feb 19, 2026
processPendingCleanup runs on every Filecoin block and calls PieceLive() for every piece with removed=TRUE in pdp_data_set_pieces. The DELETE that would remove confirmed-dead pieces from the table was commented out, causing the list to grow without bound and flood Lotus with EthCalls.

After the b7a8796 fix (lo.Contains pointer comparison bug), pieces are now correctly marked removed=TRUE for the first time. On calibration nodes with ~2,900 such pieces, this caused ~2,900 PieceLive() EthCalls per block (~30s interval), overwhelming Lotus RPC and causing i/o timeouts on all subsequent eth_estimateGas calls. This in turn caused proving tasks to fail, and after 5 consecutive failures datasets were marked unrecoverable.

Re-enable the DELETE with an AND removed=TRUE guard for safety. PieceLive() is called immediately before the DELETE and confirms the piece is gone on-chain, making this safe. The list drains to zero over a few blocks and the EthCall flood stops permanently.
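To put the load described above in per-second terms, here is a back-of-the-envelope sketch in Go. The function name callsPerSecond is hypothetical and not part of the codebase; the inputs are the approximate figures quoted in this description (~2,900 pending pieces, ~30s block interval), and the model assumes exactly one PieceLive() call per pending piece per block.

```go
package main

import "fmt"

// callsPerSecond estimates the sustained RPC rate when every pending row
// costs one call per block. Hypothetical helper, for illustration only.
func callsPerSecond(pendingPieces int, blockIntervalSeconds float64) float64 {
	return float64(pendingPieces) / blockIntervalSeconds
}

func main() {
	// Approximate figures from the description: ~2,900 pieces, ~30s blocks.
	fmt.Printf("%.1f calls/s\n", callsPerSecond(2900, 30)) // prints "96.7 calls/s"
}
```

Because the DELETE was disabled, this rate never falls on its own: every newly removed piece adds one more call per block.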
ZenGround0 (Collaborator) approved these changes on Feb 19, 2026 and left a comment:
Let's do it! Next up we'll root out the proving failures.
Problem

Commit #947 fixed a lo.Contains pointer comparison bug in processPendingPieceDeletes. As a result of that fix, pieces are now correctly marked removed=TRUE for the first time. On calibration nodes with ~2,900 such pieces, this had an unintended side effect:

processPendingCleanup runs on every Filecoin block (~30s) and calls verifier.PieceLive() for every piece where removed=TRUE. The DELETE that would remove confirmed-dead pieces from the table was commented out (// XXX(Kubuxu): commented out as this has lead to proving failures), so the list never shrinks.

Result: ~2,900 eth_call RPC requests to Lotus per block, continuous and permanent.

Impact:
- eth_estimateGas calls time out with "i/o timeout"
- proving tasks fail, and after 5 consecutive failures datasets are marked unrecoverable (unrecoverable_proving_failure_epoch)

Mainnet unaffected (only 67 removed=TRUE pieces vs ~2,900 on calibration).

Fix

Re-enable the DELETE with an AND removed=TRUE guard. processPendingCleanup already calls PieceLive() immediately before the DELETE to confirm the piece is gone on-chain, making this safe. Once the backlog is cleared (one-time cost on first run), the function has zero rows to process and makes zero RPC calls per block.
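The drain behaviour can be illustrated with a minimal in-memory sketch. This is not the code from this PR: the piece struct, fakeChain, and cleanupPass are hypothetical stand-ins, the real function works against the pdp_data_set_pieces table and the on-chain verifier, and the real PieceLive() signature may differ. The sketch only models the control flow described above: one liveness check per removed=TRUE row, followed by a delete that applies only to rows still marked removed.

```go
package main

import "fmt"

// piece is a simplified stand-in for one table row (hypothetical fields).
type piece struct {
	id      uint64
	removed bool
}

// fakeChain stands in for the on-chain verifier and counts calls made to it.
type fakeChain struct {
	live  map[uint64]bool
	calls int
}

// PieceLive reports whether a piece still exists on-chain (assumed signature).
func (c *fakeChain) PieceLive(id uint64) (bool, error) {
	c.calls++
	return c.live[id], nil
}

// cleanupPass models one per-block run: each removed row costs one PieceLive
// call, and rows confirmed dead are dropped from the table.
func cleanupPass(table []piece, chain *fakeChain) ([]piece, error) {
	kept := make([]piece, 0, len(table))
	for _, p := range table {
		if !p.removed {
			kept = append(kept, p) // never checked, never deleted
			continue
		}
		live, err := chain.PieceLive(p.id)
		if err != nil {
			return table, err // leave the table untouched on RPC failure
		}
		if live {
			kept = append(kept, p) // still on-chain: keep the row for now
			continue
		}
		// Dead on-chain and still marked removed: this is where the
		// re-enabled "DELETE ... AND removed=TRUE" would apply.
	}
	return kept, nil
}

func main() {
	chain := &fakeChain{live: map[uint64]bool{4: true}}
	table := []piece{{1, true}, {2, true}, {3, true}, {4, false}}
	for block := 1; block <= 2; block++ {
		before := chain.calls
		var err error
		table, err = cleanupPass(table, chain)
		if err != nil {
			fmt.Println("cleanup failed:", err)
			return
		}
		fmt.Printf("block %d: %d PieceLive calls, %d rows left\n",
			block, chain.calls-before, len(table))
	}
}
```

In this model the first pass makes three calls and removes the three dead rows, and the second pass makes none, which matches the one-time backlog cost described above. With the delete step disabled, every pass would repeat the same three calls indefinitely.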