Prevent nonce gaps for untracked transactions #646
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
RPC nonce error check
The code fixes how the TXM checks for a nonce error from the RPC by comparing the error message. The old check included an additional object that attempted to deserialize the error, but it resulted in nil, which effectively rendered the check useless.
Updated transaction confirmation logic
Transactions are no longer confirmed individually per transaction hash; instead, it uses the latest nonce to confirm transactions with a nonce lower than that (similar to EVM TXMv2). This was done for two reasons. First, the RPC responses are slow and unreliable for transaction status. Second, the old confirmation logic doesn't cover the case where a transaction is no longer tracked (either because the RPC evicted the transaction from mempool or failed internally during broadcast, but returned success).
Added transaction throttling
Unfortunately, not all RPCs will return an error when broadcasting a transaction with a nonce higher than the pending, and will keep the transaction in the mempool up to a large threshold, similar to EVM. That means, if there is a nonce gap, the TXM will be very slow to detect and resync. This is very crucial, especially after a nonce gap is created and there aren't many transactions in the queue. To fix this, the confirmation loop adds a transaction throttling mechanism for in-flight transactions via a constant value. If that threshold is exceeded, the TXM will do a resync of the local nonce. This doesn't necessarily mean that previous transactions will be overwritten, since the broadcasting loop has its own nonce fast-forwarding mechanism. You can think of this throttling mechanism as a "chance" for the broadcast loop and the TXM to go back and replay the nonce and find gaps. That doesn't mean transactions won't be overwritten, but that's a property the original TXM didn't support anyway.
Transaction overwrite
A major issue with the old design was that the TXM assumed that a successful message during broadcast means the transaction will always be either on the mempool or mined, which, as explained above, it's not the case. This has led to nonce gap incidents. To protect against that, and enable the
nonceResync
mechanism, we allow the TxStore to overwrite unconfirmed transactions with the same nonce if the nonce resyncs. On some edge cases, it means that a past transaction was replaced by a more recent one, which is already expected by the TXM, but in the majority, this means an untracked transaction was dropped for a new one that will fill the nonce gap.