[SharovBot] fix(commitment): count first-nibble branches in CanDoConcurrentNext to fix flaky test by Giulio2002 · Pull Request #19568 · erigontech/erigon

Giulio2002 · 2026-03-02T11:28:26Z

[SharovBot]

Problem

Test_Trie_CorrectSwitchForConcurrentAndSequential fails intermittently on CI:

```
--- FAIL: Test_Trie_CorrectSwitchForConcurrentAndSequential (0.00s)
    hex_patricia_hashed_test.go:280:
        Error: Should be true
        Messages: should be able to parallelize next run
```

Failing job: https://github.com/erigontech/erigon/actions/runs/22561824327/job/65349926766

Root Cause

CanDoConcurrentNext() only checks if nibble 0 has a branch node:

```go
zeroPrefixBranch, _, err := p.root.ctx.Branch(HexNibblesToCompactBytes([]byte{0}))
if err != nil {
	return false, err
}
if len(zeroPrefixBranch) > 4 {
	return true, nil
}
```

The test generates 150 keys using a globally shared rand.New(rand.NewSource(42)) protected by a mutex. Because multiple tests run in parallel, they consume the random stream at different offsets on each run. Occasionally, the resulting 150 keys hash (keccak256) such that zero or one keys have first nibble 0. The nibble-0 branch is then empty, so the function returns false even though the trie is wide enough to parallelize.
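To make the offset effect concrete, here is a simplified Go sketch. It assumes each key's first nibble comes straight from one random byte rather than from a keccak256 hash, and `lockedRand`, `firstNibblesAfter` and `countNibble` are illustrative names, not erigon's test helpers. The point it shows: with the same seed, the keys a test receives depend on how many draws other tests made first.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// lockedRand models a shared, mutex-protected source like the test's global
// rand.New(rand.NewSource(42)). The type itself is illustrative.
type lockedRand struct {
	mu  sync.Mutex
	rng *rand.Rand
}

// Intn draws from the shared source while holding the lock.
func (l *lockedRand) Intn(n int) int {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.rng.Intn(n)
}

// firstNibblesAfter simulates other tests consuming `skipped` draws first,
// then returns the first nibble of each of n keys (one random byte per key).
func firstNibblesAfter(skipped, n int) []int {
	src := &lockedRand{rng: rand.New(rand.NewSource(42))}
	for i := 0; i < skipped; i++ {
		src.Intn(256)
	}
	nibbles := make([]int, n)
	for i := range nibbles {
		nibbles[i] = src.Intn(256) >> 4
	}
	return nibbles
}

// countNibble returns how many entries equal target.
func countNibble(nibbles []int, target int) int {
	c := 0
	for _, nb := range nibbles {
		if nb == target {
			c++
		}
	}
	return c
}

func main() {
	for _, skipped := range []int{0, 1, 50, 1000} {
		n := countNibble(firstNibblesAfter(skipped, 150), 0)
		fmt.Printf("skipped %4d draws: %d of 150 keys at nibble 0\n", skipped, n)
	}
}
```

Every draw advances the shared stream, so the count at nibble 0 changes with the offset even though the seed never does.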

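P(zero keys at nibble 0 with 150 random keccak hashes) = (15/16)^150 ≈ 6.2×10⁻⁵ per run. Counting the one-key case as well, P(at most one key at nibble 0) = (15/16)^150 + 150·(1/16)·(15/16)^149 = 11·(15/16)^150 ≈ 6.9×10⁻⁴. Either rate is invisible in local development but observable across thousands of CI runs.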

Fix

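Instead of probing a single arbitrary nibble, count how many of the 16 first-nibble sub-tries have stored branch data, and return true only if at least 4 nibbles are populated. Four populated sub-tries is the minimum width considered worth spawning 16 parallel goroutines for.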

| Scenario | Nibbles filled | Result |
|---|---|---|
| 150 random keys | ~16/16 | `true` ✓ |
| 2 remaining keys (post-deletion) | ≤2/16 | `false` ✓ |

Because the decision now depends on how many nibbles are populated rather than on any single nibble's occupancy, the offset-dependent flakiness disappears. The intended semantic is preserved: small key sets are still processed sequentially.

Testing

- `go test ./execution/commitment/... -run Test_Trie_CorrectSwitchForConcurrentAndSequential -count=5` → 5/5 PASS
- Full `./execution/commitment/...` suite → all pass


Giulio2002 requested review from antonis19, awskii and taratorio as code owners March 2, 2026 11:28

Giulio2002 wants to merge 1 commit into main from agent-fix/can-do-concurrent-nibble-count

Giulio2002 commented Mar 2, 2026


1 participant
