[SharovBot] fix(commitment): count first-nibble branches in CanDoConcurrentNext to fix flaky test #19568

Open
Giulio2002 wants to merge 1 commit into main from agent-fix/can-do-concurrent-nibble-count

@Giulio2002
Collaborator

[SharovBot]

Problem

Test_Trie_CorrectSwitchForConcurrentAndSequential fails intermittently on CI:

--- FAIL: Test_Trie_CorrectSwitchForConcurrentAndSequential (0.00s)
    hex_patricia_hashed_test.go:280:
        Error: Should be true
        Messages: should be able to parallelize next run

Failing job: https://github.com/erigontech/erigon/actions/runs/22561824327/job/65349926766

Root Cause

CanDoConcurrentNext() only checks if nibble 0 has a branch node:

zeroPrefixBranch, _, err := p.root.ctx.Branch(HexNibblesToCompactBytes([]byte{0}))
if len(zeroPrefixBranch) > 4 {
    return true, nil
}

The test generates 150 keys using a globally shared rand.New(rand.NewSource(42)) protected by a mutex. Because multiple tests run in parallel, they consume the random stream at different offsets on each run. Occasionally, the resulting 150 keys hash (keccak256) such that zero or one of them has first nibble 0. The nibble-0 sub-trie then has no branch node, so the function returns false even though the trie is wide enough to parallelize.
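The offset effect can be reproduced in isolation. The sketch below is illustrative only: sharedRand, nibbleCounts, the 32-byte key size, and the offsets are hypothetical names and values, and crypto/sha256 stands in for keccak256 so the example needs only the standard library. It skips a number of keys (as if other parallel tests had consumed them) and then tallies the first nibble of each hashed key.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/rand"
	"sync"
)

// sharedRand mimics a package-level generator guarded by a mutex
// (hypothetical type, not the one used in the erigon tests).
type sharedRand struct {
	mu  sync.Mutex
	rnd *rand.Rand
}

func newSharedRand(seed int64) *sharedRand {
	return &sharedRand{rnd: rand.New(rand.NewSource(seed))}
}

// key draws one 32-byte key from the shared stream.
func (s *sharedRand) key() [32]byte {
	s.mu.Lock()
	defer s.mu.Unlock()
	var k [32]byte
	s.rnd.Read(k[:])
	return k
}

// nibbleCounts skips `offset` keys, as if other tests had already consumed
// them, then draws `n` keys and counts the first nibble of each hashed key.
func nibbleCounts(seed int64, offset, n int) [16]int {
	s := newSharedRand(seed)
	for i := 0; i < offset; i++ {
		s.key()
	}
	var counts [16]int
	for i := 0; i < n; i++ {
		k := s.key()
		h := sha256.Sum256(k[:]) // assumption: stand-in for keccak256
		counts[h[0]>>4]++
	}
	return counts
}

func main() {
	// Same seed, different offsets: each offset yields its own key set
	// and therefore its own first-nibble distribution.
	for _, offset := range []int{0, 1, 37} {
		fmt.Printf("offset %2d: %v\n", offset, nibbleCounts(42, offset, 150))
	}
}
```

With a fixed seed and a fixed offset the counts are deterministic; the flakiness comes only from the offset changing between runs.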

P(zero keys at nibble 0 with 150 random keccak hashes) = (15/16)^150 ≈ 6.2×10⁻⁵ per run. Invisible in local development, but observable across thousands of CI runs.
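The estimate can be checked numerically. This is a minimal sketch assuming each key's first nibble is independent and uniform over 16 values; the function names are made up for illustration. It also covers the "zero or one key" case, which is the binomial sum of the first two terms.

```go
package main

import (
	"fmt"
	"math"
)

// probEmptyBucket returns the probability that one given bucket out of
// `buckets` equally likely buckets receives none of `keys` uniform keys.
func probEmptyBucket(keys, buckets int) float64 {
	return math.Pow(1-1/float64(buckets), float64(keys))
}

// probAtMostOne returns the probability that one given bucket receives
// zero or exactly one key (binomial distribution with p = 1/buckets).
func probAtMostOne(keys, buckets int) float64 {
	p := 1 / float64(buckets)
	zero := math.Pow(1-p, float64(keys))
	one := float64(keys) * p * math.Pow(1-p, float64(keys-1))
	return zero + one
}

func main() {
	fmt.Printf("P(zero keys at nibble 0)   = %.2e\n", probEmptyBucket(150, 16))
	fmt.Printf("P(at most one key there)   = %.2e\n", probAtMostOne(150, 16))
}
```

Under the same assumptions, the "at most one key" probability is roughly an order of magnitude larger than the "zero keys" probability, so the single-nibble probe fails more often than the zero-key figure alone suggests.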

Fix

Instead of probing a single arbitrary nibble, count how many of the 16 first-nibble sub-tries have stored branch data. Return true only if at least 4 nibbles are populated.

| Scenario                         | Nibbles filled | Result |
|----------------------------------|----------------|--------|
| 150 random keys                  | ~16/16         | true   |
| 2 remaining keys (post-deletion) | ≤2/16          | false  |

The threshold of 4 makes the decision independent of any particular nibble's occupancy and preserves the correct semantic: small key sets are processed sequentially.
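The counting approach can be sketched as follows. This is not the code from the PR: branchLookup, fakeLookup, and the constants are hypothetical stand-ins, and the `len(branch) > 4` test is an assumption carried over from the original nibble-0 check to decide whether a sub-trie has stored branch data.

```go
package main

import "fmt"

// branchLookup is a hypothetical stand-in for the trie context's Branch
// call: it returns the stored branch data for a first-nibble prefix.
type branchLookup func(nibble byte) ([]byte, error)

const (
	firstNibbles        = 16 // a hex trie has 16 possible first nibbles
	minPopulatedNibbles = 4  // threshold described in this PR
	minBranchLen        = 4  // assumption: reuse the len > 4 check from the old code
)

// canDoConcurrentNext reports whether at least minPopulatedNibbles of the
// 16 first-nibble sub-tries have stored branch data.
func canDoConcurrentNext(lookup branchLookup) (bool, error) {
	populated := 0
	for n := byte(0); n < firstNibbles; n++ {
		branch, err := lookup(n)
		if err != nil {
			return false, err
		}
		if len(branch) > minBranchLen {
			populated++
			if populated >= minPopulatedNibbles {
				return true, nil // threshold reached, no need to scan further
			}
		}
	}
	return false, nil
}

// fakeLookup builds a lookup in which only the listed nibbles hold data.
func fakeLookup(populated ...byte) branchLookup {
	set := make(map[byte]bool, len(populated))
	for _, n := range populated {
		set[n] = true
	}
	return func(nibble byte) ([]byte, error) {
		if set[nibble] {
			return make([]byte, 8), nil
		}
		return nil, nil
	}
}

func main() {
	// Wide trie whose nibble 0 happens to be empty: still parallelizable.
	wide, _ := canDoConcurrentNext(fakeLookup(1, 2, 3, 5, 8, 13))
	// Two remaining keys can populate at most two nibbles: sequential.
	narrow, _ := canDoConcurrentNext(fakeLookup(0, 9))
	fmt.Println(wide, narrow)
}
```

The early return keeps the number of lookups small for wide tries; a sparse trie costs at most 16 lookups.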

Testing

  • go test ./execution/commitment/... -run Test_Trie_CorrectSwitchForConcurrentAndSequential -count=5 → 5/5 PASS
  • Full ./execution/commitment/... suite → all pass

…o fix flaky test

Test_Trie_CorrectSwitchForConcurrentAndSequential fails intermittently (CI run
22561824327, job 65349926766) because CanDoConcurrentNext() only checks if nibble-0
has a branch node. The test uses a globally-shared rand.NewSource(42) protected by
a mutex, so different parallel test runs consume the random stream at different
offsets — occasionally producing a 150-key set where zero (or one) keys land at
nibble 0 after keccak256 hashing. In that case the function returns false even
though the trie is wide enough to parallelize, causing the require.True at line 280
to fail.

Root cause: arbitrary choice of nibble 0 as the single probe. P(zero keys at nibble
0 with 150 random keccak hashes) = (15/16)^150 ≈ 6.2e-5 per run: invisible in dev
but visible across thousands of CI runs.

Fix: instead of probing a single nibble, count how many of the 16 first-nibble
sub-tries have stored branch data. Require at least 4 to justify spawning 16
parallel goroutines.

- 150 random keys:  ~16 of 16 nibbles filled (P(< 4 filled) ≈ 0)        → true  ✓
- 2 remaining keys: at most 2 nibbles filled                            → false ✓

This makes the decision independent of any particular nibble's occupancy, eliminating
the flakiness while keeping the correct semantic (small key sets → sequential).