
[wip] Commitment history regen tool#19016

Draft
sudeepdino008 wants to merge 168 commits into main from commitment_history_regen

Conversation

@sudeepdino008
Member

@sudeepdino008 sudeepdino008 commented Feb 6, 2026

todo: handle half block exec case

@sudeepdino008 sudeepdino008 force-pushed the commitment_history_regen branch 4 times, most recently from 594b302 to 018864d Compare February 6, 2026 17:36
@sudeepdino008
Member Author

current iteration (with parallel prefetch for each block) works but is still super slow

INFO[02-10|08:57:55.138] [rebuild_commitment_history] progress    block=931/10217115 blk/s=0.5 keys=946 root=65fd94b55b37f5dbbc465e568f2e547b64484059a3b0725c4c69be7c4d5a1b6b memBatch="27.5 KB" alloc=2.2GB sys=2.5GB
INFO[02-10|08:58:19.187] [rebuild_commitment_history] progress    block=950/10217115 blk/s=0.8 keys=965 root=d38909ecd417a8355f7a7dc7f16a40727fe2ec8455320c8ef140f8f830f571b1 memBatch="27.5 KB" alloc=2.2GB sys=2.5GB
INFO[02-10|08:58:38.398] [rebuild_commitment_history] progress    block=963/10217115 blk/s=0.7 keys=978 root=ad6fbe34ab300eec89c12f79bd78744d45cefdac4bf7cc0ca88fee62dfe43d9c memBatch="27.5 KB" alloc=2.2GB sys=2.5GB
INFO[02-10|08:58:49.222] [mem] memory stats                       Rss=12.8GB Size=0B Pss=12.8GB SharedClean=2.6MB SharedDirty=0B PrivateClean=10.4GB PrivateDirty=2.5GB Referenced=12.8GB Anonymous=2.5GB Swap=0B alloc=2.2GB sys=2.5GB
INFO[02-10|08:58:55.655] [rebuild_commitment_history] progress    block=973/10217115 blk/s=0.6 keys=988 root=e72a597e68642190e06ebafe7da19bd986954fa95781841e14d6fa605e081a09 memBatch="27.5 KB" alloc=2.2GB sys=2.5GB
INFO[02-10|08:59:17.290] [rebuild_commitment_history] progress    block=985/10217115 blk/s=0.6 keys=1.00k root=c965378c4a9d4529c4307ae2f1d35ede0a8ea1cda756123b03edbaef7d780bfb memBatch="27.5 KB" alloc=2.2GB sys=2.5GB
INFO[02-10|08:59:39.166] [rebuild_commitment_history] progress    block=1001/10217115 blk/s=0.7 keys=1.01k root=876c5b4fee1d2c04b45d8f0181aa6f186460cee8a1c44909b3bc529f35b445a2 memBatch="27.5 KB" alloc=2.2GB sys=2.5GB
INFO[02-10|08:59:55.994] [rebuild_commitment_history] progress    block=1006/10217115 blk/s=0.3 keys=1.02k root=426f9cee16e451ea101c19fd41fe07450e356bc78838e4b2baedb2ce04abc1f9 memBatch="27.5 KB" alloc=2.2GB sys=2.5GB
INFO[02-10|09:00:15.173] [rebuild_commitment_history] progress    block=1019/10217115 blk/s=0.7 keys=1.03k root=6f66f3027522aea46417904c739f09a283b601ccae885a8b43ec6ef5f24b2572 memBatch="27.5 KB" alloc=2.2GB sys=2.5GB
INFO[02-10|09:00:39.710] [rebuild_commitment_history] progress    block=1033/10217115 blk/s=0.6 keys=1.04k root=f6ccc81518aeddc148ac5765d4034453182d8d9c84325ce499f926d5eaa3b6ab memBatch="27.5 KB" alloc=2.2GB sys=2.5GB
  • pprof says most time is spent in HistoryKeyRange, which makes sense: for each block, we essentially read the entire ef file and try to collect the keys that lie in the block's txNum range.
  • what is needed: get the keys (key+txnum) for 100 blocks, bucket them across those 100 blocks, and then process them.
  • but HistoryRange and HistoryKeyRange de-duplicate keys. Instead we need the duplicated values to come through.
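
The bucketing step described above could look roughly like the following. This is only a sketch under stated assumptions: `KeyAt`, `BucketByBlock`, and the `maxTxNums` slice are hypothetical names, not part of the Erigon API, and the entries are assumed to come from some non-deduplicating scan whose txNums are not below the first block's range.

```go
package main

import "sort"

// KeyAt is a hypothetical (key, txNum) pair, as it might come out of a
// history scan that does NOT de-duplicate keys.
type KeyAt struct {
	Key   string
	TxNum uint64
}

// BucketByBlock assigns each entry to the block whose txNum range contains it.
// maxTxNums[i] is the last txNum of block firstBlock+i and must be ascending.
// Entries past the last block are dropped; duplicates of a key are preserved.
func BucketByBlock(firstBlock uint64, maxTxNums []uint64, entries []KeyAt) map[uint64][]string {
	buckets := make(map[uint64][]string, len(maxTxNums))
	for _, e := range entries {
		// Binary search for the first block whose last txNum is >= e.TxNum.
		i := sort.Search(len(maxTxNums), func(i int) bool { return maxTxNums[i] >= e.TxNum })
		if i == len(maxTxNums) {
			continue // txNum lies beyond the batch of blocks
		}
		blk := firstBlock + uint64(i)
		buckets[blk] = append(buckets[blk], e.Key)
	}
	return buckets
}
```

With one scan per batch of 100 blocks instead of one scan per block, the ef file would be read once per batch; each lookup into the batch is a binary search over at most 100 boundaries.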

AskAlexSharov and others added 30 commits March 4, 2026 15:00
adding keys to compressed file didn't have normal progress report
We know `count` in advance, which means we can:
- either write directly to the rebased EF
- or write directly to SimpleSeq

```
  ┌────────┬──────────────────────┬───────────────────────┐
  │        │       Builder        │         Merge         │
  ├────────┼──────────────────────┼───────────────────────┤
  │ time   │ 5.5µs → 1.8µs (−67%) │ 16.7µs → 9.0µs (−46%) │
  ├────────┼──────────────────────┼───────────────────────┤
  │ memory │ 1896B → 696B (−63%)  │ 3117B → 1175B (−63%)  │
  ├────────┼──────────────────────┼───────────────────────┤
  │ allocs │ 9 → 6 (−33%)         │ 10 → 7 (−30%)         │
  └────────┴──────────────────────┴───────────────────────┘
```
## Summary

- Add standalone MCP binary (`cmd/mcp`) with three connection modes:
JSON-RPC proxy, direct datadir access (rpcdaemon-style), and
auto-discovery
- Add `StandaloneMCPServer` that proxies MCP tool calls via JSON-RPC
HTTP (no in-process Go API dependency)
- Fix datadir mode: set `DBReadConcurrency` (was 0 → semaphore
deadlock), `TxPoolApiAddr` (was empty → gRPC error), `cfg.API` (was nil
→ empty API list)
- Fix stdio shutdown: add `ServeContext(ctx)` to propagate parent
context instead of `ServeStdio()` which creates a competing signal
handler
- Replace `MustDecode`/`MustDecodeBig` with error-returning variants in
embedded MCP handlers to prevent panics on malformed input
- Extract `Version` constant (0.0.2) shared by both server variants
- Add `README.md` with usage, Claude Desktop config, and full tool list
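
To illustrate the `MustDecode` replacement mentioned above, here is a minimal sketch of the difference between a panicking decoder and an error-returning one. It uses only the standard library as a stand-in; `DecodeHex` and `MustDecodeHex` are hypothetical names, not the helpers used in the PR.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// DecodeHex is a stand-in for an error-returning decoder: malformed input
// is reported to the caller instead of crashing the handler.
func DecodeHex(input string) ([]byte, error) {
	if !strings.HasPrefix(input, "0x") {
		return nil, fmt.Errorf("hex string %q lacks 0x prefix", input)
	}
	b, err := hex.DecodeString(input[2:])
	if err != nil {
		return nil, fmt.Errorf("invalid hex %q: %w", input, err)
	}
	return b, nil
}

// MustDecodeHex shows the pattern being replaced: any malformed input
// supplied by a client turns into a panic.
func MustDecodeHex(input string) []byte {
	b, err := DecodeHex(input)
	if err != nil {
		panic(err)
	}
	return b
}
```

A handler that receives untrusted tool arguments would call the error-returning variant and turn the error into a normal tool-call failure.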

## Changes

| File | What |
|------|------|
| `cmd/mcp/main.go` | New standalone binary: cobra CLI, 3 connection
modes, signal handling |
| `cmd/mcp/README.md` | Documentation with examples and Claude Desktop
config |
| `rpc/mcp/standalone.go` | New `StandaloneMCPServer` — JSON-RPC proxy
variant with tools, prompts, resources |
| `rpc/mcp/mcp.go` | `MCPTransport` interface, `ServeContext()`,
`Version` const, error handling fixes |
| `Makefile` | `mcp` build target |

---------

Co-authored-by: JkLondon <me@ilyamikheev.com>
#19525)

…for tests

Add FakeMerge consensus engine that wraps Merge with a fake PoW scheme,
mirroring the existing FakeEthash pattern. Use it in ExecModuleTester
when TerminalTotalDifficultyPassed is set, and deploy Prague system
contracts (EIP-7002, EIP-7251) so FinalizeAndAssemble succeeds.

This change is intended to make sure that tests which generate chains
produce blocks which are finalized according to the forks specified in the
test config. This is not too big a deal for pre-BAL manual testing, but
with agents and the need to generate BALs (which track system contract
touches under some circumstances) this change is more important.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary

Fixes two inaccuracies in the "Migrating from Geth" guide's port
configuration section:

- **Remove `--p2p.listen-addr`**: This flag does not exist in Erigon.
Port binding is controlled via `--port` (and optionally
`--p2p.allowed-ports` for extra ports). Replaced the text to reference
the correct flags.
- **Fix CLI syntax**: The code examples used `--port: 30304` /
`--p2p.allowed-ports: 30310, ...` (colon-space syntax, which is not
valid CLI syntax). Corrected to `--port=30304` /
`--p2p.allowed-ports=30310,...` (equals-sign, no spaces around commas).

Both fixes appear in two tab variants (standalone Erigon and Erigon +
Caplin) in the same file.

## Files changed

- `docs/gitbook/src/get-started/migrating-from-geth.md`
…#19573)

## Summary

Fixes multiple inaccuracies across the fundamentals documentation
section — stale defaults, non-existent flags, and incorrect port labels.

### Changes by file

**`fundamentals/basic-usage.md`**
- `--torrent.download.rate`: corrected default from `128mb` → `512mb`;
reworded description to reflect that the flag caps bandwidth rather than
boosting it

**`fundamentals/performance-tricks.md`**
- Same download rate default fix: `128MB` → `512mb`

**`fundamentals/default-ports.md`**
- Port 30304 (sentry) labelled `eth/67` → `eth/69`

**`fundamentals/logs.md`**
- `--verbosity` default was shown as `2` (numeric level); corrected to
`info` (the actual string value used by the flag)

**`fundamentals/security.md`**
- Removed mention of `--txpool.nolocals=true` — this flag does not exist
in Erigon

**`fundamentals/multiple-instances.md`**
- Removed `--db.growth.step` from example code block — this flag does
not exist in Erigon
- Fixed env var: `ERIGON_SNAPSHOT_MADV_RND` → `SNAPSHOT_MADV_RND`
(appears twice)

**`fundamentals/modules/rpc-daemon.md`**
- Removed `--grpc` from the Erigon command in the Remote mode table —
the flag is not available on the main `erigon` binary
- Added a clarifying note explaining that the internal gRPC interface is
exposed automatically via `--private.api.addr`

**`fundamentals/modules/downloader.md`**
- `--torrent.download.rate` default: `"128mb"` → `"512mb"`
- `--torrent.upload.rate` default: `"4mb"` → `"16mb"`
- `--torrent.download.slots`: marked as DEPRECATED, default corrected
`128` → `32`

**`fundamentals/configuring-erigon/README.md`**
- `--discovery.v4` default: `true` → `false`
- Removed `--discovery.parallelism` (non-existent flag)
- Removed `--discovery.ban-threshold` (non-existent flag, also had a
markdown typo)
- `--log.dir.verbosity` default: `info` → `dbug`
- Consolidated two duplicate commitment-history flags into one combined
entry
- Added `--override.amsterdam` to General Options (narrative + help
listing)
- Added MCP Server section: `--mcp.addr`, `--mcp.port` (narrative + help
listing)
- Added FCU section: `--fcu.timeout`, `--fcu.background.prune`,
`--fcu.background.commit` (narrative + help listing)
Align pending block behavior to be as close to Geth as possible

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…nical (#19356)

APIs performing state lookups via Hash must implement a canonicality
check. Failure to do so may allow access to stale or invalid state data
from side-chains/reorgs, compromising the accuracy of the API response

APIs that were already OK:
- eth_getStorageAt()
- eth_call()
- eth_estimateGas()
- eth_getWitness()
- eth_getProof()
- eth_createAccessList()
- debug_traceBlockByHash()
- debug_accountAt()

APIs fixed by this PR:
- eth_getBalance()
- eth_getTransactionCount()
- eth_getCode()
- eth_getStorageAt()
- debug_storageRangeAt()
- otterscan_hasCode()
- erigon_getBalanceChangesInBlock()
- debug_traceCall()
- trace_replayBlockTransactions()
- debug_accountRange()
Idea: instead of iterating over bits, iterate over words and get
first-non-zero bit in word

Details: all Build() functions use naive O(64) bit scanning per word.
Replace with `bits.TrailingZeros64 + word &= word-1` to iterate only
over set bits directly. This compiles to a single `TZCNT` instruction
and eliminates all the skipped-zero-bit iterations.

Result: `2.75x` speedup. Also speedup scales with sparsity of upperBits
— at ~34% density, the inner loop runs ~3x fewer iterations


```
cpu: AMD EPYC 4344P 8-Core Processor
100 elements sequence: 233.0 ns/op -> 81.98 ns/op
1M elements sequence: 2344235 ns/op -> 824938 ns/op
```
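
The set-bit iteration trick can be shown in isolation. The sketch below is not the `Build()` code from the PR; `SetBitsNaive` and `SetBitsFast` are illustrative names that only contrast the two loop shapes.

```go
package main

import "math/bits"

// SetBitsNaive tests all 64 positions of every word.
func SetBitsNaive(words []uint64) []int {
	var out []int
	for wi, w := range words {
		for b := 0; b < 64; b++ {
			if w&(1<<uint(b)) != 0 {
				out = append(out, wi*64+b)
			}
		}
	}
	return out
}

// SetBitsFast visits only the set bits: TrailingZeros64 finds the lowest
// set bit, and w &= w-1 clears it, so the inner loop runs once per set bit.
func SetBitsFast(words []uint64) []int {
	var out []int
	for wi, w := range words {
		for w != 0 {
			out = append(out, wi*64+bits.TrailingZeros64(w))
			w &= w - 1 // clear the lowest set bit
		}
	}
	return out
}
```

Both functions return the same positions in ascending order; only the number of inner-loop iterations differs, which is why the gain depends on how sparse the words are.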
## Summary

Fixes two misleading statements in
`docs/gitbook/src/interacting-with-erigon/grpc.md` about the `--grpc`
flag:

- **Introduction paragraph**: The original text said the gRPC server
"must be explicitly enabled using `--grpc`" when starting the RPC
daemon, without clarifying that this flag only applies to the standalone
`rpcdaemon` binary. The main `erigon` binary does *not* accept `--grpc`;
its internal gRPC services (txpool, downloader, sentry, etc.) start
automatically and are accessed via `--private.api.addr`. Reworded to
make this distinction clear.

- **Availability section**: The bullet point "gRPC services are
available when enabled with the `--grpc` flag" now explicitly notes it
is the **standalone `rpcdaemon`** binary that accepts this flag.

## Files changed

- `docs/gitbook/src/interacting-with-erigon/grpc.md`
## Summary

Adds a new **MCP Server** page to the gitbook documentation under the
Fundamentals section (after Modules), documenting Erigon's built-in
Model Context Protocol server.

## What is covered

- **Introduction to MCP**: what the protocol is, why it matters, that it
is read-only
- **Two server variants**: embedded (inside erigon via
`--mcp.addr`/`--mcp.port`) vs standalone (`cmd/mcp` binary)
- **Configuration tables**: flags for both the embedded and standalone
modes
- **Three connection modes** (standalone): JSON-RPC proxy, direct
datadir, auto-discovery
- **Use cases with examples**:
  - Interactive blockchain analysis (querying balances, logs, block data
    via natural language)
  - Node debugging and monitoring (log triage, sync status checks)
- **Claude Desktop setup**: tabbed examples for JSON-RPC mode, datadir
mode, and embedded SSE mode
- **Tools reference**: grouped listing of all 40+ tools (`eth_*`,
`erigon_*`, `ots_*`, `logs_*`, `metrics_*`)
- **Resources and Prompts tables**: all `erigon://` resource URIs and
named prompt templates

## Files changed

- `docs/gitbook/src/fundamentals/mcp.md` — new page
- `docs/gitbook/src/SUMMARY.md` — added `* [MCP
Server](fundamentals/mcp.md)` after the Modules entry
…atency (#19583)

closes #18770

Co-authored-by: JkLondon <me@ilyamikheev.com>
…acking tests (#19565)

This PR allows us to use different DB versions between main branch and
release branch to avoid any incompatibility problems between databases

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This test requires that, when starting Erigon from a synced DB, it reaches
the tip within a few minutes.
This does not seem to hold at the moment, so we temporarily change the test
trigger:
- run nightly in main,
- run on every push in release
…resolution out of Open() (#19587)

## Summary

- Move `ResolveErigonDBSettings` out of `Aggregator.Open()` so callers
resolve settings explicitly before constructing the aggregator, enabling
the downloader to provide the real `erigondb.toml` before domains are
configured.
- Whitelist `erigondb.toml` in the header-chain download phase so the
torrent downloader delivers it alongside headers/bodies.
- After the header-chain download completes, `ReloadErigonDBSettings()`
re-reads the file and propagates any stepSize changes to all
Domain/InvertedIndex instances.
- Add `WithErigonDBSettings()` builder on `AggOpts` so all call sites
pass pre-resolved settings.

Three runtime scenarios handled:
1. **Legacy datadir** (has `preverified.toml`, no `erigondb.toml`):
writes `erigondb.toml` with legacy settings (step_size=1,562,500) on
startup.
2. **Fresh sync with downloader**: starts with defaults, downloader
delivers real `erigondb.toml` during header-chain phase, settings are
reloaded and propagated.
3. **Fresh sync with `--no-downloader`**: writes defaults to disk
immediately since no downloader will provide the file.

## Test plan

- [x] Scenario 1 (legacy datadir): confirmed log `Creating erigondb.toml
with LEGACY settings step_size=1562500` and file written on startup
- [x] Scenario 2 (fresh + downloader): confirmed log `erigondb stepSize
changed, propagating` after header-chain download delivers
`erigondb.toml` with step_size=390625
- [x] Scenario 3 (fresh + `--no-downloader`): confirmed log
`Initializing erigondb.toml with DEFAULT settings (nodownloader)` and
file written immediately

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Changes

- Move `GOMODCACHE`/`GOCACHE` env vars to job level so all steps share
the same cache paths
- Add **pre-run** `go clean -testcache` (`if: always()`) — ensures tests
run fresh and catches leftovers from cancelled runs
- Add **LFS prune** step after checkout — removes blobs no longer
referenced by the current branch
- Replace old `Cleanup` with **post-run** `go clean -cache` — clears
compiled artifacts after each run to prevent unbounded growth on the
persistent runner
- Post-run cleanup also removes `coverage-test-all.out` and
`$env:RUNNER_TEMP` (previously only `$env:TEMP` was cleaned)
…19590)

**[SharovBot]** Fix DATA RACE in `db/state/aggregator.go`

## Summary
- `buildFiles()` calls `BeginFilesRo()` on `Domain` and `InvertedIndex`
without holding `visibleFilesLock`, racing with `recalcVisibleFiles()`
which writes `_visible`/`_visibleFiles` fields under the same lock
- Wraps the `BeginFilesRo()` calls in `buildFiles()` with
`a.visibleFilesLock.RLock()`/`RUnlock()` to synchronize with the writer,
matching the pattern already used in `Aggregator.BeginFilesRo()`

## Test plan
- [x] `go build ./...` passes
- [x] `go test -race ./execution/verify/... -run
TestHistoryVerification_WithUserTransactions` passes 3 consecutive times
with no DATA RACE
- [x] `go test -race ./db/state/...` passes with no DATA RACE

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
… is unspecified (#19585)

## Summary
- Fix ENR having no IP address when `caplin.discovery.addr` is not
explicitly set (defaults to `0.0.0.0`), making the node undiscoverable
via DISCV5
- Auto-detect the outbound IP from the OS routing table when the
configured discovery address is unspecified (`0.0.0.0` or `::`)
- Add nil-guard in `Identity()` to prevent panic when ENR has no IP
(defensive check)
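
One common way to ask the OS routing table for the outbound IP is to "connect" a UDP socket and read its local address. The sketch below shows that technique under stated assumptions: `NeedsAutoDetect` and `OutboundIP` are hypothetical names, the probe address is an arbitrary choice of the caller, and the PR may implement detection differently.

```go
package main

import (
	"errors"
	"net"
)

// NeedsAutoDetect reports whether the configured discovery address is the
// unspecified address (0.0.0.0 or ::), in which case it cannot go in an ENR.
func NeedsAutoDetect(addr string) bool {
	ip := net.ParseIP(addr)
	return ip != nil && ip.IsUnspecified()
}

// OutboundIP returns the local IP the OS would use to reach probe
// (a "host:port" string). Connecting a UDP socket only consults the
// routing table; no packet is sent.
func OutboundIP(probe string) (net.IP, error) {
	conn, err := net.Dial("udp", probe)
	if err != nil {
		return nil, err
	}
	defer conn.Close()
	udpAddr, ok := conn.LocalAddr().(*net.UDPAddr)
	if !ok {
		return nil, errors.New("unexpected local address type")
	}
	return udpAddr.IP, nil
}
```

A caller would check `NeedsAutoDetect(cfgAddr)` and, only when it is true, fall back to `OutboundIP` with some routable probe address, keeping an explicitly configured address untouched.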

## Test plan
- [x] Tested on Chiado network — node reaches chain tip and produces
blocks
- [x] Without `--caplin.discovery.addr`: outbound IP auto-detected, ENR
and `discovery_addresses` populated correctly
- [x] With explicit `--caplin.discovery.addr`: works as before, no
warning logged
- [x] `make lint && make erigon` pass clean

Fixes #19576

Generated with Claude Code

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Problem: 
<img width="1437" height="1233" alt="Screenshot 2026-03-02 at 17 20 26"
src="https://github.com/user-attachments/assets/29993bd6-2a23-4c49-946c-62b390014818"
/>


In PR:
- re-use 1 Builder object for many EF merges and many EF builds (in
II.mergeFiles)
- moved `Merge` method from `Reader` to `Builder`
- moved reusable iterators for `Merge` method inside `Builder` object
- “alloc=2.7g sys=5.4g” -> “alloc=2.7g sys=2.7g” (during large .rf file
merge)

---------

Co-authored-by: info@weblogix.biz <admin@10gbps.weblogix.it>
The old `aux` buffer was often reallocated (16 children * 32-byte hash length = 512 > 256)
```
  ┌───────────────────┬────────┬────────┬───────────┐
  │      Pattern      │ ns/op  │  B/op  │ allocs/op │
  ├───────────────────┼────────┼────────┼───────────┤
  │ fresh-make (old)  │ 315 ns │ 1408 B │ 2 allocs  │
  ├───────────────────┼────────┼────────┼───────────┤
  │ reuse-clone (new) │ 248 ns │ 704 B  │ 1 alloc   │
  └───────────────────┴────────┴────────┴───────────┘
```
…ods (#19533)

Two separate BlockOverrides structs existed in the codebase
(ethapi.BlockOverrides and transactions.BlockOverrides) with overlapping
but divergent fields, different field types, and different JSON tags.
This caused inconsistencies across eth_call, eth_estimateGas,
eth_simulateV1, and eth_callMany.

Changes:
- Merge into a single ethapi.BlockOverrides with all 11 fields: Number,
Difficulty, Time, GasLimit, FeeRecipient, PrevRandao, BaseFeePerGas,
BlobBaseFee, BeaconRoot, BlockHash, Withdrawals
- Add ethapi.BlockHashOverrides (was only in transactions package)
- Three methods with clearly separated responsibilities:
  - Override(*BlockContext) error: for eth_call/eth_estimateGas; rejects
    BeaconRoot and Withdrawals (aligned with Geth); now propagates error
    (was silently swallowed before)
  - OverrideHeader(*Header) *Header: for eth_simulateV1 block assembly
  - OverrideBlockContext(*BlockContext, BlockHashOverrides): for
    eth_simulateV1/eth_callMany; applies all fields including BlobBaseFee
    and BlockHash
- Remove duplicate struct from transactions/call.go
- Replace manual field-by-field application in eth_callMany with
OverrideBlockContext call
- Add complete unit test suite: 20 tests covering all three methods, nil
receivers, field isolation, overflow checks, rejection of unsupported
fields, and BlockHash map merging

Verified type consistency and functional behavior against Geth.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
closes #19054

tried to oneshot random issue with my new reasoning bot Kimiko) Seems
good wdyt @canepat

---------

Co-authored-by: JkLondon <me@ilyamikheev.com>
The fix iterates over requestedChecks (not the skip list) and excludes
anything in the skip set. Before, it was iterating over skip entries
and only keeping ones not found in the requested list, the exact
opposite of what's needed.
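
The corrected filtering logic can be sketched as below. `FilterChecks` is a hypothetical helper name, and the checks are modelled as plain strings; only the direction of the iteration reflects the fix described above.

```go
package main

// FilterChecks returns the requested checks minus anything in the skip list,
// preserving the order in which the checks were requested.
func FilterChecks(requestedChecks, skip []string) []string {
	skipSet := make(map[string]struct{}, len(skip))
	for _, s := range skip {
		skipSet[s] = struct{}{}
	}
	out := make([]string, 0, len(requestedChecks))
	for _, c := range requestedChecks {
		// Keep a requested check only when it is NOT in the skip set.
		if _, skipped := skipSet[c]; !skipped {
			out = append(out, c)
		}
	}
	return out
}
```

Iterating over the skip list instead, as the old code did, would return skip entries that were never requested, which is why the result was inverted.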
The PR reinstates the BlockAssembler and the move of BAL processing to
the versionedio code rather than as an auxiliary set of functions in the
sync process.

The main advantage of this change is that it consolidates behaviour,
which makes agent understanding more focussed and makes the bal code
re-usable in tests. This makes the codebase more streamlined and avoids
breaks and confusion in test analysis.

It contains fixes merged by the claude agent from
#19434 and
#19525, along with fixes that
get to a running bal-devnet-2 node.

This is ready for merging.

---------

Co-authored-by: taratorio <94537774+taratorio@users.noreply.github.com>
Co-authored-by: yperbasis <andrey.ashikhmin@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Andrew Ashikhmin <34320705+yperbasis@users.noreply.github.com>