Commit ad1eaf4

Merge branch 'main' into feat/persistent-shared-domains
2 parents 161ac62 + d7c9ea8 commit ad1eaf4

119 files changed

Lines changed: 2576 additions & 2609 deletions

.claude/skills/launch-bal-devnet-2/SKILL.md

Lines changed: 98 additions & 0 deletions
@@ -173,6 +173,104 @@ bash $WORKDIR/clean.sh
 
 This runs `stop.sh`, removes erigon chain data (chaindata, snapshots, txpool, nodes, temp) and lighthouse data, then re-initializes genesis. After clean, start again with Steps 4-5.
 
+## A/B Testing: BAL vs Non-BAL Parallel Execution
+
+Compare parallel execution throughput with and without BAL scheduling optimization.
+
+### Overview
+
+- **Instance A (BAL)**: Default `bal-devnet-2` instance at `$WORKDIR` — uses BAL to pre-populate version maps and schedule transactions optimistically
+- **Instance B (No-BAL)**: Second instance at `$WORKDIR-nobal` — sets `IGNORE_BAL=true` to force dependency-tracking scheduling path
+
+Both instances sync the same chain, enabling direct throughput comparison.
+
+### Instance B Port Assignments (offset +400)
+
+| Service | Port | Protocol |
+|---------|------|----------|
+| Erigon HTTP RPC | 8945 | TCP |
+| Erigon Engine API (authrpc) | 8951 | TCP |
+| Erigon WebSocket | 8946 | TCP |
+| Erigon P2P | 30703 | TCP+UDP |
+| Erigon gRPC | 9490 | TCP |
+| Erigon Torrent | 42469 | TCP+UDP |
+| Erigon pprof | 6460 | TCP |
+| Erigon metrics | 6461 | TCP |
+| Lighthouse P2P | 9100 | TCP+UDP |
+| Lighthouse QUIC | 9101 | UDP |
+| Lighthouse HTTP API | 5352 | TCP |
+| Lighthouse metrics | 5464 | TCP |
+
+### Setup Instance B
+
+1. Create working directory and copy config:
+```bash
+NOBAL_DIR=${WORKDIR}-nobal
+mkdir -p $NOBAL_DIR/erigon-data $NOBAL_DIR/lighthouse-data
+cp -r $WORKDIR/testnet-config $NOBAL_DIR/
+cp $WORKDIR/genesis.json $NOBAL_DIR/
+```
+
+2. Build a binary with `IGNORE_BAL` support (requires bal-devnet-2 branch with commit `45625d09`+):
+```bash
+# Build from bal-devnet-2 branch (or use existing binary if already up to date)
+make erigon
+```
+
+3. Initialize genesis:
+```bash
+./build/bin/erigon init --datadir=$NOBAL_DIR/erigon-data $NOBAL_DIR/genesis.json
+```
+
+4. Create start scripts — same as Instance A but with:
+- `export IGNORE_BAL=true` in `start-erigon.sh`
+- Port offsets from the table above
+- Docker container name: `bal-devnet-2-nobal-lighthouse`
+- `--execution-endpoint=http://127.0.0.1:8951` in Lighthouse
+- `--disable-enr-auto-update` in Lighthouse (second instance on same host)
+
+5. Start Instance B (erigon first, then Lighthouse):
+```bash
+cd $NOBAL_DIR && nohup bash start-erigon.sh > erigon-console.log 2>&1 &
+# Wait for jwt.hex
+cd $NOBAL_DIR && nohup bash start-lighthouse.sh > lighthouse-console.log 2>&1 &
+```
+
+### Metrics to Compare
+
+Once both instances reach chain tip, compare at-head execution metrics:
+
+| Metric | Source | What it measures |
+|--------|--------|------------------|
+| `gas/s` | Execution log lines | Raw execution throughput |
+| `repeat%` | Execution log lines | Speculative re-execution rate (lower = better dependency prediction) |
+| `abort` | Execution log lines | Number of aborted transactions per batch |
+| `invalid` | Execution log lines | Transactions invalidated by conflict detection |
+| `blk/s` | Execution log lines | Block processing rate |
+
+**Expected**: BAL instance should have lower `repeat%` and `abort` counts because BAL pre-populates the version map, reducing false conflicts. The `gas/s` difference shows the net throughput impact.
+
+### Monitoring Script
+
+```bash
+# Side-by-side log comparison
+echo "=== BAL (Instance A) ===" && \
+grep -E "parallel (executed|done)" $WORKDIR/erigon-data/logs/erigon.log | tail -3 && \
+echo "" && \
+echo "=== No-BAL (Instance B) ===" && \
+grep -E "parallel (executed|done)" ${WORKDIR}-nobal/erigon-data/logs/erigon.log | tail -3
+```
+
+### Cleanup
+
+```bash
+# Stop Instance B
+docker stop bal-devnet-2-nobal-lighthouse 2>/dev/null
+pkill -f "datadir.*bal-devnet-2-nobal/erigon-data"
+# Optionally remove data
+rm -rf ${WORKDIR}-nobal
+```
+
 ## Troubleshooting
 
 | Problem | Solution |
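
Step 5 of the SKILL.md addition says only "Wait for jwt.hex" between starting erigon and starting Lighthouse. A minimal sketch of that wait follows. The helper name `wait_for_file`, the timeout values, and the `erigon-data/jwt.hex` location are assumptions for illustration; the real path depends on how `start-erigon.sh` configures the JWT secret.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): block until a non-empty file
# exists, or give up after a timeout.
# Usage: wait_for_file PATH [TIMEOUT_SECONDS]
wait_for_file() {
  local path=$1 timeout=${2:-60} waited=0
  # -s is true only when the file exists and has a size greater than zero
  until [ -s "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for $path" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

With this in place, the Lighthouse start in step 5 could be gated as `wait_for_file "$NOBAL_DIR/erigon-data/jwt.hex" 120 && nohup bash start-lighthouse.sh > lighthouse-console.log 2>&1 &`, again assuming that is where the secret is written.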

.github/workflows/scripts/run_rpc_tests_ethereum.sh

Lines changed: 1 addition & 1 deletion
@@ -41,4 +41,4 @@ DISABLED_TEST_LIST=(
 DISABLED_TESTS=$(IFS=,; echo "${DISABLED_TEST_LIST[*]}")
 
 # Call the main test runner script with the required and optional parameters
-"$(dirname "$0")/run_rpc_tests.sh" mainnet v1.122.0 "$DISABLED_TESTS" "$WORKSPACE" "$RESULT_DIR"
+"$(dirname "$0")/run_rpc_tests.sh" mainnet v1.123.0 "$DISABLED_TESTS" "$WORKSPACE" "$RESULT_DIR"

.github/workflows/scripts/run_rpc_tests_ethereum_latest.sh

Lines changed: 1 addition & 1 deletion
@@ -34,4 +34,4 @@ DISABLED_TEST_LIST=(
 DISABLED_TESTS=$(IFS=,; echo "${DISABLED_TEST_LIST[*]}")
 
 # Call the main test runner script with the required and optional parameters
-"$(dirname "$0")/run_rpc_tests.sh" mainnet v1.122.0 "$DISABLED_TESTS" "$WORKSPACE" "$RESULT_DIR" "latest" "$REFERENCE_HOST" "do-not-compare-error-message" "$DUMP_RESPONSE"
+"$(dirname "$0")/run_rpc_tests.sh" mainnet v1.123.0 "$DISABLED_TESTS" "$WORKSPACE" "$RESULT_DIR" "latest" "$REFERENCE_HOST" "do-not-compare-error-message" "$DUMP_RESPONSE"

.github/workflows/scripts/run_rpc_tests_gnosis.sh

Lines changed: 1 addition & 1 deletion
@@ -22,5 +22,5 @@ DISABLED_TEST_LIST=(
 DISABLED_TESTS=$(IFS=,; echo "${DISABLED_TEST_LIST[*]}")
 
 # Call the main test runner script with the required and optional parameters
-"$(dirname "$0")/run_rpc_tests.sh" gnosis v1.122.0 "$DISABLED_TESTS" "$WORKSPACE" "$RESULT_DIR"
+"$(dirname "$0")/run_rpc_tests.sh" gnosis v1.123.0 "$DISABLED_TESTS" "$WORKSPACE" "$RESULT_DIR"
 

.github/workflows/scripts/run_rpc_tests_remote_ethereum.sh

Lines changed: 1 addition & 1 deletion
@@ -28,4 +28,4 @@ DISABLED_TEST_LIST=(
 DISABLED_TESTS=$(IFS=,; echo "${DISABLED_TEST_LIST[*]}")
 
 # Call the main test runner script with the required and optional parameters
-"$(dirname "$0")/run_rpc_tests.sh" mainnet v1.122.0 "$DISABLED_TESTS" "$WORKSPACE" "$RESULT_DIR"
+"$(dirname "$0")/run_rpc_tests.sh" mainnet v1.123.0 "$DISABLED_TESTS" "$WORKSPACE" "$RESULT_DIR"

.github/workflows/test-all-erigon.yml

Lines changed: 4 additions & 0 deletions
@@ -46,6 +46,9 @@ jobs:
           # marginal timeouts while keeping Linux/macOS at the tighter 20m.
           - os: windows-2025
             test_timeout: 30m
+            skip_execution_tests: 'true'
+          - os: macos-15
+            skip_execution_tests: 'true'
     runs-on: ${{ matrix.os }}
 
     steps:
@@ -71,6 +74,7 @@ jobs:
         env:
           SKIP_FLAKY_TESTS: 'true'
           TEST_TIMEOUT: ${{ matrix.test_timeout || '20m' }}
+          ERIGON_SKIP_EXECUTION_TESTS: ${{ matrix.skip_execution_tests || '' }}
         run: |
           if ! make test-all GO_FLAGS="--timeout $TEST_TIMEOUT"; then
             # Retry after clearing the Go build cache. Low disk space can cause

.github/workflows/test-hive-eest.yml

Lines changed: 23 additions & 6 deletions
@@ -19,14 +19,31 @@ jobs:
   test-hive-eest:
     runs-on:
       group: hive
+    name: test-hive-eest (${{ matrix.shard }})
     strategy:
       fail-fast: false
       matrix:
         include:
+          # consume-engine: split by fork for parallelism (61k+ tests total).
+          # Engine API tests only exist from Paris (The Merge) onwards.
+          # Pattern format: "<suite-regex>/<test-regex>" — the leading ".*/""
+          # matches the suite name (eest/consume-engine) while the remainder
+          # filters individual pytest test IDs by fork parameter.
           - sim: consume-engine
-            sim-limit: ""
+            sim-limit: ".*/.*fork_(Paris|Shanghai)"
+            shard: paris+shanghai
+          - sim: consume-engine
+            sim-limit: ".*/.*fork_Cancun"
+            shard: cancun
+          - sim: consume-engine
+            sim-limit: ".*/.*fork_Prague"
+            shard: prague
+          - sim: consume-engine
+            sim-limit: ".*/.*fork_Osaka"
+            shard: osaka
           - sim: consume-rlp
             sim-limit: ".*eip2930_access_list.*" # all tests take too long
+            shard: rlp
     steps:
       - name: Clean docker system
         run: |
@@ -120,28 +137,28 @@ jobs:
             failed=$(echo "$status_line" | sed -n 's/.*failed=\([0-9]*\).*/\1/p')
 
             echo -e "\n"
-            message="----------- Results for ${1} -----------\nTests: $tests, Failed: $failed\n\n============================================================"
+            message="----------- Results for ${1} (${{ matrix.shard }}) -----------\nTests: $tests, Failed: $failed\n\n============================================================"
             echo -e "$message"
             echo -e "$message" >> result.log
 
             if (( tests < 4 )); then
-              echo "Too few tests run for suite ${1} - ${tests} tests"
+              echo "Too few tests run for ${{ matrix.shard }} - ${tests} tests"
               echo "failed" > failed.log
               exit 1
             fi
             if (( failed > 0 )); then
-              echo "Too many failures for suite ${1} - ${failed} failed out of ${tests}"
+              echo "Too many failures for ${{ matrix.shard }} - ${failed} failed out of ${tests}"
               echo "failed" > failed.log
               exit 1
             fi
           }
-          run_suite ${{ matrix.sim }} ${{ matrix.sim-limit }}
+          run_suite "${{ matrix.sim }}" "${{ matrix.sim-limit }}"
         continue-on-error: true
       - name: Upload output log
         if: always()
         uses: actions/upload-artifact@v6
         with:
-          name: hive-workspace-log-${{ matrix.sim }}
+          name: hive-workspace-log-${{ matrix.shard }}
           path: hive/workspace/logs
           if-no-files-found: ignore
       - name: Test Results
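
The `sim-limit` patterns above split `consume-engine` into four shards by fork. The sketch below shows how a single test ID would map to a shard, under the rule described in the workflow comment: the part of the pattern before the first `/` is matched against the suite name and the rest against the test ID. The helper names, the sample test ID, and the use of unanchored `grep -E` matching are assumptions; Hive's own matcher may differ in anchoring or case handling. Patterns without a `/`, such as the `consume-rlp` one, are deliberately not handled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: does "<suite-regex>/<test-regex>" select this test?
# Assumes the pattern contains at least one "/" and is split at the first one.
matches_limit() {
  local pattern=$1 suite=$2 test_id=$3
  case $pattern in
    */*) ;;
    *) echo "pattern has no '/': $pattern" >&2; return 2 ;;
  esac
  local suite_re=${pattern%%/*}   # text before the first "/"
  local test_re=${pattern#*/}     # text after the first "/"
  printf '%s\n' "$suite" | grep -Eq -- "$suite_re" &&
    printf '%s\n' "$test_id" | grep -Eq -- "$test_re"
}

# Print every shard from the matrix above whose pattern selects the test ID.
shard_for() {
  local test_id=$1 entry
  for entry in \
    'paris+shanghai=.*/.*fork_(Paris|Shanghai)' \
    'cancun=.*/.*fork_Cancun' \
    'prague=.*/.*fork_Prague' \
    'osaka=.*/.*fork_Osaka'; do
    # entry is "<shard>=<sim-limit>"; split at the first "="
    if matches_limit "${entry#*=}" 'eest/consume-engine' "$test_id"; then
      echo "${entry%%=*}"
    fi
  done
}
```

For example, `shard_for 'test_transfer[fork_Prague-blockchain_test_engine]'` prints `prague` for that made-up test ID. A test ID that printed nothing would be one that no shard runs, which is the main risk of splitting by an explicit fork list.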

.github/workflows/test-hive.yml

Lines changed: 1 addition & 1 deletion
@@ -60,7 +60,7 @@ jobs:
             max-allowed-failures: 0
           - sim: rpc
             sim-limit: compat
-            max-allowed-failures: 57 # 56 missing eth_simulateV1 MaxGasUsed in Hive tests + 1 https://github.com/erigontech/erigon/issues/19758
+            max-allowed-failures: 56 # 56 missing eth_simulateV1 MaxGasUsed in Hive tests
     steps:
       - name: Checkout Erigon go.mod
         uses: actions/checkout@v6

.github/workflows/test-integration-caplin.yml

Lines changed: 7 additions & 9 deletions
@@ -37,16 +37,14 @@ jobs:
       - uses: ./.github/actions/setup-erigon
         id: erigon
 
-      - name: Download consensus spec tests
-        id: download-spec-tests
-        continue-on-error: true
-        run: cd cl/spectest && make download-spec
-
-      - name: Run consensus spec tests
-        if: steps.download-spec-tests.outcome == 'success'
+      - name: test-integration-caplin
         run: |
-          cd cl/spectest && make tests
-          make mainnet 2>&1 | ../../tools/filter-test-output | tee run.log
+          cd cl/spectest
+          if make tests; then
+            make mainnet 2>&1 | ../../tools/filter-test-output | tee run.log
+          else
+            echo "::warning::consensus spec test setup failed; skipping mainnet tests"
+          fi
 
       - uses: ./.github/actions/cleanup-erigon
         if: always()

agents.md

Lines changed: 4 additions & 0 deletions
@@ -56,6 +56,10 @@ Cherry-pick PRs: when opening a PR that cherry-picks a commit to a `release/X.Y`
 
 **Important**: Always run `make lint` after making code changes and before committing. Fix any linter errors before proceeding. PRs must pass `make lint` before being opened or updated.
 
+## Pull Requests & Workflows
+
+When manually dispatching a workflow that is not part of the PR's automatic check list, add a comment on the PR explaining which workflow was dispatched, why it was chosen, and include a direct link to the workflow run.
+
 ## Pre-push
 
 Before running `git push`, always run `make lint` first and fix all issues. Run lint multiple times if needed — it is non-deterministic.
