perf(l1): add BOLT post-link optimization setup #5981
6043adc to 5f86afd
Greptile Overview
Greptile Summary
This PR adds LLVM BOLT (Binary Optimization and Layout Tool) post-link optimization infrastructure to ethrex. BOLT can improve binary performance by 2-15% through profile-guided code layout optimization without source code changes.
Key Changes
Issues Found
Testing Status
PR is marked as draft with testing incomplete. The configuration needs validation on Linux x86-64 with full BOLT toolchain (LLVM 19+) and performance benchmarking.
Confidence Score: 3/5
| Filename | Overview |
|---|---|
| .cargo/config.toml | Adds BOLT compatibility linker flags for x86_64 and ARM64, but ARM64 config contradicts known BOLT incompatibility |
| Makefile | Adds comprehensive BOLT workflow targets with instrumentation, profiling, and optimization steps |
| crates/common/crypto/kzg.rs | Refactors KZG warmup function to avoid .warm in symbol names, addresses BOLT detection issue |
Sequence Diagram
sequenceDiagram
participant Dev as Developer
participant Make as Makefile
participant Cargo as Cargo Build
participant BOLT as LLVM BOLT
participant Binary as ethrex Binary
Note over Dev,Binary: BOLT Optimization Workflow
Dev->>Make: make build-bolt
Make->>Cargo: cargo build --profile release-bolt<br/>with CXXFLAGS
Note over Cargo: Applies linker flags:<br/>--emit-relocs, -Wl,-q<br/>force-frame-pointers
Cargo->>Binary: target/release-bolt/ethrex<br/>(with relocations)
Dev->>Make: make bolt-instrument
Make->>BOLT: llvm-bolt -instrument
BOLT->>Binary: ethrex-instrumented<br/>(instrumented binary)
Dev->>Binary: Run ./ethrex-instrumented<br/>with workload
Binary->>Binary: Collect profile data
Binary-->>Make: /tmp/bolt-profiles/prof.*.fdata
Dev->>Make: make bolt-optimize
Make->>BOLT: llvm-bolt with profile data<br/>-reorder-blocks=ext-tsp<br/>-reorder-functions=cdsort<br/>-split-functions
BOLT->>Binary: ethrex-bolt-optimized<br/>(optimized binary)
Dev->>Binary: Deploy optimized binary
Path: .cargo/config.toml
Line: 22:28
[target.aarch64-unknown-linux-gnu]
rustflags = [
    # BOLT compatibility flags - preserves relocation info for post-link optimization
    "-Clink-arg=-Wl,--emit-relocs",
    "-Clink-arg=-Wl,-q",
    "-Cforce-frame-pointers=yes", # Better profiling accuracy
]
ARM64 BOLT flags conflict with documented limitations. Line 10 notes ARM64 "currently fails during BOLT emission phase" and Makefile line 24 states "ARM64: Fails with 'Undefined temporary symbol .Ltmp0'", yet these flags are still applied to `aarch64-unknown-linux-gnu` builds.
```suggestion
# [target.aarch64-unknown-linux-gnu]
# rustflags = [
# # BOLT compatibility flags - preserves relocation info for post-link optimization
# # DISABLED: ARM64 currently fails during BOLT emission phase (LLVM bug with .Ltmp symbols)
# "-Clink-arg=-Wl,--emit-relocs",
# "-Clink-arg=-Wl,-q",
# "-Cforce-frame-pointers=yes", # Better profiling accuracy
# ]
```
Path: crates/common/crypto/kzg.rs
Line: 29:29
.spawn(|| {
    std::hint::black_box(c_kzg::ethereum_kzg_settings(KZG_PRECOMPUTE));
});
std::thread::spawn(do_kzg_warmup);
Missing error handling - spawned thread error is silently ignored. If `do_kzg_warmup` panics or the spawn fails, it won't be logged or reported.
Consider adding basic error handling:
```rust
// `std::thread::spawn` returns a `JoinHandle`, not a `Result`;
// `Builder::spawn` returns `io::Result<JoinHandle<_>>`, so `.ok()` applies to it.
std::thread::Builder::new().spawn(do_kzg_warmup).ok();
```
or logging spawn failures.
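The two failure modes this comment mentions (a failed spawn and a panic inside the thread) surface through different APIs. The following is a minimal sketch, not the PR's code: `do_kzg_warmup` here is a hypothetical stand-in, `always_panics` and `spawn_and_report` are illustrative names, and the sketch joins the thread only so that the panic path can be observed.

```rust
use std::thread;

/// Hypothetical stand-in for the PR's `do_kzg_warmup`; it only burns a value.
fn do_kzg_warmup() {
    std::hint::black_box(42u64);
}

/// Hypothetical worker that always panics, to exercise the failure path.
fn always_panics() {
    panic!("simulated warmup failure");
}

/// Spawns `work` on a named thread and reports both failure modes as text:
/// a failed spawn (`io::Error` from `Builder::spawn`) and a panic inside
/// the thread (the `Err` returned by `JoinHandle::join`).
fn spawn_and_report(name: &str, work: fn()) -> Result<(), String> {
    let handle = thread::Builder::new()
        .name(name.to_string())
        .spawn(work)
        .map_err(|e| format!("failed to spawn {name}: {e}"))?;
    handle.join().map_err(|_| format!("thread {name} panicked"))
}

fn main() {
    println!("{:?}", spawn_and_report("kzg-warmup", do_kzg_warmup));
    println!("{:?}", spawn_and_report("kzg-warmup", always_panics));
}
```

A fire-and-forget warmup thread would normally not be joined, so in that setting only the spawn error is observable at the call site; a panic would have to be reported from inside the thread itself.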
Path: .cargo/config.toml
Line: 17:19
"-Clink-arg=-Wl,--emit-relocs",
"-Clink-arg=-Wl,-q",
"-Cforce-frame-pointers=yes", # Better profiling accuracy
BOLT flags applied globally to all x86_64 Linux builds. This adds `--emit-relocs` to all builds regardless of profile, increasing binary size even for debug builds.
Consider: Are these flags needed for non-BOLT profiles? Document how to conditionally apply them only for `release-bolt` profile builds if needed.
Add build configuration and Makefile targets for LLVM BOLT optimization, which can improve binary performance by 2-15% through better code layout based on runtime profiling.
Changes:
- Add release-bolt and release-pgo-bolt profiles to Cargo.toml
- Add --emit-relocs linker flags to .cargo/config.toml for BOLT compatibility
- Add Makefile targets: build-bolt, bolt-instrument, bolt-optimize, bolt-clean
- Add cargo-pgo workflow targets: pgo-bolt-build, pgo-bolt-optimize, pgo-full-*
Usage requires LLVM 16+ with BOLT tools on Linux x86-64.
…nges
Move BOLT linker flags (--emit-relocs, -Wl,-q, frame pointers) from the
global .cargo/config.toml into a dedicated .cargo/bolt.toml loaded only
by `make build-bolt` via --config. This prevents BOLT flags from affecting
normal dev/release/test builds (larger binaries, extra register used).
Also:
- Remove [target.aarch64-unknown-linux-gnu] section (BOLT unsupported on ARM64)
- Revert warm_block -> preheat_block rename (BOLT's .warm detection matches
LLVM-generated suffixes, not Rust function names in mangled symbols)
- Restore kzg.rs thread name ("kzg-warmup") lost by the original refactor
- Add bolt-perf2bolt and bolt-verify to .PHONY
- Fix docs to use actual ethrex CLI flags and reference bolt.toml
- Trim Makefile comment block (details now in docs/developers/bolt-optimization.md)
9e26076 to de34876
BOLT 19 fails with "parent function not found" when analyzing the drop_in_place specialization for the closure inside spawn_unchecked. Using a function pointer instead of an inline closure produces a simpler symbol that BOLT can analyze without hitting this error. The thread name "kzg-warmup" is preserved via Builder::new().name().
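The shape of the change described in this commit can be sketched as follows. This is an illustration under assumptions, not the code in `kzg.rs`: `report_thread_name` and `spawn_named` are hypothetical names, and the sketch only shows that passing a function by path to `Builder::spawn` keeps the thread name; it says nothing about the symbols BOLT sees.

```rust
use std::thread;

/// Hypothetical worker: returns the name of the thread it runs on.
fn report_thread_name() -> Option<String> {
    thread::current().name().map(str::to_owned)
}

/// Spawns `report_thread_name` by path (no inline closure) on a named
/// thread and returns what the worker observed.
fn spawn_named(name: &str) -> Option<String> {
    thread::Builder::new()
        .name(name.to_owned())
        .spawn(report_thread_name)
        .expect("spawn failed")
        .join()
        .expect("worker panicked")
}

fn main() {
    println!("{:?}", spawn_named("kzg-warmup"));
}
```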
BOLT 19 fails with "parent function not found" when it encounters .cold fragments of drop_in_place specializations for complex Rust closure types (e.g., rayon iterators). Disable the hot-cold split pass via -Cllvm-args=-hot-cold-split=false in the BOLT config. BOLT itself will handle hot/cold splitting during its optimization phase using the runtime profile data, which is more effective since it's based on actual execution patterns rather than static heuristics.
Thin LTO creates cross-module function fragments that BOLT 19 can't match to their parent functions. Complex monomorphized symbols (e.g., rayon parallel iterators) end up as disjoint address ranges that BOLT interprets as orphaned split fragments. Fat LTO produces a single compilation unit with contiguous function bodies, avoiding this issue entirely. The additional compile time is acceptable since BOLT builds are infrequent (only for production optimization workflows).
BOLT 19's split-function detection regex matches '.warm' anywhere in mangled symbol names. Rust's legacy mangling turns '::warm_block' into '..warm_block', which contains '.warm' as a substring. This causes BOLT to misidentify drop_in_place specializations for rayon closures inside warm_block as orphaned split-function fragments, producing a hard error. Confirmed by testing on ethrex-office-3: the original name causes "BOLT-ERROR: parent function not found" during both instrumentation and perf2bolt conversion.
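The substring collision described in this commit can be illustrated with a small model. Both functions below are simplifications introduced for this sketch: `legacy_escape_path` only models the `::` to `..` rewrite (rustc's legacy mangling does more), `looks_like_split_fragment` is a plain substring test and not BOLT's actual regex, and the `levm::LEVM::...` paths are hypothetical.

```rust
/// Simplified model (an assumption, not rustc's full legacy mangling):
/// path separators `::` inside a type name are written as `..`.
fn legacy_escape_path(path: &str) -> String {
    path.replace("::", "..")
}

/// Simplified model of the check described above (not BOLT's real regex):
/// a symbol is flagged if `.warm` appears anywhere in it.
fn looks_like_split_fragment(symbol: &str) -> bool {
    symbol.contains(".warm")
}

fn main() {
    for path in ["levm::LEVM::warm_block", "levm::LEVM::preheat_block"] {
        let escaped = legacy_escape_path(path);
        println!("{escaped}: flagged = {}", looks_like_split_fragment(&escaped));
    }
}
```

Under this model the second dot of `..` followed by `warm_block` is what produces the `.warm` match, which is why a name that does not start with `warm` avoids it.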
Benchmark Block Execution Results Comparison Against Main
- Fix libbolt_rt_instr.a symlink path (/usr/local/lib/, not /usr/lib/)
- Add symlink instructions for llvm-bolt/perf2bolt/merge-fdata binaries
- Add ICF and use-gnu-stack flags to bolt-optimize Makefile target
- Document SIGINT requirement for instrumented binary profile flushing
- Add detailed troubleshooting for Rust symbol mangling + BOLT conflicts
- Add profile quality warning (snap-sync profile = 0% gain on block exec)
- Add measured benchmark results (1.4% on 1,110-block ERC20 import)
- Document hot-cold-split and fat LTO in implementation notes
- Add Rust source constraints section (function naming, closure patterns)
…ands
The Quick Start now walks through the full BOLT workflow using ethrex's actual fixture files (l2-1k-erc20.rlp + perf-ci.json), including exact profiling, optimization, and benchmarking commands. Also adds a table of available benchmark fixtures and replaces the generic hyperfine snippet with the real benchmark loop used during testing.
An agent (or developer) can now run the entire BOLT workflow with:
make bolt-full   # build → instrument → profile → optimize → verify
make bolt-bench  # 3-run benchmark comparison
New targets:
- bolt-check: validates prerequisites (x86_64, Linux, llvm-bolt, LFS fixtures)
- bolt-profile: runs instrumented binary with l2-1k-erc20.rlp benchmark
- bolt-full: chains instrument → profile → optimize → verify
- bolt-bench: 3 runs each of baseline vs BOLT-optimized
Also adds BOLT_GENESIS and BOLT_BLOCKS variables so custom fixtures can be used via:
make bolt-full BOLT_BLOCKS=path/to/blocks.rlp
Each step now shows what success looks like (actual output from testing), what warnings are safe to ignore, and what errors mean with how to fix them. An agent or developer can now compare their terminal output against the examples to know if the workflow is working correctly.
🤖 Kimi Code Review
Security Review
No security vulnerabilities found in this PR. The changes are limited to build tooling, optimization profiles, and documentation.
Bug Review
No logic bugs found in the changes. The code changes are minimal and focused on BOLT compatibility:
Performance Review
No significant performance issues found. The changes are actually performance-oriented:
Minor Observations
Automated review by Kimi (Moonshot AI)
🤖 Codex Code Review
I found no security vulnerabilities, bugs, or significant performance issues in the PR diff.
Automated review by OpenAI Codex
🤖 Claude Code Review
Based on my review of this PR, here is my security, bug, and performance analysis:
PR #5981 Review: BOLT Post-Link Optimization Setup
Summary
This PR adds LLVM BOLT optimization infrastructure for Linux x86-64 builds. Overall, the implementation is well-designed and low-risk. The changes are primarily build configuration, documentation, and minor refactoring for BOLT compatibility.
Issues Found
🟡 Medium: Unhandled Thread Spawn Error in KZG Warmup
File:
let _ = std::thread::Builder::new()
    .name("kzg-warmup".into())
    .spawn(do_kzg_warmup);
Issue: The thread spawn result is explicitly ignored with
Recommendation:
if let Err(e) = std::thread::Builder::new()
    .name("kzg-warmup".into())
    .spawn(do_kzg_warmup)
{
    // Log the error but don't panic - KZG will still work, just with first-call overhead
    eprintln!("Warning: Failed to spawn KZG warmup thread: {}", e);
}
Or at minimum:
let _ = std::thread::Builder::new()
    .name("kzg-warmup".into())
    .spawn(do_kzg_warmup)
    .inspect_err(|e| eprintln!("Warning: KZG warmup thread spawn failed: {}", e));
🟢 Low: Temporary Directory Cleanup Race Condition
File:
Issue: Multiple targets use
Recommendation: Use unique temp directories with
🟢 Low: Shell Wildcard Expansion Without Quotes
File:
merge-fdata $(BOLT_PROFILE_DIR)/prof.* > $(BOLT_PROFILE_DIR)/merged.fdata
Issue: If
Recommendation: The existing
Non-Issues (False Alarms)
✅ BOLT Profile Directory in /tmp
Using
✅ rm -rf Usage
The
✅ Debug Symbols in Release Build
The
✅ Function Rename (warm_block → preheat_block)
This is purely a workaround for BOLT's regex limitations and has no functional impact. The rename is correctly applied in both definition and call site.
Security Assessment
No Critical or High Severity Issues Found
Performance Notes
Expected Impact
No Performance Regressions Expected
The BOLT flags are isolated to
Conclusion
Approval Status: ✅ Approve with minor recommendations
This PR is well-implemented and low-risk. The only actionable issue is the unhandled thread spawn error in the KZG warmup code, which should be logged rather than silently ignored. The temp directory collision risk is minor but could be addressed for robustness. The code is well-documented, the BOLT workflow is properly isolated from normal builds, and the testing appears thorough.
Automated review by Claude (Anthropic)
Greptile Overview
Greptile Summary
Added comprehensive BOLT (Binary Optimization and Layout Tool) post-link optimization setup for Linux x86_64 production builds. BOLT rearranges binary code layout based on runtime profiling to reduce instruction cache misses and improve branch prediction.
Key changes:
Benchmark results: ~1.4% improvement on block import (1,110 blocks, ~1.5M transactions) with significant branch prediction improvements (84% reduction in taken forward branches). BOLT workflow has been end-to-end tested on Debian Trixie x86_64 with BOLT 19.
Critical issue found:
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| .cargo/bolt.toml | New file with BOLT-specific rustflags for x86_64 Linux, properly isolated from regular builds |
| Cargo.toml | Added release-bolt profile with fat LTO and release-pgo-bolt profile, but pgo-bolt profile missing lto configuration |
| Makefile | Comprehensive BOLT workflow targets with prerequisite validation, automated profiling, and benchmarking |
| crates/vm/backends/levm/mod.rs | Renamed warm_block to preheat_block to avoid BOLT false-positive split-function detection |
| docs/developers/bolt-optimization.md | Comprehensive documentation with step-by-step examples, troubleshooting, and performance validation guidance |
Sequence Diagram
sequenceDiagram
participant Dev as Developer
participant Make as Makefile
participant Cargo as Cargo Build
participant BOLT as llvm-bolt
participant Binary as ethrex Binary
Dev->>Make: make bolt-full
Make->>Make: bolt-check (validate prereqs)
rect rgb(200, 220, 255)
Note over Make,Cargo: Step 1: Build
Make->>Cargo: build --profile release-bolt --config .cargo/bolt.toml
Cargo->>Cargo: Apply fat LTO + BOLT rustflags
Cargo->>Binary: target/release-bolt/ethrex (with relocations)
end
rect rgb(255, 220, 200)
Note over Make,BOLT: Step 2: Instrument
Make->>BOLT: llvm-bolt --instrument
BOLT->>Binary: ethrex-instrumented (with counters)
end
rect rgb(220, 255, 200)
Note over Make,Binary: Step 3: Profile
Make->>Binary: Run ethrex-instrumented with fixture
Binary->>Binary: Execute workload & collect profiles
Binary->>Make: /tmp/bolt-profiles/prof.*.fdata
end
rect rgb(255, 255, 200)
Note over Make,BOLT: Step 4: Optimize
Make->>BOLT: llvm-bolt with profile data
BOLT->>BOLT: Reorder blocks & functions
BOLT->>Binary: ethrex-bolt-optimized (optimized layout)
end
rect rgb(220, 220, 255)
Note over Make,Binary: Step 5: Verify & Bench
Make->>Binary: readelf -S (check .note.bolt_info)
Make->>Binary: Run baseline vs optimized benchmarks
Binary->>Dev: Performance comparison results
end
Path: Cargo.toml
Line: 64:66
[profile.release-pgo-bolt]
inherits = "release"
debug = 1
`release-pgo-bolt` profile missing `lto` setting. According to line 428 in `docs/developers/bolt-optimization.md`, fat LTO is required for BOLT compatibility because "thin LTO creates `.lto_priv` fragments incompatible with BOLT". The `release-bolt` profile (line 61) correctly sets `lto = "fat"`, but `release-pgo-bolt` doesn't specify this.
```suggestion
[profile.release-pgo-bolt]
inherits = "release"
debug = 1
lto = "fat"
```
Pull request overview
Adds an automated LLVM BOLT post-link optimization workflow for Linux x86_64 production builds, including build profiles/config, Makefile automation, and documentation, plus a couple of Rust symbol-compatibility adjustments needed for BOLT analysis.
Changes:
- Add BOLT-oriented Cargo profiles and an isolated `.cargo/bolt.toml` config for BOLT-only rustflags.
- Add Makefile targets to run the full BOLT workflow (build → instrument → profile → optimize → verify) and benchmarking/cleanup helpers.
- Add BOLT documentation and make small Rust source changes to avoid BOLT split-function/symbol-analysis pitfalls (`warm_block` rename, KZG warmup thread closure → function pointer).
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| docs/developers/bolt-optimization.md | New end-to-end guide for running BOLT (and PGO+BOLT) with troubleshooting and example outputs. |
| crates/vm/backends/levm/mod.rs | Rename warm_block → preheat_block and document why (BOLT symbol regex interaction). |
| crates/blockchain/blockchain.rs | Update callsite to renamed LEVM::preheat_block. |
| crates/common/crypto/kzg.rs | Replace thread::spawn closure with function pointer to simplify symbols for BOLT. |
| Makefile | Add bolt-* and pgo-* targets implementing the BOLT workflow, plus prerequisite checks. |
| Cargo.toml | Add release-bolt and release-pgo-bolt build profiles for BOLT usage. |
| .cargo/config.toml | No functional change (trailing newline). |
| .cargo/bolt.toml | New isolated config containing BOLT-specific rustflags and linker args. |
| ## Makefile Targets Reference | ||
|
|
||
| | Target | Description | | ||
| |--------|-------------| | ||
| | `make bolt-full` | Full automated workflow: build → instrument → profile → optimize → verify | | ||
| | `make bolt-bench` | Benchmark baseline vs BOLT-optimized (3 runs each) | | ||
| | `make build-bolt` | Build BOLT-compatible binary with fat LTO and relocations | | ||
| | `make bolt-instrument` | Create instrumented binary for profiling | | ||
| | `make bolt-profile` | Run instrumented binary with benchmark fixture to collect profiles | | ||
| | `make bolt-optimize` | Apply BOLT optimization using collected profiles | | ||
| | `make bolt-verify` | Check that optimized binary has BOLT markers | | ||
| | `make bolt-perf2bolt` | Convert `perf.data` to BOLT format (alternative to instrumentation) | | ||
| | `make bolt-clean` | Remove all BOLT artifacts and profiles | | ||
|
|
This Makefile targets table uses || at the start of rows, which renders as an extra empty first column in GitHub Markdown. Switch to standard | Target | Description | formatting.
| | Binary | Avg time (5 runs) | | ||
| |--------|-------------------| | ||
| | Baseline (release-bolt) | 17,706 ms | | ||
| | BOLT-optimized | 17,465 ms | | ||
| | **Improvement** | **~1.4%** | | ||
|
|
The measured-results table also uses a leading || in rows, which adds an unintended empty column in GitHub Markdown. Use standard | Binary | Avg time (5 runs) | table formatting.
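The ~1.4% figure in the measured-results table follows from the two averages it lists. A minimal sketch of that arithmetic, where `improvement_percent` is a name introduced only for this illustration:

```rust
/// Relative improvement of `optimized_ms` over `baseline_ms`, in percent.
fn improvement_percent(baseline_ms: f64, optimized_ms: f64) -> f64 {
    (baseline_ms - optimized_ms) / baseline_ms * 100.0
}

fn main() {
    // Averages taken from the table: baseline 17,706 ms, BOLT-optimized 17,465 ms.
    let pct = improvement_percent(17_706.0, 17_465.0);
    println!("{pct:.2}%"); // prints 1.36%
}
```

That is a 241 ms difference on a 17,706 ms baseline, which rounds to the ~1.4% reported.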
@if [ ! -s $(BOLT_BLOCKS) ]; then \
	echo "ERROR: $(BOLT_BLOCKS) missing or empty. Run 'git lfs pull' to fetch fixture files."; \
	exit 1; \
fi
`bolt-check` verifies the blocks fixture exists, but the workflow also depends on `$(BOLT_GENESIS)` (default `fixtures/genesis/perf-ci.json`). If that file is missing, failures happen later during profiling/bench. Consider adding a similar `-s $(BOLT_GENESIS)` check here for earlier, clearer errors.
Suggested change:
fi
@if [ ! -s $(BOLT_GENESIS) ]; then \
	echo "ERROR: $(BOLT_GENESIS) missing or empty. Run 'git lfs pull' to fetch fixture files."; \
	exit 1; \
fi
bolt-profile: ## 📊 Run instrumented binary with benchmark blocks to collect profile data
	@test -f ethrex-instrumented || { echo "ERROR: Run 'make bolt-instrument' first."; exit 1; }
	@rm -rf /tmp/bolt-data $(BOLT_PROFILE_DIR)/prof.*
	@echo "Profiling with $(BOLT_BLOCKS) (this may take a few minutes)..."
	./ethrex-instrumented \
		--network $(BOLT_GENESIS) \
		--datadir /tmp/bolt-data \
		import $(BOLT_BLOCKS)
	@rm -rf /tmp/bolt-data
In bolt-profile, /tmp/bolt-data is cleaned up only after the instrumented import completes successfully. If the command fails or is interrupted, the directory may be left behind and grow large over repeated runs. Consider using a shell trap (or running the recipe in a single set -e; ...; trap ... block) to ensure cleanup happens on error/interrupt as well.
# For PGO + BOLT combined (maximum optimization)
[profile.release-pgo-bolt]
inherits = "release"
debug = 1
`release-pgo-bolt` inherits `release`, which sets `lto = "thin"`, but the PR rationale/docs state fat LTO is required for BOLT compatibility. As-is, `release-pgo-bolt` will still use thin LTO and may reintroduce `.lto_priv.*` fragments / BOLT analysis failures. Consider explicitly setting `lto = "fat"` (and any other BOLT-required settings) on `release-pgo-bolt` as well.
Suggested change:
debug = 1
lto = "fat"
bolt-check:
	@uname -m | grep -q x86_64 || { echo "ERROR: BOLT requires x86_64 (current: $$(uname -m))"; exit 1; }
	@uname -s | grep -q Linux || { echo "ERROR: BOLT requires Linux (current: $$(uname -s))"; exit 1; }
	@command -v llvm-bolt >/dev/null 2>&1 || { echo "ERROR: llvm-bolt not found. See docs/developers/bolt-optimization.md for install instructions."; exit 1; }
	@if [ ! -s $(BOLT_BLOCKS) ]; then \
		echo "ERROR: $(BOLT_BLOCKS) missing or empty. Run 'git lfs pull' to fetch fixture files."; \
		exit 1; \
	fi
bolt-check validates llvm-bolt but later recipes also require merge-fdata (instrumentation workflow) and perf2bolt (perf workflow). Without checking these here, make bolt-* can fail later with less actionable tool-not-found errors. Consider extending bolt-check (or adding per-target checks) to validate merge-fdata and perf2bolt when needed.
bolt-perf2bolt: ## 📊 Convert perf.data to BOLT profile format
	@mkdir -p $(BOLT_PROFILE_DIR)
`bolt-perf2bolt` assumes `$(BOLT_BINARY)` exists, but it doesn't depend on `build-bolt`/`bolt-check` and doesn't validate the binary path. This can lead to confusing failures if the user runs `make bolt-perf2bolt` first. Consider adding a dependency on `build-bolt` (or at least `bolt-check`) and/or an explicit `test -f $(BOLT_BINARY)` guard with a clear error message.
Suggested change:
bolt-perf2bolt: build-bolt ## 📊 Convert perf.data to BOLT profile format
	@mkdir -p $(BOLT_PROFILE_DIR)
	@if [ ! -f "$(BOLT_BINARY)" ]; then \
		echo "ERROR: BOLT binary '$(BOLT_BINARY)' not found. Run 'make build-bolt' first."; \
		exit 1; \
	fi
| **Available benchmark fixtures:** | ||
|
|
||
| | Fixture | Genesis | Blocks | Transactions | Best for | | ||
| |---------|---------|--------|-------------|----------| | ||
| | `l2-1k-erc20.rlp` | `perf-ci.json` | 1,110 | ~1.5M ERC20 transfers | EVM execution | | ||
| | `2000-blocks.rlp` | `perf-ci.json` | 2,004 | ~0 per block | Storage/merkle | | ||
|
|
Several tables use a leading double pipe (||) in the header and separator rows (e.g., the fixtures and Makefile target tables). In GitHub Markdown this renders as an extra empty first column and looks broken. Use standard table syntax with single leading/trailing | per row.
bolt-optimize: ## ⚡ Apply BOLT optimization using collected profiles
	@if [ -f $(BOLT_PROFILE_DIR)/perf.fdata ]; then \
		llvm-bolt $(BOLT_BINARY) -o ethrex-bolt-optimized \
			-data=$(BOLT_PROFILE_DIR)/perf.fdata \
			-reorder-blocks=ext-tsp \
			-reorder-functions=cdsort \
bolt-optimize invokes llvm-bolt $(BOLT_BINARY) but doesn't depend on build-bolt/bolt-check and doesn't validate that $(BOLT_BINARY) exists before running. If the user runs make bolt-optimize directly, the error will come from llvm-bolt and may be unclear. Consider adding a dependency on build-bolt (or at least a test -f $(BOLT_BINARY) check with a clear message).
| #### Option 2: Latest from apt.llvm.org (BOLT 22+) | ||
| ```bash | ||
| wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key | sudo tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc | ||
| echo "deb http://apt.llvm.org/unstable/ llvm-toolchain main" | sudo tee /etc/apt/sources.list.d/llvm.list |
The apt repository in the apt.llvm.org example uses http://apt.llvm.org/.... Using HTTPS avoids MITM risk during package index download (even though packages are signed). Consider switching that URL to https://apt.llvm.org/... in the instructions.
| echo "deb http://apt.llvm.org/unstable/ llvm-toolchain main" | sudo tee /etc/apt/sources.list.d/llvm.list | |
| echo "deb https://apt.llvm.org/unstable/ llvm-toolchain main" | sudo tee /etc/apt/sources.list.d/llvm.list |
- Add lto = "fat" to release-pgo-bolt profile (Copilot, Greptile)
- Add bolt-check and bolt-bench to .PHONY (Copilot)
- Validate BOLT_GENESIS in bolt-check (Copilot)
- Add binary existence guard to bolt-optimize and bolt-perf2bolt (Copilot)
- Use HTTPS for apt.llvm.org in docs (Copilot)
[target.x86_64-unknown-linux-gnu]
rustflags = [
    "-Ctarget-cpu=x86-64-v3",
    "-Ctarget-feature=+avx2,+sse2,+ssse3,+sse4.1,+sse4.2,+bmi1,+lzcnt,+pclmulqdq",
Lines 9-10 duplicate the target CPU flags from `.cargo/config.toml` (lines 3-4). If someone adds a new target feature to `config.toml` (e.g., `+fma`), they'll need to remember to update `bolt.toml` too, and there's no CI check to catch drift.
Consider either:
- Removing these two lines from `bolt.toml` so the base `config.toml` flags apply and `bolt.toml` only adds the BOLT-specific flags (`--emit-relocs`, frame pointers, hot-cold-split), or
- Adding a comment like `# Keep in sync with .cargo/config.toml` to make the dependency explicit.
	@echo "BOLT optimization complete. Optimized binary: ethrex-bolt-optimized"
	@echo "Benchmark with: make bolt-bench"

bolt-bench: ## 📈 Benchmark baseline vs BOLT-optimized binary
nit: The benchmark loop discards all output except lines matching "Import completed" via grep. If the import fails (e.g., missing fixture, wrong genesis), all error output is silently eaten and the loop reports nothing for that run. Consider | grep "Import completed" || echo "ERROR: import failed (run $i)" so failures are visible.
# cargo-pgo workflow (requires: cargo install cargo-pgo)
# NOTE: cargo-pgo doesn't pass CXXFLAGS, so use the manual Makefile targets instead
pgo-bolt-build: ## 🔨 Build with cargo-pgo for BOLT instrumentation (use build-bolt instead)
nit: This target's ## help text says (use build-bolt instead) and the first line of the recipe is an echo saying the same. If it's not recommended, consider removing the ## help annotation so it doesn't show up in make help (keep the echo for anyone who calls it directly).
/// be used by the sequential execution phase.
pub fn warm_block(
///
/// Named `preheat_block` instead of `warm_block` because BOLT's split-function
The rename is well-motivated, but this doc comment ties a public API name to an external tool's implementation detail. If BOLT fixes its regex matching in a future version, the name preheat_block and this comment become confusing relics.
Consider: keep the rename (it's a fine name regardless), but move the BOLT rationale to the docs page (bolt-optimization.md, which already has a "Rust source constraints" section) rather than embedding it in the function's doc comment. A one-line // See docs/developers/bolt-optimization.md would suffice here if you want a breadcrumb.
Motivation
BOLT (Binary Optimization and Layout Tool) is an LLVM post-link optimizer that rearranges binary code layout based on runtime profiling data. It improves performance by placing hot code paths contiguously in memory, reducing instruction cache (I-cache) misses and branch mispredictions. This is particularly beneficial for Ethereum execution clients where hot paths include:
Description
This PR adds the build configuration, Makefile targets, and documentation needed to use BOLT optimization on Linux x86-64 production builds. The entire workflow is automated via
make bolt-full.Changes
Build configuration:
- `Cargo.toml`: Added `release-bolt` profile (inherits `release`, adds `debug = 1` for symbols, uses `lto = "fat"` for BOLT compatibility) and `release-pgo-bolt` profile
- `.cargo/bolt.toml` (new): Isolated BOLT-specific rustflags (`--emit-relocs`, `-Wl,-q`, `-Cforce-frame-pointers=yes`, `-Cllvm-args=-hot-cold-split=false`) loaded only via `cargo build --config .cargo/bolt.toml`, which ensures BOLT flags never affect regular builds
- `Makefile`: Added targets for the full BOLT workflow (listed below)

Makefile targets:
- `make bolt-full`
- `make bolt-bench`
- `make build-bolt`
- `make bolt-instrument`
- `make bolt-profile`
- `make bolt-optimize`
- `make bolt-verify`
- `make bolt-perf2bolt`: converts `perf.data` to BOLT format (alternative to instrumentation)
- `make bolt-clean`

Prerequisite validation (`bolt-check`) runs automatically before `build-bolt`, verifying x86_64 Linux, `llvm-bolt` installed, and Git LFS fixture files present.

BOLT compatibility fixes:
- `crates/vm/backends/levm/mod.rs`: Renamed `warm_block` → `preheat_block`. BOLT's split-function regex matches `.warm` inside Rust's legacy-mangled `::warm_block` (which becomes `..warm_block..`), causing "parent function not found" errors during BOLT analysis
- `crates/blockchain/blockchain.rs`: Updated call site to use `preheat_block`
- `crates/common/crypto/kzg.rs`: Extracted KZG warmup closure into a standalone function pointer (`do_kzg_warmup`). Closures in `std::thread::spawn` produce complex `drop_in_place` specialization symbols that BOLT can't analyze. Using `Builder::new().name("kzg-warmup").spawn(do_kzg_warmup)` preserves the thread name while generating simpler symbols

Documentation:
- `docs/developers/bolt-optimization.md` (new): Comprehensive guide with example output for every step, troubleshooting, and the Rust symbol naming pitfalls

Why fat LTO?
Thin LTO can create cross-module function fragments (`.lto_priv.*` suffixes) that BOLT 19 can't match to parent functions. Fat LTO produces a single compilation unit with contiguous function bodies, eliminating this issue. Build time increases slightly (~50s) but BOLT compatibility is guaranteed.
Why disable hot-cold splitting?
LLVM's HotColdSplitting pass creates `.cold` function fragments for rarely-executed paths (e.g., `drop_in_place` specializations). BOLT 19's relocation mode can't match these fragments to their parent functions. The `-Cllvm-args=-hot-cold-split=false` flag in `.cargo/bolt.toml` prevents this.
Why rename `warm_block` to `preheat_block`?
BOLT's split-function detection uses a regex that matches `.warm` and `.cold` anywhere in ELF symbol names. Rust's legacy mangling scheme converts `::` to `..`, so `::warm_block` becomes `..warm_block..` in the symbol table, which contains `.warm` as a substring. This triggered BOLT's split-function regex for all 7 rayon-generated symbols involving `warm_block`. Renaming to `preheat_block` eliminated all 7 false matches (reducing split-symbol warnings from 21 to 14, with the remaining 14 being jemalloc `.cold` symbols that BOLT handles gracefully).
Testing
Tested end-to-end on ethrex-office-3 (Debian Trixie, x86_64, 128-core AMD EPYC):
Environment
BOLT Workflow Verification
- `make build-bolt`: compiled successfully with fat LTO and BOLT flags
- `make bolt-instrument`: processed 25,271 functions, 403,378 branch counters, 790,788 total counters
- `make bolt-profile`: imported 1,110 ERC20 blocks (~1.5M transactions), collected 15MB of profile data
- `make bolt-optimize`: 4,504 functions (17.6%) had non-empty profiles, 63.3% of profiled functions had layout modified
- `make bolt-verify`: confirmed BOLT markers present
- `make bolt-bench`: results below

BOLT Optimization Statistics (dynostats)
Benchmark Results
Workload: Import 1,110 blocks containing ~1.5M ERC20 transfer transactions (2.4-2.9 Ggas/s throughput), using `perf-ci.json` genesis.
Improvement: ~1.4% (241 ms faster on average, excluding warmup run 1).
The improvement is modest because:
Key Findings During Testing
- Rust symbol mangling + BOLT is fragile: BOLT's split-function regex matches `.warm` and `.cold` substrings anywhere in mangled symbol names. Because `::` is mangled to `..`, any Rust function whose name starts with `warm` or `cold` (e.g., `warm_block`, `cold_start`) can trigger a false positive. The fix is to rename such functions.
- Closures in `thread::spawn` cause BOLT errors: Complex closures produce deeply nested `drop_in_place` specialization symbols that BOLT can't analyze in relocation mode. Using function pointers instead produces simpler symbols.
- Fat LTO is required for BOLT compatibility: Thin LTO creates `.lto_priv` fragments that BOLT 19 can't resolve. Fat LTO increases build time but produces a single compilation unit.
- Profile quality matters: BOLT with a snap-sync profile showed 0% improvement on block import. BOLT with a block-import profile showed 1.4% improvement. For production use, profile with the actual production workload.
- `libbolt_rt_instr.a` location: On Debian Trixie with bolt-19, the library is at `/usr/lib/llvm-19/lib/libbolt_rt_instr.a` but BOLT looks for it at `/usr/local/lib/libbolt_rt_instr.a`. A symlink is needed: `sudo ln -sf /usr/lib/llvm-19/lib/libbolt_rt_instr.a /usr/local/lib/libbolt_rt_instr.a`

How to Test
See `docs/developers/bolt-optimization.md` for detailed step-by-step instructions with example output for every step.
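Separately from the full workflow, the closure-to-function change from the key findings can be tried in isolation. The following is a minimal sketch, not the code from `crates/common/crypto/kzg.rs`: the body of `do_kzg_warmup` is a placeholder and `spawn_warmup` is a hypothetical helper. It shows a named function being passed to `std::thread::Builder::spawn` in place of an inline closure, while keeping the `kzg-warmup` thread name.

```rust
use std::io;
use std::thread::{self, JoinHandle};

/// Placeholder for the real warmup work (assumption: the actual function
/// forces one-time KZG setup; here it only does a trivial computation).
fn do_kzg_warmup() {
    let checksum: u64 = (0u64..1_000).sum();
    // Keep the result observable so the work is not optimized away entirely.
    std::hint::black_box(checksum);
}

/// Hypothetical helper: spawns the warmup on a named thread, passing the
/// function itself rather than an inline closure.
fn spawn_warmup() -> io::Result<JoinHandle<()>> {
    thread::Builder::new()
        .name("kzg-warmup".to_string())
        .spawn(do_kzg_warmup)
}

fn main() {
    let handle = spawn_warmup().expect("failed to spawn warmup thread");
    println!("spawned thread: {:?}", handle.thread().name());
    handle.join().expect("warmup thread panicked");
}
```

Whether this produces symbols that a given BOLT version accepts still has to be confirmed on the real binary. The sketch only shows that the thread name survives the refactor.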
Checklist