perf: full PGO pipeline - SPGO, CallFrequency layout, hot-cold splitting, cross-module inlining by benaadams · Pull Request #10877 · NethermindEth/nethermind

benaadams · 2026-03-19T17:19:36Z

Summary

Full Profile-Guided Optimization (PGO) pipeline for Nethermind, collecting runtime profiling data and using it to optimize both R2R (ReadyToRun) ahead-of-time compilation and runtime Tier-1 JIT recompilation.

R2R Compile-Time Optimizations

Cross-module inlining (--opt-cross-module:*): Passed to crossgen2 so framework methods (Dictionary.TryGetValue, Span<T>.Slice, Memory<T>.Span, etc.) can be inlined into Nethermind R2R code at build time. Without this, those call sites stay as regular method calls until Tier-1 recompiles at runtime. Safe for Docker images where the framework version is pinned by the base image hash.
CallFrequency method layout (--method-layout:callfrequency): Uses directed caller-callee edge weights from CPU sampling (917K resolved edges, 2,480 callers) to place callees after their callers in the R2R image. This preserves call direction for better instruction prefetch, unlike Pettis-Hansen which uses an undirected graph. Falls back to Pettis-Hansen when callchain data is unavailable.
Hot-cold splitting (--hot-cold-splitting): Uses SPGO block counts from the .mibc (1,361 methods with per-block CPU sample attribution) to split R2R method bodies into hot and cold sections. Cold basic blocks (error paths, exception handlers, rare branches) are moved to a .text.cold section, keeping the hot code working set smaller and improving I-cache density. The .NET equivalent of BOLT's basic block reordering.
Profile-driven inlining (DOTNET_JitInlinePolicyProfile=1): The Tier-1 JIT inlines more aggressively at hot call sites and less at cold ones, based on the seeded PGO frequency data.
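To make the hot-cold splitting item concrete, here is a minimal Python sketch of the idea. It is not the JIT's or crossgen2's actual algorithm: the `Block` type, the `split_hot_cold` helper, the block names, and the "zero samples means cold" threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A basic block with its attributed CPU sample count (hypothetical model)."""
    name: str
    samples: int


def split_hot_cold(blocks, cold_threshold=0):
    """Partition blocks into (hot, cold), keeping original order within each part.

    A block counts as cold when its sample count is <= cold_threshold.
    The threshold is an illustrative assumption, not a crossgen2 setting.
    """
    hot = [b for b in blocks if b.samples > cold_threshold]
    cold = [b for b in blocks if b.samples <= cold_threshold]
    return hot, cold


# Hypothetical method body: two blocks never appear in the CPU samples.
method = [
    Block("entry", 950),
    Block("bounds_check_fail", 0),  # error path
    Block("loop_body", 4200),
    Block("catch_handler", 0),      # exception handler
    Block("return", 940),
]

hot, cold = split_hot_cold(method)
print("hot: ", [b.name for b in hot])
print("cold:", [b.name for b in cold])
```

In this toy model the hot section shrinks from five blocks to three, which is the working-set effect the item describes.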

EVM Opcode Warmup (`VirtualMachine.Warmup.cs`)

Representative values: Replaced PushOne (value=1) with multi-word UInt256 values that exercise common arithmetic paths. Value 1 caused Tier-0 PGO to profile degenerate branches - DIV/MOD by 1 takes the trivial fast-path, EXP with base 1 is identity, SHL/SHR by 1 is minimal shift. The seeded edge counts now reflect the branches that mainnet contracts actually take (multi-word division, full remainder, etc.).
Skip state-touching opcodes: SLOAD, SSTORE, CALL, STATICCALL, DELEGATECALL, CREATE, LOG, BALANCE, etc. are skipped during warmup. These opcodes dispatch through IWorldState - but warmup uses a different implementation than real block processing. The JIT's Tier-0 GDV profiling records the warmup type, creating bimodal type histograms that prevent devirtualization. By skipping these opcodes, GDV profiles only capture the production IWorldState type from real execution, enabling direct devirtualization instead of slower type-check guards.
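The bimodal-histogram problem can be illustrated with a small Python sketch. The type names below are hypothetical (the PR does not name the two IWorldState implementations), and the real JIT heuristics for guarded devirtualization are more involved than a single dominant-share number; this only shows why a call site that saw two types is a weaker candidate than one that saw a single type.

```python
from collections import Counter


def dominant_type_share(observed_types):
    """Return (most_common_type, share) for the receiver types seen at a call site."""
    counts = Counter(observed_types)
    type_name, hits = counts.most_common(1)[0]
    return type_name, hits / len(observed_types)


# Hypothetical receiver types recorded by Tier-0 instrumentation.
warmup_only = ["WarmupWorldState"] * 500
production_only = ["ProductionWorldState"] * 500
mixed = warmup_only + production_only  # warmup ran the state-touching opcodes

print(dominant_type_share(production_only))
print(dominant_type_share(mixed))
```

With warmup skipping the state-touching opcodes, the histogram looks like `production_only`: one type with a share of 1.0. With warmup included, the best guess only covers half of the observed calls.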

PGO Data Collection (`collect-pgo-profile.yml`)

EventPipe trace (main PGO container): Collects method load/JIT events, edge/block counts, and GDV type histograms via DOTNET_EnableEventPipe over 10,000 mainnet blocks.
Edge/block profiling (.jit): Runtime PGO data from DOTNET_WritePGOData - edge counts and guarded devirtualization (GDV) type histograms that drive branch prediction and virtual call elimination at Tier-1. Compressed by PgoTrim.
CPU sampling (sampling container): perfcollect (perf + LTTng) captures ~9.3M kernel CPU samples over ~10 minutes alongside CLR events for SPGO block-level attribution and call graph extraction. Custom libcoreclr.so built with LTTng tracepoint support (Microsoft SDK ships dummy provider since dotnet/runtime#113876). TC_CallCountingDelayMs=900000 prevents Tier-1 recompilation during sampling so the perf map stays valid (without this, 97% of samples fall outside managed code).

SPGO and Call Graph Extraction

PgoTrim convert-trace: Injects missing CTF mappings (MethodDetails, MethodILToNativeMap_V1) and converts .trace.zip to .etlx with KeepAllEvents=true
PgoTrim extract-spgo: Extracts ~9.3M perf CPU sample leaf IPs to .spgo file and ~9.2M caller-callee IP pairs to .callgraph file from perfcollect's perf.data.txt callstacks
PgoTrim generate-callchain: Resolves .callgraph IPs to method names using the .etlx MethodMemoryMap, outputs CallChainProfile JSON (917K directed edges, 2,480 callers) for crossgen2's --callchain-profile / --method-layout:callfrequency
NethermindPgoPatches.cs compiled into dotnet-pgo at build time:
- LoadSpgoSamples: reads .spgo for SPGO basic block attribution (~969K samples attributed, ~10% rate)
- LoadCallGraph: reads .callgraph, resolves IP pairs via MethodMemoryMap, populates call graph and exclusive sample counts for .mibc CallWeights
- SafeSmoothAllProfiles: per-method try-catch for FlowSmoothing crash on disconnected flow graphs
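Resolving sampled instruction pointers against a method address map, as the extraction steps above do, can be sketched with a binary search. This is a minimal Python illustration and not PgoTrim's code: the `(start, size, name)` tuple layout, the helper names, and the addresses are assumptions; only the two method names come from this PR.

```python
import bisect


def build_index(method_map):
    """Sort (start_address, size, name) ranges and return (starts, ranges)."""
    ranges = sorted(method_map)
    starts = [start for start, _, _ in ranges]
    return starts, ranges


def resolve_ip(ip, starts, ranges):
    """Return the method whose [start, start + size) range contains ip, else None."""
    i = bisect.bisect_right(starts, ip) - 1  # last range starting at or before ip
    if i < 0:
        return None
    start, size, name = ranges[i]
    return name if ip < start + size else None


# Hypothetical addresses and sizes.
method_map = [
    (0x2000, 0x180, "RunByteCode"),
    (0x1000, 0x400, "KeccakHash.ComputeHash"),
]
starts, ranges = build_index(method_map)

for ip in (0x1000, 0x13FF, 0x1400, 0x2100, 0x0500):
    print(hex(ip), resolve_ip(ip, starts, ranges))
```

Samples that land outside every known range resolve to `None` and cannot be attributed to a method.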

Profile Data - `.mibc` (R2R compile-time)

Used by crossgen2 for ahead-of-time R2R compilation decisions:

Data Type	Coverage	Detail
Edge counts	6,615 methods	38,308 entries, 336M total executions - branch prediction hints
SPGO block counts	1,361 methods	~8K block entries, ~969K attributed CPU samples - hot/cold splitting
GDV type histograms	3,098 methods	8,616 call sites: 5,063 devirtualizable (4,558 monomorphic, 489 polymorphic)
Call graph	2,791 methods	12,365 caller-callee edges, 4,415 methods with ExclusiveWeight
Method histograms	~366 methods	~479 delegate/interface dispatch entries
Total profiled methods	7,867 with instrumentation	32,478 in hot list

Profile Data - `.callchain.json.gz` (R2R method layout)

Stored compressed in repo (222KB). Decompressed at build time by MSBuild target. Contains directed caller-callee edge weights for crossgen2's CallFrequency method layout:

Metric	Value
Resolved edges	917,618
Unique callers	2,480
Methods with samples	4,146
Top caller	KeccakHash.ComputeHash (275K edges, 3 callees)
EVM dispatch	RunByteCode (45.5K edges, 239 callees)

Profile Data - `.jit.gz` (Runtime Tier-1 JIT)

Stored compressed in repo and Docker image. Decompressed to nethermind.jit at image build time. The runtime reads it via DOTNET_ReadPGOData to seed the JIT's PGO data store, giving Tier-1 recompilation edge counts and GDV data from the first recompile without needing a warm-up period:

Data Type	Coverage	Detail
Edge counts	6,615 methods	38,308 entries, 336M executions - branch prediction from first Tier-1 recompile
GDV type histograms	3,284 methods	9,092 sites: 592 monomorphic (direct devirt), 4,560 polymorphic (guarded devirt), 5,152 devirtualizable
Method histograms	365 methods	476 entries - delegate/interface dispatch optimization
Total methods	7,239

Upstream Issues Found & PRs

microsoft/perfview#2392 - Missing MethodDetails + MethodILToNativeMap_V1 CTF event mappings in TraceEvent
microsoft/perfview#2393 - CreateFromLinuxEventSources should produce SampledProfileTraceData from perf.data
dotnet/runtime#125883 - SPGO fails with perfcollect traces (4 issues documented)
dotnet/runtime#125896 - FlowSmoothing crash in dotnet-pgo SPGO
dotnet/runtime#125932 - dotnet-pgo: support supplementary perf sample files for SPGO and call graph
dotnet/runtime#125935 - dotnet-pgo dump: JSON crash when CallWeights present
dotnet/runtime#125936 - Fix dotnet-pgo dump CallWeights JSON serialization

Other

EXPB: Added security_opt support + 120s stop timeout (NethermindEth/execution-payloads-benchmarks#9)

Type of change

Performance improvement
New feature (PGO collection pipeline)

Test plan

Copilot

Pull request overview

This PR adjusts the PGO collection workflow to retain low-count methods in the runtime .jit profile data so guarded devirtualization (GDV) type histograms are preserved for more methods, improving JIT inlining opportunities during runtime PGO.

Changes:

Removes the effective trimming thresholds for .jit edge/block profile data by setting --min-block/--min-edge to 0.
Updates workflow messaging/comments to reflect that the .jit data is being compressed (and retained) rather than aggressively trimmed.


.github/workflows/collect-pgo-profile.yml

benaadams · 2026-03-19T17:50:51Z

@claude review this

RocksDB disposal hangs indefinitely on overlay filesystems (used by EXPB for PGO collection), preventing WritePGOData from flushing the .jit file. The process gets SIGKILL before reaching the PGO write. Add 15s timeout on lifetimeScope.DisposeAsync() so shutdown proceeds even if DB close hangs. This allows the runtime's ProcessExit handler to flush PGO data. TEMPORARY: revert once the snapshot disposal hang is investigated.


…gression Pettis-Hansen was a no-op (no CallWeights) during the benchmark run, so the regression must be from TC delay, cross-module inlining, or warmup changes. Remove TC_CallCountingDelayMs=30 (reverts to default 100ms) to test if this was the cause.

Pettis-Hansen uses an undirected call graph, losing caller-callee directionality. CallFrequency preserves direction (places callees after callers), which the Facebook hfsort paper showed gives 2x better IPC improvement than PH.

New PgoTrim subcommand: generate-callchain
- Reads .callgraph (IP pairs) + .etlx (method map)
- Resolves IPs to method names via binary search on MethodMemoryMap
- Outputs CallChainProfile JSON for crossgen2 --callchain-profile
- Also writes .sizes file (method name, native size, exclusive samples) for potential CDS (Cache-Directed Sort) implementation

Directory.Build.targets:
- Uses --method-layout:callfrequency when callchain JSON exists
- Falls back to --method-layout:pettishansen when only .mibc available

Workflow:
- Generates callchain JSON in PgoTrim step
- Uploads/downloads/commits as additional PGO artifact
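The directed layout idea in this commit message can be sketched as a simple greedy pass in Python. This is an illustration under stated assumptions, not crossgen2's implementation: the `call_frequency_order` helper and every method name except RunByteCode are hypothetical, and a greedy pass like this does not guarantee caller-before-callee order for every edge.

```python
def call_frequency_order(edges, all_methods):
    """Greedy layout: visit edges by descending weight, emit caller then callee.

    edges: iterable of (caller, callee, weight) with direction preserved.
    all_methods: every method to lay out; unplaced ones are appended at the end.
    """
    order, placed = [], set()

    def place(method):
        if method not in placed:
            placed.add(method)
            order.append(method)

    for caller, callee, _ in sorted(edges, key=lambda e: e[2], reverse=True):
        place(caller)
        place(callee)
    for method in all_methods:
        place(method)
    return order


# Hypothetical directed edge weights from CPU sampling.
edges = [
    ("RunByteCode", "OpSload", 40),
    ("ProcessBlock", "RunByteCode", 300),
    ("RunByteCode", "OpAdd", 120),
]
methods = ["Rare", "OpSload", "OpAdd", "RunByteCode", "ProcessBlock"]

layout = call_frequency_order(edges, methods)
print(layout)
```

Here the hottest chain ends up contiguous with each callee after its caller, and the method with no sampled edges is pushed to the end of the image.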

crossgen2's --hot-cold-splitting flag (CORJIT_FLAG_PROCSPLIT) tells the JIT to split R2R method bodies into hot and cold sections during AOT compilation. Cold basic blocks (error paths, exception handlers, rare branches) are moved to a separate .text.cold section. This uses the SPGO block counts from the .mibc (1,361 methods with per-block CPU sample attribution) to identify which blocks are cold. The result is a smaller hot code working set with better I-cache density - the .NET equivalent of BOLT's basic block reordering.


…uild target MSBuild PropertyGroup Exists() conditions are evaluated at load time, before any targets run. The DecompressCallChainProfile target ran BeforeTargets="Publish" which is too late - the Crossgen2ExtraCommandLineArgs was already set to pettishansen by the time the .json was decompressed. Fix: decompress in the Dockerfile RUN step before dotnet publish, so the .json exists when MSBuild evaluates the PropertyGroup conditions. Remove the MSBuild target since it's no longer needed.


…ry.Build.targets PublishReadyToRunComposite and OptimizationPreference=Speed were only in Runner.csproj. Moving them to Directory.Build.targets ensures any project published with R2R (including BDN benchmarks using the R2R toolchain) gets the same settings as production.


github-actions · 2026-03-24T03:08:33Z

Block Processing Benchmark Comparison

Base: 5a6c7795 | Head: 2ffe6b51

Method	Base (us)	PR (us)	Delta	Base CV	PR CV	Alloc Base	Alloc PR	Alloc Delta
AccessList_50	761.4	754.9	-0.9%	1.0%	2.8%	73.8 KB	73.7 KB	-0.1%
ContractCall_200	1,829.4	1,810.2	-1.0%	1.6%	0.5%	367.1 KB	367.2 KB	+0.0%
ContractDeploy_10	555.8	567.9	+2.2%	2.9%	0.7%	54.1 KB	51.8 KB	-4.2%
Eip1559_200	1,808.0	1,793.7	-0.8%	1.6%	1.3%	350.4 KB	350.1 KB	-0.1%
EmptyBlock	24.3	50.7	+108.6% 🔼	15.0%	71.3%	7.0 KB	7.0 KB	+0.0%
MixedBlock	1,858.6	1,837.9	-1.1%	1.6%	3.2%	357.1 KB	357.2 KB	+0.0%
SingleTransfer	79.0	114.7	+45.1% 🔼	2.1%	36.0%	18.6 KB	18.6 KB	+0.0%
Transfers_200	1,832.8	1,806.8	-1.4%	1.3%	1.3%	350.1 KB	350.3 KB	+0.1%
Transfers_50	775.9	766.1	-1.3%	1.6%	1.5%	65.2 KB	65.3 KB	+0.2%
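For readers checking the table, the Delta column appears to be the relative change of the PR mean against the base mean. The Python sketch below assumes that formula, (PR - Base) / Base, which matches the rows used here; the `percent_delta` helper is hypothetical and the inputs are the rounded means from the table above, so other rows can differ in the last digit.

```python
def percent_delta(base, pr):
    """Relative change of pr versus base, in percent."""
    return (pr - base) / base * 100.0


# Mean timings in microseconds, copied from the summary table above.
means = {
    "AccessList_50": (761.4, 754.9),
    "ContractDeploy_10": (555.8, 567.9),
    "EmptyBlock": (24.3, 50.7),
}

for name, (base, pr) in means.items():
    print(f"{name}: {percent_delta(base, pr):+.1f}%")
```

The two rows flagged as regressions (EmptyBlock, SingleTransfer) are also the rows with the highest PR CV, so their mean deltas are worth reading alongside the median values in the detailed statistics.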

Detailed statistics

Method	Metric	Base	PR	Delta
AccessList_50	Mean	761.4 us	754.9 us	-0.9%
AccessList_50	Median	760.8 us	758.1 us	-0.3%
AccessList_50	P90	769.0 us	779.4 us	+1.4%
AccessList_50	P95	770.7 us	784.0 us	+1.7%
AccessList_50	Min	747.6 us	726.0 us	-2.9%
AccessList_50	Max	772.4 us	788.5 us	+2.1%
AccessList_50	StdDev	7.4 us	20.9 us	+184.0%
ContractCall_200	Mean	1,829.4 us	1,810.2 us	-1.0%
ContractCall_200	Median	1,827.1 us	1,809.1 us	-1.0%
ContractCall_200	P90	1,857.0 us	1,819.9 us	-2.0%
ContractCall_200	P95	1,875.3 us	1,822.9 us	-2.8%
ContractCall_200	Min	1,799.1 us	1,800.0 us	+0.1%
ContractCall_200	Max	1,893.5 us	1,826.0 us	-3.6%
ContractCall_200	StdDev	28.5 us	9.2 us	-67.6%
ContractDeploy_10	Mean	555.8 us	567.9 us	+2.2%
ContractDeploy_10	Median	555.5 us	568.5 us	+2.3%
ContractDeploy_10	P90	576.6 us	572.5 us	-0.7%
ContractDeploy_10	P95	579.5 us	572.7 us	-1.2%
ContractDeploy_10	Min	530.7 us	561.7 us	+5.8%
ContractDeploy_10	Max	582.3 us	572.9 us	-1.6%
ContractDeploy_10	StdDev	15.9 us	4.3 us	-73.3%
Eip1559_200	Mean	1,808.0 us	1,793.7 us	-0.8%
Eip1559_200	Median	1,810.5 us	1,798.1 us	-0.7%
Eip1559_200	P90	1,836.1 us	1,816.5 us	-1.1%
Eip1559_200	P95	1,840.5 us	1,827.6 us	-0.7%
Eip1559_200	Min	1,757.0 us	1,760.6 us	+0.2%
Eip1559_200	Max	1,844.9 us	1,838.8 us	-0.3%
Eip1559_200	StdDev	28.8 us	24.2 us	-16.1%
EmptyBlock	Mean	24.3 us	50.7 us	+108.6%
EmptyBlock	Median	25.1 us	22.8 us	-9.0%
EmptyBlock	P90	27.6 us	88.8 us	+221.1%
EmptyBlock	P95	27.7 us	89.3 us	+222.8%
EmptyBlock	Min	17.0 us	16.7 us	-2.1%
EmptyBlock	Max	27.7 us	89.9 us	+224.5%
EmptyBlock	StdDev	3.7 us	36.2 us	+890.3%
MixedBlock	Mean	1,858.6 us	1,837.9 us	-1.1%
MixedBlock	Median	1,857.5 us	1,845.4 us	-0.6%
MixedBlock	P90	1,885.2 us	1,909.7 us	+1.3%
MixedBlock	P95	1,893.6 us	1,911.3 us	+0.9%
MixedBlock	Min	1,796.2 us	1,732.5 us	-3.5%
MixedBlock	Max	1,902.1 us	1,912.9 us	+0.6%
MixedBlock	StdDev	29.6 us	58.2 us	+96.4%
SingleTransfer	Mean	79.0 us	114.7 us	+45.1%
SingleTransfer	Median	78.6 us	88.3 us	+12.4%
SingleTransfer	P90	80.6 us	165.0 us	+104.8%
SingleTransfer	P95	81.5 us	167.6 us	+105.8%
SingleTransfer	Min	76.5 us	76.2 us	-0.4%
SingleTransfer	Max	82.3 us	170.2 us	+106.7%
SingleTransfer	StdDev	1.6 us	41.3 us	+2423.8%
Transfers_200	Mean	1,832.8 us	1,806.8 us	-1.4%
Transfers_200	Median	1,832.2 us	1,802.1 us	-1.6%
Transfers_200	P90	1,859.3 us	1,827.5 us	-1.7%
Transfers_200	P95	1,862.0 us	1,841.0 us	-1.1%
Transfers_200	Min	1,788.7 us	1,780.8 us	-0.4%
Transfers_200	Max	1,864.7 us	1,854.5 us	-0.5%
Transfers_200	StdDev	23.7 us	22.7 us	-4.3%
Transfers_50	Mean	775.9 us	766.1 us	-1.3%
Transfers_50	Median	774.4 us	766.5 us	-1.0%
Transfers_50	P90	790.8 us	777.2 us	-1.7%
Transfers_50	P95	794.0 us	781.9 us	-1.5%
Transfers_50	Min	758.1 us	752.9 us	-0.7%
Transfers_50	Max	797.2 us	786.5 us	-1.3%
Transfers_50	StdDev	12.3 us	11.3 us	-8.4%

github-actions · 2026-03-24T03:09:11Z

EXPB Benchmark Comparison


superblocks

Scenario: nethermind-flat-superblocks-pgo-2-delay0s

Metric	PR	Master (cached)	Delta PR vs Master
AVG (ms)	1075.658200	1001.388100	+7.42%
MEDIAN (ms)	949.400000	873.505000	+8.69%
P90 (ms)	1583.12	1532.22	+3.32%
P95 (ms)	1717.53	1867.30	-8.02%
P99 (ms)	3090.95	2386.11	+29.54%
MIN (ms)	650.93	663.77	-1.93%
MAX (ms)	3409.33	2940.23	+15.95%

realblocks

Scenario: nethermind-flat-realblocks-pgo-2-delay0s

Metric	PR	Master (cached)	Delta PR vs Master
AVG (ms)	28.491386	26.483828	+7.58%
MEDIAN (ms)	23.970000	22.445000	+6.79%
P90 (ms)	42.27	39.40	+7.28%
P95 (ms)	52.78	49.35	+6.95%
P99 (ms)	118.11	112.14	+5.32%
MIN (ms)	0.84	1.21	-30.58%
MAX (ms)	3357.05	1269.48	+164.44%

benaadams requested a review from rubo as a code owner March 19, 2026 17:19

benaadams requested review from a team and Copilot March 19, 2026 17:19

github-actions bot added devops performance is good labels Mar 19, 2026

Copilot started reviewing on behalf of benaadams March 19, 2026 17:21

Copilot AI reviewed Mar 19, 2026

benaadams requested review from LukaszRozmej and flcl42 as code owners March 19, 2026 17:40

benaadams changed the title ~~perf: keep all PGO methods including low-count GDV data~~ perf: keep all PGO data and enable profile-driven inlining Mar 19, 2026

benaadams requested review from MarekM25 and kamilchodola March 19, 2026 17:46

benaadams requested a review from Copilot March 19, 2026 17:50

Copilot started reviewing on behalf of benaadams March 19, 2026 17:51

MarekM25 approved these changes Mar 19, 2026

kamilchodola approved these changes Mar 19, 2026

benaadams added the reproducible-benchmark label Mar 19, 2026

LukaszRozmej approved these changes Mar 19, 2026

benaadams force-pushed the pgo-2 branch from aa2c744 to 0148603 March 19, 2026 18:38

benaadams changed the title ~~perf: keep all PGO data and enable profile-driven inlining~~ perf: maximize PGO impact and enable cross-module R2R inlining Mar 19, 2026

NethermindEth deleted a comment from github-actions bot Mar 19, 2026

benaadams added reproducible-benchmark and removed reproducible-benchmark labels Mar 19, 2026

NethermindEth deleted a comment from github-actions bot Mar 19, 2026

benaadams changed the title ~~perf: maximize PGO impact and enable cross-module R2R inlining~~ perf: maximize PGO impact, cross-module inlining, and profile-guided method layout Mar 19, 2026

benaadams and others added 3 commits March 22, 2026 23:29

chore(pgo): update PGO profile

3488207

Revert "fix(runner): time-box DB disposal to 15s during shutdown (TEMPORARY)"

7906e26

This reverts commit 1d37afb.

NethermindEth deleted a comment from github-actions bot Mar 23, 2026

benaadams added reproducible-benchmark and removed reproducible-benchmark labels Mar 23, 2026

benaadams changed the title ~~perf: full PGO data, cross-module inlining, Pettis-Hansen method layout~~ perf: full PGO pipeline - SPGO, CallFrequency layout, hot-cold splitting, cross-module inlining Mar 23, 2026

benaadams added 4 commits March 23, 2026 02:43

chore(pgo): compress callchain JSON (1.9MB -> 222KB) and decompress at build time

d5fbe09

Store as .gz in repo to reduce commit size. MSBuild target decompresses before Publish so crossgen2 can read the JSON. Same pattern as .jit.gz.

fix(pgo): add targeted R2R verbose output to verify callchain/hot-cold-splitting

587e3c5

benaadams added reproducible-benchmark and removed reproducible-benchmark labels Mar 23, 2026

benaadams and others added 10 commits March 23, 2026 03:02

fix(pgo): use diagnostic verbosity to verify crossgen2 flags (TEMPORARY)

d5a6372

fix: remove diagnostic verbosity from Dockerfile - was causing CI timeouts

33590fc

fix(pgo): decompress callchain JSON before R2R publish in build-solutions CI

5215830

fix(pgo): decompress callchain JSON before benchmark builds for R2R toolchain

321e2c3

fix(pgo): disable R2R for test projects - composite R2R was causing CI timeouts

babcc89

Merge branch 'master' into pgo-2

034afba

Merge branch 'master' into pgo-2

a68ed8a

chore(pgo): update PGO profile

2ffe6b5

benaadams mentioned this pull request Mar 24, 2026

Add MethodDetails CTF event mapping for LTTng trace support microsoft/perfview#2392

benaadams wants to merge 276 commits into master from pgo-2


5 participants
