Skip to content

feat: add 2-pass release build for Dynamic Dispatch table#119

Open
bdero wants to merge 22 commits intoshorebird/devfrom
bdero/cascade-limiter
Open

feat: add 2-pass release build for Dynamic Dispatch table#119
bdero wants to merge 22 commits intoshorebird/devfrom
bdero/cascade-limiter

Conversation

@bdero
Copy link
Copy Markdown
Member

@bdero bdero commented Mar 31, 2026

Summary

  • Adds a 2-pass release build for arm64 Apple platforms to compute the Dynamic Dispatch (DD) table
  • Pass 1 produces a temporary ELF snapshot for analyze_snapshot to compute DD table manifest, caller links, and slot mapping
  • Also exports a DD function identity side file (--print_dd_function_identity_to) mapping each Code object's InstructionsTable index to its Function's kernel_offset, which is passed to analyze_snapshot --compute_dd_slot_mapping via --dd_function_identity for kernel_offset-based function matching
  • Pass 2 produces the final assembly snapshot with indirect calls wired up via --dd_slot_mapping
  • DD table files (App.dd.link, App.dd_callers.link) are copied into the supplement directory alongside existing link files

Test plan

  • Flutter Tools Tests pass on Ubuntu and macOS
  • Shard Runner Tests pass
  • Android Tests pass

@bdero bdero force-pushed the bdero/cascade-limiter branch 6 times, most recently from bcc42c8 to 5572ff2 Compare April 3, 2026 11:04
@bdero bdero force-pushed the bdero/cascade-limiter branch from 064ab50 to d13559f Compare April 28, 2026 19:41
@bdero bdero requested a review from eseidel May 2, 2026 00:27
@bdero bdero marked this pull request as ready for review May 2, 2026 00:27
@bdero bdero force-pushed the bdero/cascade-limiter branch from 760f8cf to 0eb78de Compare May 2, 2026 00:29
Comment thread engine/src/flutter/lib/snapshot/BUILD.gn Outdated
Comment thread packages/flutter_tools/lib/src/base/build.dart Outdated
Comment thread packages/flutter_tools/lib/src/base/build.dart Outdated
Comment thread packages/flutter_tools/lib/src/base/build.dart Outdated
}

// Run analyze_snapshot to compute DD table + caller links.
final analyzeSnapshotPath = _genSnapshot.getAnalyzeSnapshotPath(snapshotType, darwinArch);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So historically, all of the analyzed snapshot usage was done down in AOT tools. Are you sure you want to pull this up into the Flutter tool?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, they're kind of all one and the same because they're version locked between them, so I'm not sure that it matters a ton. Maybe this is the right place to put it?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think part of what made me suspect is that there's clearly a gen snapshot invoker at this level, but there isn't an analyze snapshot invoker. down at AOT tools there's both.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bdero's review bot: Going with "leave it here for now". The call site needs to feed flags directly into the very next gen_snapshot invocation in this same build() method, so a hand-off through aot_tools would mean shipping the args back up anyway. Worth a follow-up to mirror GenSnapshot with a proper AnalyzeSnapshot invoker class — the current getAnalyzeSnapshotPath getter on GenSnapshot is a code smell and the natural home for that class — but I'd prefer that as a focused refactor rather than expanding this PR.

Comment thread packages/flutter_tools/lib/src/base/build.dart Outdated
bdero added 15 commits May 6, 2026 17:52
When building for arm64 Apple platforms with the linker enabled, run
gen_snapshot twice: first in ELF mode to produce a temporary snapshot
for analyze_snapshot to compute the DD table manifest, caller links,
and slot mapping, then in assembly mode with --dd_slot_mapping to
produce the final snapshot with indirect calls wired up.

The DD table files (App.dd.link, App.dd_callers.link) are copied into
the shorebird supplement directory alongside the existing link files
so they can be bundled with releases and used during patch builds.
The 2-pass DD table build runs gen_snapshot in ELF mode before the
main assembly pass. Update tests to expect this additional command.
The arm64 DD analysis pass delays the arm64 assembly, so x86_64
(which skips DD) completes its build first when both run concurrently
via Future.wait.
Move the analyze_snapshot existence check before the gen_snapshot ELF
pass so the entire DD computation is a no-op on standard Flutter SDKs
that don't ship analyze_snapshot. This fixes iOS smoke test failures
where gen_snapshot was being invoked unnecessarily in ELF mode.

Also reverts test changes that are no longer needed since DD commands
won't appear when analyze_snapshot doesn't exist in the test filesystem.
The arm64 build's async _computeDDTable() check (even a no-op when
analyze_snapshot is absent) introduces an await that lets x86_64
reach gen_snapshot first in Future.wait. Reorder test expectations
to match the actual interleaving: x86_64 before arm64 at each step.
The DD slot mapping now uses kernel_offset-based function matching
(instead of function names). The base build must export an identity
side file during gen_snapshot pass 1 and pass it to
analyze_snapshot --compute_dd_slot_mapping.

Without this, the DDSlotMapping has empty kernel_offset_to_slot and
FinalizeIndirectStaticCallTable can't assign any DD slots, resulting
in an empty DD table in the base snapshot.
Read SHOREBIRD_DD_MAX_BYTES from the environment to allow overriding
the cascade limiter threshold. Defaults to 10000 if not set.

An environment variable is used (rather than a command-line flag) so
that older Flutter builds without DD table support silently ignore it.
Updates dart_sdk_revision to include SIMARM64 simulator fixes
(DoRedirectedCall, ClobberVolatileRegisters, Execute reason param)
needed for ios_debug engine builds.
Snapshot of work in progress on the base build's DD pipeline:
- debug print in AOTSnapshotter.build for usesDDTable diagnosis
- BUILD.gn change to place analyze_snapshot in universal/ next to
  gen_snapshot so flutter_tools can resolve it by path substitution

Committing to a clean base before restructuring the pipeline to
support the pre-DD optimized pass added in the dart-sdk patch flow.
When shorebird_flutter ships with the DD table 2-pass release build
enabled but the underlying engine's gen_snapshot binary predates the
DD table work (e.g. a user has downgraded their engine cache, or is
running against a Shorebird release from before DD landed),
gen_snapshot hard-errors on the ELF pass with "Setting VM flags
failed: Unrecognized flags: print_dd_function_identity_to" and the
entire release build fails. Older engines simply don't know about
`--print_dd_function_identity_to`, `--dd_slot_mapping`, or any of the
DD table flag family.

Add a capability probe in _computeDDTable that runs gen_snapshot once
with the DD flag plus a bogus kernel input. Two possible failure
modes distinguish support:

  - Flag recognized → flag parsing passes, kernel load fails:
    "Can't load Kernel binary: File size is too small to be a valid
    kernel file."

  - Flag not recognized → VM init fails at flag parsing:
    "Setting VM flags failed: Unrecognized flags:
    print_dd_function_identity_to"

If stderr contains the "Unrecognized flags" token, skip the entire
DD pipeline and fall back to a plain single-pass Shorebird release
build. The cascade-limiter linker (in aot_tools) independently
handles the no-DD case by falling back to the CT pass's op.link for
final pass OP alignment (see dart-sdk commit 139dd8c864d), so the
patch side of the pipeline keeps working too.

Probe strategies that don't work:

  - `gen_snapshot --help` doesn't list individual flags.
  - `gen_snapshot --print_flags` exits early on "At least one input
    is required" before dumping any flag info.
  - Passing the flag without a snapshot kind and kernel silently
    ignores it regardless of support.

The flag+bogus-kernel+snapshot-kind combination is the only
invocation that reliably distinguishes recognized-but-failed-later
from outright-rejected, on both DD-aware and pre-DD engines.

Result is cached per gen_snapshot path so multi-arch release builds
don't pay the probe cost more than once.
When SHOREBIRD_DD_MAX_BYTES is set, AOTSnapshotter now performs a
2-pass build:

1. Pass 1: gen_snapshot produces an ELF for analysis + DD identity file
2. analyze_snapshot computes DD table + caller links + slot mapping
3. Pass 2: gen_snapshot rebuilds with --dd_slot_mapping for DD-enabled code

This produces DD-aware release snapshots where high-fanout cascade
functions are routed through the indirect static call table, enabling
the cascade limiter's link percentage benefit.

Also adds DD supplement files (App.dd.link, App.dd_callers.link,
App.dd_identity.link, App.dd_slots.link) to the LinkSupplement copy
list so they're propagated to the Shorebird CLI's supplement directory.
- build_test.dart: GenSnapshot now requires a `fileSystem` parameter (added by
  this branch for the analyze_snapshot probe path); pass `MemoryFileSystem.test()`
  to the test constructor.
- macos_test.dart: the DD pass is gated on SHOREBIRD_DD_MAX_BYTES, which isn't
  set in the test environment, so neither arm64 nor x86_64 has an extra async
  hop before gen_snapshot. With concurrent Future.wait the archs now reach
  gen_snapshot in iteration order (arm64 first, x86_64 second). Update the
  expected command sequence and the explanatory comment to match.
…raction)

Three changes from Eric's review:

1. Drop the unused `reportTimings` field on AOTSnapshotter — added in
   81304a9 but never read anywhere.
2. Replace the analyze_snapshot fallback iteration with explicit
   arch-based naming, mirroring how GenSnapshot.run picks its binary.
3. Extract the inlined DD 2-pass logic out of build() into
   _runDdAnalysisPass and _readDdMaxBytes, and collapse the five
   identical path joins into a local linkPath() helper.
Reverts the snapshot BUILD.gn change that moved the lipo'd
analyze_snapshot binary into ${root_out_dir}/universal/. That move
broke create_ios_framework.py:96, which still reads analyze_snapshot
from the build-dir root, and was only needed to make the
flutter_tools getAnalyzeSnapshotPath helper find the binary in
local-engine mode (where gen_snapshot resolves via .../universal/...).

Instead, getAnalyzeSnapshotPath now probes both dirname(genSnapshot)
and dirname(genSnapshot)/.., which finds:
  * cached SDK layout: analyze_snapshot alongside gen_snapshot (probe 1)
  * local-engine layout: analyze_snapshot one level up at the build-dir
    root (probe 2)

Keeps engine packaging untouched and avoids coupling this PR to the
Shorebird CLI's view of the published artifact layout.
@bdero bdero force-pushed the bdero/cascade-limiter branch from 5ba9994 to 6f4f270 Compare May 7, 2026 00:54
bdero added 7 commits May 6, 2026 18:08
Upstream commit be89e40 (#138) split the updater C API into
separate engine and dart cbindgen configs (cbindgen_engine.toml /
cbindgen_dart.toml) and removed the combined cbindgen.toml, but the
companion shell/common/shorebird/BUILD.gn change to update the
declared inputs was not landed. Without this fix, the iOS engine
build fails with:
  ninja: error: '../../flutter/third_party/updater/library/cbindgen.toml',
  needed by 'gen/flutter/shell/common/shorebird/rust_updater_<...>.stamp',
  missing and no known rule to make it

Worth landing upstream as a follow-up on #138.
…oudly

Two related hardenings to prevent silent DD-disable regressions like the
one we just chased on Wonderous (link percentage cratered from 90% to
22.6% because a stale universal/analyze_snapshot binary from a prior
BUILD.gn revision was selected, analyze_snapshot rejected gen_snapshot's
output with "Wrong full snapshot version", and the DD pass swallowed the
failure):

1. getAnalyzeSnapshotPath now verifies the candidate analyze_snapshot's
   --sdk_version output matches gen_snapshot's --version output. The
   strings include the build timestamp, so two binaries from the same
   build agree exactly. A stale binary left behind by a previous build
   configuration prints a different version line and is skipped. If the
   version probe itself fails (e.g. binary won't run), fall back to the
   exists-only check rather than fail-closed.

2. _runDdAnalysisPass now checks exit codes from both
   analyze_snapshot --compute_dd_table and --compute_dd_slot_mapping
   invocations, and from the post-conditions on the produced files,
   surfacing each failure with a concrete printError + non-zero return.
   Previously the calls used await without an exit-code check, so any
   failure (snapshot version skew, missing input, OOM, etc.) silently
   left App.dd.link unwritten and the build proceeded to gen_snapshot
   pass 2 with no --dd_slot_mapping. The release would publish without
   DD activation; subsequent patches would fall back to on-the-fly DD
   computation against a no-DD base and produce structurally divergent
   patches with devastating link percentages.

Also prefer aborting early when getAnalyzeSnapshotPath returns null
(versus silently shipping no DD), with a message that explains the
patching consequence.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants