fix(BUILD): restore -exported_symbols_list for :envoy on Darwin#227

Open
desimone wants to merge 5 commits into main from bdd/eng-3965-fix

desimone commented Apr 29, 2026

Summary

Restore Envoy's Darwin exported-symbols list for the :envoy target.

The :envoy target provides custom linkopts, which means Envoy's default _envoy_linkopts() are not applied by envoy_cc_binary. On macOS those defaults include:

-Wl,-exported_symbols_list,$(location @envoy//bazel:exported_symbols_apple.txt)

which keeps the executable's public symbol surface limited to the dynamic-module ABI.

Without that flag, the Darwin :envoy binary exported many internal C++ and allocator symbols. On macOS arm64 this allowed dyld symbol binding/interposition that routed C++ runtime cleanup through Envoy's allocator path, which reproduced as a SIGSEGV in tcmalloc::CentralFreeList::ReleaseToSpans.

This PR adds the Darwin export-list linkopt back to :envoy and adds a macOS regression test that fails if tcmalloc:: internals or global operator new/operator delete are exported again.

Why

envoy_cc_binary only applies its default link options when linkopts is unset:

if not linkopts:
    linkopts = _envoy_linkopts()

A select(...) value is still a provided linkopts value, so the defaults are skipped even when the Darwin branch only needs a small platform-specific override.

The export list file is already provided to the target through Envoy's existing linker inputs, so this change restores the missing Darwin flag without touching the Linux link options.
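
The guard above can be modeled in plain Python, where an ordinary object is truthy in the same way the PR describes for a select(...) value. This is a simplified sketch and not Envoy's macro: FakeSelect, default_linkopts, and resolve_linkopts are hypothetical stand-ins, and the flag strings are placeholders.

```python
class FakeSelect:
    """Hypothetical stand-in for a Bazel select() value.

    Assumption: like the real thing (per this PR), it is truthy even
    when every branch it holds is an empty list.
    """

    def __init__(self, branches):
        self.branches = branches


def default_linkopts():
    # Placeholder for Envoy's _envoy_linkopts(); the real list is longer.
    return ["<default-flags>", "<exported-symbols-flag>"]


def resolve_linkopts(linkopts=None):
    # Mirrors the macro's guard: defaults apply only when linkopts is falsy.
    if not linkopts:
        linkopts = default_linkopts()
    return linkopts


unset = resolve_linkopts()
overridden = resolve_linkopts(
    FakeSelect({"darwin": ["<custom-darwin-flag>"], "default": []})
)

print(unset)                               # defaults were applied
print(isinstance(overridden, FakeSelect))  # defaults were skipped
```

In this model the caller that passes a select-like value gets back exactly what it passed in, so any flag that lived only in the defaults has to be re-added explicitly in the relevant branch.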

Validation

Local Darwin arm64 validation against the produced artifact:

  • envoy --version exits 0
  • envoy --help exits 0
  • Running with no args exits normally with config-validation output, not SIGSEGV
  • envoy --mode validate against a representative HTTP-proxy config exits 0
  • Full server startup (listener + filter chain + cluster init + worker spawn + dispatch loop) shuts down cleanly on SIGTERM
  • No new macOS .ips crash report produced
  • nm -gU | c++filt shows no globally-exported tcmalloc:: or operator new/operator delete symbols
  • Added tools/check_macos_tcmalloc_symbols.sh regression check passes
  • Pomerium runtime smoke (Pomerium spawns this Envoy, full xDS handshake and dynamic config) passes

CI: pending rerun on the latest commit.

Notes

This does not change the Linux link options. Linux still keeps the dynamic-loader flags added for extension loading.

This is separate from broader work to avoid pulling tcmalloc into macOS builds. This PR fixes the Darwin :envoy symbol-visibility regression directly.


Drafted with AI assistance.

PR #210 (extension-loader) gave the :envoy target an explicit linkopts
attribute. Upstream Envoy's envoy_cc_binary macro replaces the entire
default linkopt set when the caller provides linkopts:

    if not linkopts:
        linkopts = _envoy_linkopts()

A select() is truthy in Starlark, so the default _envoy_linkopts() is
skipped whenever a select-based linkopts is provided. _envoy_linkopts()
ends with `+ envoy_select_exported_symbols(["-Wl,-E"])`, which on Apple
expands to `-Wl,-exported_symbols_list,exported_symbols_apple.txt`.
That file scopes exports to the dynamic-module ABI only (lua, envoyGo,
dynamic_module callbacks). Without it, every symbol defaults to globally
visible -- tcmalloc's internal Static, ThreadCache, STLPageHeapAllocator,
SlowTLS, and TestingPortalImpl data symbols flip from local (d/s in nm)
to global (D/S), breaking allocator bookkeeping at runtime and producing
SIGSEGV inside tcmalloc::CentralFreeList::ReleaseToSpans during static-
destructor cleanup on macOS arm64.
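
The local-versus-global distinction above follows nm's type-letter convention: a lowercase letter marks a local symbol and an uppercase letter marks an external one. A minimal Python sketch of reading that column is below. The helper names and the sample lines are made up for illustration; they are not real output from the :envoy binary.

```python
def parse_nm_line(line):
    """Split one line of nm output into (type_letter, name), or None if it does not parse."""
    parts = line.split(None, 2)
    if len(parts) == 3 and len(parts[1]) == 1:
        # Defined symbol: "<address> <type> <name>"
        return parts[1], parts[2].strip()
    if len(parts) >= 2 and len(parts[0]) == 1:
        # Undefined symbol: no address column, e.g. "U <name>"
        return parts[0], line.split(None, 1)[1].strip()
    return None


def is_exported(type_letter):
    # Uppercase means external, lowercase means local. "U" is an undefined
    # (imported) symbol, so it is not an export of this binary.
    return type_letter.isupper() and type_letter != "U"


# Hand-written sample lines; the symbol names are hypothetical.
sample = [
    "0000000100a00000 d tcmalloc::Static::example_local_",
    "0000000100a00040 D tcmalloc::Static::example_global_",
    "                 U operator new(unsigned long)",
    "0000000100001000 T main",
]

exported = [name for kind, name in map(parse_nm_line, sample) if is_exported(kind)]
print(exported)
```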

Same fault as ENG-3955 (envoy --version segfault) and ENG-3965 (envoy
crash on startup on macOS) -- both report identical fault sites at +68
with NULL deref of [NULL+0x20].

The :envoy.static target (also added in PR #210) is unaffected because
it doesn't override linkopts, so it picks up _envoy_linkopts() and its
-exported_symbols_list flag. Bisect confirms :envoy.static at the same
crash commit (533dfa2) exits 0 on --version. This change brings
:envoy back into parity by adding the dropped flag back. The linker
input file (exported_symbols_apple.txt) is already provided
unconditionally by envoy_cc_binary via additional_linker_inputs, so the
$(location ...) reference resolves without further changes.

Adds a Darwin-only sh_test (tools/check_macos_tcmalloc_symbols.sh) that
inspects the built binary with `nm -gU` and fails if any tcmalloc::*
symbols leak into the global export set, so this regression cannot
silently return.

Refs: ENG-3955, ENG-3965

coveralls commented Apr 29, 2026

Coverage Report for CI Build 25137183191

Warning

Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes.
Quick fix: rebase this PR.

Coverage remained the same at 100.0%

Details

  • Coverage remained the same as the base build.
  • Patch coverage: No coverable lines changed in this PR.
  • No coverage regressions found.

Uncovered Changes

No uncovered changes found.

Coverage Regressions

No coverage regressions found.


Coverage Stats

Relevant Lines: 7542
Covered Lines: 7542
Line Coverage: 100.0%
Coverage Strength: 20589.23 hits per line


Drop the internal tracking reference from the user-facing message and
extend the symbol scan to also flag globally-exported `operator new`
and `operator delete` overrides, so the next regression in this class
is caught even if the leaked symbols aren't in the `tcmalloc::`
namespace.

The previous regex did not match c++filt output like
`operator new(unsigned long)` because the next char after `new`/`delete`
is `(`. Use `(\[\])?\(` to match `operator new(`, `operator new[](`,
`operator delete(`, and `operator delete[](`.

Also broaden the failure heading to "allocator symbols" since the check
now covers more than tcmalloc:: internals.
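
The effect of the `(\[\])?\(` fragment can be checked with a short Python sketch. Only that fragment comes from the commit message; the rest of the pattern, the helper name, and the sample strings are assumptions made for illustration, and the actual script may be written differently.

```python
import re

# Assumed overall pattern: the commit message only specifies the
# "(\[\])?\(" tail that follows "new" or "delete".
ALLOCATOR_SYMBOL = re.compile(r"tcmalloc::|operator (new|delete)(\[\])?\(")


def leaked_allocator_symbols(demangled_names):
    """Return the demangled names that look like allocator symbols."""
    return [name for name in demangled_names if ALLOCATOR_SYMBOL.search(name)]


# Hand-written samples in the style of c++filt output, not real nm output.
samples = [
    "operator new(unsigned long)",
    "operator new[](unsigned long)",
    "operator delete(void*)",
    "operator delete[](void*)",
    "tcmalloc::CentralFreeList::ReleaseToSpans",
    "some_exported_callback",  # hypothetical non-allocator export
]

for name in leaked_allocator_symbols(samples):
    print(name)
```

With these samples, every entry except the last is flagged, covering both the array and non-array forms of `operator new` and `operator delete`.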
desimone marked this pull request as ready for review April 29, 2026 23:18
desimone requested review from kenjenkins and kralicky April 29, 2026 23:19