napi-rs v3: ThreadsafeFunction cleanup race during Node env teardown (atexit crash) #1394

@nebasuke

Symptom

After the napi-rs v3 migration (PR #1385), the edr_napi mocha suite crashes during process teardown on macos-15:

46 passing (...)
SIGBUS / SIGSEGV   (run-to-run alternation)

The crash fires after the last test prints, inside Node's atexit cleanup — well past mocha's success line. EDR #590 reports the same crash on Linux with a stack trace pointing at napi-rs's ThreadsafeFunction finalizer; macOS-15 is just a more visible expression of the same race.
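Because mocha has already printed its success line by the time the crash fires, a CI wrapper has to look at how the process ended rather than at the test output. A minimal sketch, assuming the suite runs as a child process and its result has the `status` / `signal` shape that Node's `child_process.spawnSync` returns; `RunResult`, `classifyRun`, and the outcome labels are hypothetical names, not part of EDR:

```typescript
// Hypothetical helper: names and labels are illustrative, not EDR code.
// `status` and `signal` mirror the fields of Node's spawnSync result.
interface RunResult {
  status: number | null; // exit code, or null if killed by a signal
  signal: string | null; // e.g. "SIGSEGV", or null on a normal exit
  stdout: string; // captured test output
}

type Outcome = "pass" | "test-failure" | "teardown-crash" | "crash";

function classifyRun(result: RunResult): Outcome {
  if (result.signal !== null) {
    // mocha prints "<n> passing" once the tests are done. If that summary is
    // present and the process was still killed by a signal, it died after
    // the tests ran, i.e. during teardown.
    const printedSummary = /\d+ passing/.test(result.stdout);
    return printedSummary ? "teardown-crash" : "crash";
  }
  return result.status === 0 ? "pass" : "test-failure";
}
```

With this split, a run that prints `46 passing` and then dies with SIGBUS or SIGSEGV is reported as a teardown crash rather than as a test failure.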

Root cause

The underlying Node bug is nodejs/node#55706: a use-after-free (UAF) on ThreadsafeFunction references during env teardown. It is fixed only in Node 25 (nodejs/node#55877, commit 350b0ea895), with no LTS backport.

The napi-rs v3 migration did not introduce the bug, but it dropped v2's explicit tsfn.unref(env) escape hatch in favor of weak::<true>(), which removed the workaround that had been masking it. The race is now exposed by any ThreadsafeFunction (TSFN) dropped during atexit, and that covers every TSFN the EDR provider holds (subscriptions, console-log decode, print-line, coverage, gas-report, call-override, suite-completion).

Fix

Add an explicit async Provider.close() that runs the cleanup cascade — IntervalMiner::DropRuntime::shutdown_background() → TSFN release — on the JS event loop, before atexit runs. Hardhat's network manager (and EDR's mocha suite) call it in afterEach / shutdown.
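The call pattern on the JS side can be sketched as follows. This is a minimal illustration, not EDR's implementation: the issue only names an async `Provider.close()`, so `Closable`, `FakeProvider`, `ProviderTracker`, and the idempotent behaviour of `close()` are all assumptions made for the example.

```typescript
// Assumed shape: the issue only states that Provider.close() is async.
interface Closable {
  close(): Promise<void>;
}

// Stand-in for an EDR provider. `released` models the TSFN release that the
// real cleanup cascade would perform.
class FakeProvider implements Closable {
  released = false;
  private closing: Promise<void> | undefined;

  // Assumption: close() is idempotent, so repeated calls share one cleanup.
  close(): Promise<void> {
    if (this.closing === undefined) {
      this.closing = this.release();
    }
    return this.closing;
  }

  private async release(): Promise<void> {
    await Promise.resolve(); // stands in for the runtime shutdown
    this.released = true;
  }
}

// Remembers every provider a test creates so that a teardown hook can close
// them all while the event loop is still running.
class ProviderTracker<P extends Closable> {
  private open: P[] = [];

  track(provider: P): P {
    this.open.push(provider);
    return provider;
  }

  openCount(): number {
    return this.open.length;
  }

  async closeAll(): Promise<void> {
    const pending = this.open;
    this.open = [];
    await Promise.all(pending.map((p) => p.close()));
  }
}
```

In a mocha suite this would be wired up as `afterEach(() => tracker.closeAll())`. Mocha waits for the returned promise, so each provider is closed on the event loop before the process starts exiting.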

Background

Full investigation (Node bug timeline, why macOS-15, why EDR's tokio runtime amplifies the race, alternatives considered, and the design of Provider.close()):

edr-provider-close-investigation.md (on the fix branch)
