
Move native AST identity cache into the Rust extension #392

Merged
adamziel merged 22 commits into adamziel/ast-child-identity from adamziel/rust-ast-cache
May 1, 2026

Conversation

@adamziel
Collaborator

@adamziel adamziel commented Apr 30, 2026

Stacked on #391.

#391 restored WP_Parser_Node identity semantics by interning native child wrappers in PHP. That fixed correctness, but the PHP-side cache formed a retention cycle and added measurable cost to hit-heavy translator-style workloads.

This PR moves native wrapper identity out of PHP object properties entirely:

  • WP_MySQL_Native_Parser_Node no longer stores $native_ast, $native_node_index, or a PHP-side identity-cache object.
  • Native bridge calls now pass the wrapper itself, e.g. wp_sqlite_mysql_native_ast_get_children( $this ).
  • The Rust extension keeps a thread-local registry keyed by the PHP wrapper object pointer.
  • Registry entries map wrapper pointer -> (NativeAstState, node_index, is_materialized), and each AST keeps a node-index -> wrapper-pointer cache.
  • Cached wrapper hits return the existing PHP object by pointer with its refcount bumped; the cache does not own a PHP reference.
  • __destruct() releases a wrapper from the Rust registry. Materialization marks the wrapper as detached from native reads while leaving it discoverable from the parent cache as long as it is still live.

That breaks the cycle that mattered here: PHP wrappers no longer strongly reference a native AST object, and Rust no longer strongly references PHP wrappers. PHP's cycle collector can collect wrapper graphs normally; destructors then clean up the Rust registry entries.
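
A minimal sketch of the registry shape described above, using only the Rust standard library. Everything here is a stand-in: wrapper addresses are plain `usize` values rather than real PHP object pointers, and `NativeAstState`, `register`, `lookup`, `mark_materialized`, and `release` are hypothetical names, not the extension's actual API. It is also an assumption of this sketch that registry entries keep the AST state alive through an `Rc`.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Stand-in for the parsed native AST (the real state is richer).
pub struct NativeAstState {
    /// node index -> address of the live wrapper for that node (non-owning).
    wrappers_by_node: RefCell<HashMap<usize, usize>>,
}

/// What the registry remembers about one wrapper.
struct Entry {
    ast: Rc<NativeAstState>,
    node_index: usize,
    is_materialized: bool,
}

thread_local! {
    /// wrapper address -> entry. Keys are plain integers, so the registry
    /// never owns a reference to the wrapper itself.
    static REGISTRY: RefCell<HashMap<usize, Entry>> = RefCell::new(HashMap::new());
}

pub fn new_ast() -> Rc<NativeAstState> {
    Rc::new(NativeAstState { wrappers_by_node: RefCell::new(HashMap::new()) })
}

/// Record that the wrapper at `wrapper_addr` represents `node_index` of `ast`.
pub fn register(wrapper_addr: usize, ast: &Rc<NativeAstState>, node_index: usize) {
    ast.wrappers_by_node.borrow_mut().insert(node_index, wrapper_addr);
    REGISTRY.with(|r| {
        let entry = Entry { ast: Rc::clone(ast), node_index, is_materialized: false };
        r.borrow_mut().insert(wrapper_addr, entry);
    });
}

/// Cache probe: address of the live wrapper for a node, if one exists.
pub fn lookup(ast: &NativeAstState, node_index: usize) -> Option<usize> {
    ast.wrappers_by_node.borrow().get(&node_index).copied()
}

/// Materialization flags the wrapper as detached from native reads but
/// leaves it discoverable through the per-AST cache.
pub fn mark_materialized(wrapper_addr: usize) {
    REGISTRY.with(|r| {
        if let Some(entry) = r.borrow_mut().get_mut(&wrapper_addr) {
            entry.is_materialized = true;
        }
    });
}

pub fn is_materialized(wrapper_addr: usize) -> Option<bool> {
    REGISTRY.with(|r| r.borrow().get(&wrapper_addr).map(|e| e.is_materialized))
}

/// Destructor hook: forget the wrapper in both maps.
pub fn release(wrapper_addr: usize) {
    let removed = REGISTRY.with(|r| r.borrow_mut().remove(&wrapper_addr));
    if let Some(entry) = removed {
        let mut cache = entry.ast.wrappers_by_node.borrow_mut();
        // Only clear the slot if it still points at this wrapper.
        if cache.get(&entry.node_index) == Some(&wrapper_addr) {
            cache.remove(&entry.node_index);
        }
    }
}

pub fn registry_len() -> usize {
    REGISTRY.with(|r| r.borrow().len())
}
```

Because neither map holds a counted reference to a wrapper, dropping the last PHP-side reference is what triggers `release`; the sketch does not model the refcount bump that a cache hit performs on the PHP object.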

Tokens remain un-interned. The public token API has no mutators, and no caller in this repo relies on token object identity.

Perf numbers

From the passing Native AST Walk Perf CI run on this head (2d93be25f599c3c4482480a6ab644d61b9337b12), comparing this PR to the native no-cache baseline (codex/native-lazy-ast-facade):

| Scenario | This PR | Baseline | Duration delta | Peak memory delta |
|---|---|---|---|---|
| parse only | 1.2859s, 54,098 qps, 30.0MB | 1.2715s, 54,711 qps, 30.0MB | +1.1% | 0.0% |
| walk x1 | 3.3265s, 20,912 qps, 48.0MB | 3.4191s, 20,346 qps, 60.5MB | -2.7% | -20.7% |
| rewalk x10 | 8.6352s, 8,056 qps, 52.0MB | 16.9658s, 4,100 qps, 90.5MB | -49.1% | -42.5% |
| reread x20 | 1.8618s, 37,364 qps, 30.0MB | 2.4322s, 28,602 qps, 38.0MB | -23.5% | -21.1% |
| subtree x5 | 9.8274s, 7,078 qps, 48.0MB | 13.8921s, 5,007 qps, 64.5MB | -29.3% | -25.6% |

The parse-only path is effectively unchanged. The repeated-access workloads this cache is meant to help are materially faster and use less peak memory.
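
For readers checking the delta columns, the figures are consistent with a plain relative change against the baseline, rounded to one decimal place. A small sketch, assuming that formula (the helper names are hypothetical and not part of the perf workflow):

```rust
/// Relative change of `new` against `base`, in percent (negative means lower).
pub fn delta_percent(new: f64, base: f64) -> f64 {
    (new - base) / base * 100.0
}

/// Round to one decimal place, matching the precision used in the table.
pub fn round1(x: f64) -> f64 {
    (x * 10.0).round() / 10.0
}
```

With the rewalk x10 row, `round1(delta_percent(8.6352, 16.9658))` gives the duration delta and `round1(delta_percent(52.0, 90.5))` gives the peak memory delta.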

For context, the same CI run measured the pure-PHP path at:

| Scenario | Pure PHP |
|---|---|
| parse only | 13.5292s, 5,142 qps, 68.0MB |
| walk x1 | 16.1332s, 4,312 qps, 70.0MB |

Safety coverage

The PR now includes native-extension tests for:

  • stable wrapper identity across repeated child/descendant reads;
  • no reflected $native_ast / $native_node_index properties on wrappers;
  • child mutations surviving repeat reads and parent materialization;
  • materialized children remaining discoverable from a still-native parent;
  • repeated parse/walk/drop loops staying memory-bounded;
  • dropping root and descendant wrappers reclaiming registry entries;
  • child wrappers outliving root variables without use-after-free;
  • overlapping AST lifetimes not corrupting each other;
  • mutation-before-drop and rewalk loops staying memory-bounded.

CI smoke checks also assert the SQLite-driver and WordPress test-container paths select the native wrapper model and do not regress back to $native_ast storage.

Test plan

  • cargo check for packages/php-ext-wp-mysql-parser
  • cargo fmt --check for packages/php-ext-wp-mysql-parser
  • PHP lint on changed PHP files
  • Focused native identity/cycle PHPUnit tests with the Rust extension loaded
  • Full packages/mysql-on-sqlite PHPUnit suite with the Rust extension loaded
  • Full GitHub Actions suite on PR head
  • Native AST Walk Perf workflow on PR head

The PHP-side cache from #391 fixed the correctness regression but added
27-31% to translator-style re-entry workloads — every accessor probes a
PHP zend_array that, at full-walk scale, holds 4.8M entries, and PHP
still allocates a fresh wrapper before the cache check gets a chance
to drop it.

This moves the cache to where it can actually skip work. WP_MySQL_Native_Ast
now carries a RefCell<HashMap<usize, ZBox<ZendObject>>>; cached_node_zval
returns a Zval pointing at the stored wrapper with refcount bumped on a
hit, so the allocation and four zend_update_property calls of the
construction path are gone. Every accessor (get_first_child_node,
get_descendants, etc.) routes through this helper.
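
A rough model of the hit path described in this commit message, with `Rc` standing in for PHP's refcount and a plain struct standing in for the wrapper object. `NodeCache`, `Wrapper`, and `cached_node` are illustrative names only; the real helper returns a Zval and works on `ZBox<ZendObject>` entries.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;
use std::rc::Rc;

/// Stand-in for a PHP wrapper object.
pub struct Wrapper {
    pub node_index: usize,
}

pub struct NodeCache {
    nodes: RefCell<HashMap<usize, Rc<Wrapper>>>,
    /// How many times the construction path actually ran.
    constructed: Cell<usize>,
}

impl NodeCache {
    pub fn new() -> Self {
        NodeCache { nodes: RefCell::new(HashMap::new()), constructed: Cell::new(0) }
    }

    /// Return the interned wrapper for `node_index`; construct only on a miss.
    pub fn cached_node(&self, node_index: usize) -> Rc<Wrapper> {
        let mut nodes = self.nodes.borrow_mut();
        if let Some(hit) = nodes.get(&node_index) {
            // Hit: bump the refcount and skip allocation entirely.
            return Rc::clone(hit);
        }
        self.constructed.set(self.constructed.get() + 1);
        let wrapper = Rc::new(Wrapper { node_index });
        nodes.insert(node_index, Rc::clone(&wrapper));
        wrapper
    }

    pub fn constructed(&self) -> usize {
        self.constructed.get()
    }
}
```

Two reads of the same node index return the same allocation, which is the identity guarantee the accessors rely on. Note that in this model the cache holds a counted reference to every wrapper it has handed out.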

PHP-side cache disappears: WP_MySQL_Native_Parser_Node goes back to
plain bridge calls, the WP_MySQL_Native_AST_Cache holder is removed.
Mutation semantics are unchanged — materialize_native_children still
flips was_mutated and copies the same wrappers (now interned by Rust)
into $this->children, so any caller mutation made before append_child
still survives.

Tokens are not yet interned. The public token API has no mutators and
no caller in this repo relies on token identity; if that changes we
extend node_cache with a token map.

Rust deny-by-default lint flags the &T -> &mut T cast as UB even when
the borrow is unused. Make the cache lookup borrow_mut up front and
hand zval_from_object_addref a real &mut reference into the boxed
entry. Same semantics, no UB lint.
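
The borrow pattern from this commit can be sketched without any PHP types. Here `Obj` and `addref` are hypothetical stand-ins for the boxed wrapper and for a helper that needs `&mut` access to bump a refcount; the point is only that `borrow_mut` plus `get_mut` yields a genuine `&mut`, so no `&T` to `&mut T` cast is needed.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Stand-in for a boxed PHP object with a refcount.
pub struct Obj {
    pub refcount: u32,
}

/// Hypothetical helper that requires `&mut` to bump the refcount.
fn addref(obj: &mut Obj) -> u32 {
    obj.refcount += 1;
    obj.refcount
}

pub struct Cache {
    entries: RefCell<HashMap<usize, Box<Obj>>>,
}

impl Cache {
    pub fn with_entry(key: usize) -> Self {
        let mut map = HashMap::new();
        map.insert(key, Box::new(Obj { refcount: 1 }));
        Cache { entries: RefCell::new(map) }
    }

    /// Cache hit: returns the refcount after the bump, or `None` on a miss.
    pub fn hit(&self, key: usize) -> Option<u32> {
        // Take the mutable borrow up front so `get_mut` hands back a real
        // `&mut Box<Obj>` instead of a shared reference that would need a cast.
        let mut entries = self.entries.borrow_mut();
        let bumped = entries.get_mut(&key).map(|boxed| addref(boxed));
        bumped
    }
}
```

The cost is that the `RefCell` is exclusively borrowed for the duration of the lookup, so the helper must not re-enter the cache.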
@adamziel
Collaborator Author

Rust-side cache: huge CPU wins, but a memory cycle to discuss

CI run 25190933025, commit 30a38ae. Same corpus, same runner, baseline = codex/native-lazy-ast-facade (no cache). #391's PHP-side numbers from its own perf comment on a comparable runner.

CPU — Rust cache is the right architecture

| scenario | baseline (no cache) | #391 PHP cache | #392 Rust cache | Rust vs baseline | Rust vs PHP cache |
|---|---|---|---|---|---|
| walk (single pass) | 3.45s | 3.52s | 3.34s | −3% | −5% |
| rewalk × 10 | 21.09s | 18.37s | 7.48s | −65% | −59% |
| reread × 20 | 2.09s | 2.59s | 2.13s | +2% | −18% |
| subtree × 5 | 11.70s | 15.32s | 8.88s | −24% | −42% |

The picture this confirms:

  • rewalk × 10 is now 2.8× faster than the no-cache baseline. After the first traversal populates the cache, the next nine passes cost essentially nothing: ClassEntry::new() and four zend_update_property calls per node are gone, and Rust's HashMap probe replaces PHP's zend_array probe.
  • subtree × 5 flips from a 31% regression in #391 to a 24% improvement here. 24M get_first_child_node() calls now hit a Rust hashmap instead of probing a 4.8M-entry PHP zend_array, which is exactly the bottleneck the hit-heavy analysis on #391 predicted.
  • reread × 20 and walk are roughly break-even with the no-cache baseline — overhead is now small enough to disappear into measurement noise.

The full +27 to +31% regression on translator-style workloads is recovered. None of the wins required touching the Rust accessor logic — only moving where the cache lives.

Memory — there is a problem

| scenario | baseline | #391 PHP cache | #392 Rust cache |
|---|---|---|---|
| walk (peak mem) | 50 MB | 60.5 MB | 774 MB |
| rewalk × 10 (peak mem) | 70 MB | 88.5 MB | 776 MB |
| reread × 20 (peak mem) | 30 MB | 38 MB | 64 MB |
| subtree × 5 (peak mem) | 50 MB | 64.5 MB | 774 MB |

This is a reference cycle, not an unbounded leak per se, but PHP can't break it:

  • WpMySqlNativeAst (Rust struct) owns node_cache: RefCell<HashMap<usize, ZBox<ZendObject>>>.
  • Each cached ZendObject is a WP_MySQL_Native_Parser_Node instance whose $native_ast property points back at WpMySqlNativeAst.
  • The Rust hashmap is invisible to PHP's GC because it's not exposed via a gc_handler. So PHP sees WpMySqlNativeAst with refcount > 0 and a cycle it can't walk into.
  • Result: each AST that gets walked retains its full wrapper map until the script exits. On the 69k-query benchmark this accumulates to ~770 MB.
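
The shape of the cycle can be reproduced in plain Rust, with `Rc` playing the role of PHP's refcount. This is an analogy under stated assumptions, not the extension's code: `Ast`, `Wrapper`, `build_cycle`, and `break_cycle` are illustrative names, and `break_cycle` stands for whatever would have to clear the cache, which PHP's collector cannot do because it never sees the Rust map.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::{Rc, Weak};

/// Stand-in for WpMySqlNativeAst: owns the wrapper cache.
pub struct Ast {
    pub node_cache: RefCell<HashMap<usize, Rc<Wrapper>>>,
}

/// Stand-in for a wrapper whose property points back at the AST.
pub struct Wrapper {
    pub native_ast: Rc<Ast>,
}

/// Build an AST with `nodes` cached wrappers, then drop the only outside
/// reference. Returns a weak handle so callers can see whether it was freed.
pub fn build_cycle(nodes: usize) -> Weak<Ast> {
    let ast = Rc::new(Ast { node_cache: RefCell::new(HashMap::new()) });
    for index in 0..nodes {
        let wrapper = Rc::new(Wrapper { native_ast: Rc::clone(&ast) });
        ast.node_cache.borrow_mut().insert(index, wrapper);
    }
    // `ast` goes out of scope here, but every cached wrapper still holds a
    // strong reference to it, so the allocation is never released.
    Rc::downgrade(&ast)
}

/// Clearing the cache drops the wrappers and with them the back-references.
pub fn break_cycle(ast: &Ast) {
    ast.node_cache.borrow_mut().clear();
}
```

After `build_cycle` returns, the weak handle still upgrades, which is the retained-until-exit behaviour described above; only an explicit `break_cycle` lets the AST go.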

The benchmark is a worst case — real workloads that hold one AST at a time will see a fixed per-AST overhead (~10–15 MB on this corpus). But it's still a regression, and on long-running processes (e.g., a process supervisor running translator passes over many queries) it'd grow without bound.

Options

  1. Land #391 (PHP cache) instead. Correctness fix, +5% time on simple walks, +27–31% on re-entry, no memory cycle. Ship it now, treat the Rust cache as future work once the cycle is solved.
  2. Land #392 with a known cycle. Big CPU wins on real workloads, memory bounded per AST, but unbounded across many ASTs unless the caller explicitly drops references and triggers cycle collection. Document the constraint and call gc_collect_cycles() or equivalent in long-running consumers.
  3. Fix the cycle here. Approaches (in increasing order of effort):
    • Implement a custom gc_handler on WpMySqlNativeAst that exposes the cached wrappers to PHP's cycle collector, letting it traverse and collect the cycle. Right answer, but requires extending ext-php-rs (no public API for this in 0.15).
    • Make cached wrappers' $native_ast property a non-counting reference. Hacks around the cycle but risks UAF if the property destructor runs after the AST.
    • Move $native_ast off the wrapper entirely; have bridge functions look up the AST via a Rust-side static keyed on wrapper pointer. Largest refactor, but cleanly breaks the cycle.

My read: option 1 is the right ship-it-now answer. The PHP-side cache restores correctness today, and the +27–31% on re-entry is acceptable until a proper Rust-side solution can land without the cycle. Option 2 is tempting given the perf wins but the unbounded growth in long-running workloads makes it unsafe to ship as-is. Option 3 is the eventual destination but isn't ready in this PR.

Your call.

adamziel added 12 commits May 1, 2026 00:14
These are the contract for the gc_handler that comes next: the Rust
extension's node_cache forms a cycle (cache -> wrapper -> $native_ast
property -> WpMySqlNativeAst -> cache) that PHP's cycle collector can't
walk into without help. The tests will fail until the handler exposes
the cached wrappers to PHP's GC.

The tests are deliberately hostile — loops with explicit gc_collect_cycles
between iterations, assertions on memory floors, mutation-before-drop,
overlapping-AST lifetimes, and orphaned-child use-after-drop. Each one
breaks in a different direction the leak can manifest.

These tests fail on the current commit; the next commit makes them pass.

Patch the class's default_object_handlers->get_gc on module startup so
PHP's cycle collector can walk the Rust-side node_cache. The handler
enumerates cached ZendObject wrappers into PHP's gc_buffer without
bumping refcounts; PHP's collector uses these to detect that
node_cache -> wrappers -> $native_ast property -> WpMySqlNativeAst
forms a cycle and collects it.

This is the implementation half of the cycle-collection contract added
in the previous commit's tests. Expect compile/runtime iteration —
ext-php-rs 0.15 doesn't expose IS_OBJECT_EX or zend_get_gc_buffer_*
directly, so they are declared inline and may need adjustment for the
runner's PHP build.

PHP 8.3 added zend_class_entry.default_object_handlers letting us patch
get_gc once per class on MINIT, but PHP 8.2 has no such field — the
class entry only stores create_object. The portable path is to override
each WpMySqlNativeAst's zend_object.handlers right after the object is
allocated. We seed the patched handlers struct lazily on the first AST
and reuse it for the rest.

Ubuntu's PHP 8.2 build (shivammathur/setup-php) does not export
zend_get_gc_buffer_use, so the previous extern "C" declarations
caused a link error at dlopen time. Drop them and own the trace
buffer ourselves: a RefCell<Vec<zval>> on WpMySqlNativeAst that the
get_gc handler refills on each call and exposes via the (table, n)
out-params. Same semantics, no PHP-side dependency.
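
The ownership pattern in this commit, a buffer that the handler refills on each call and exposes as a pointer plus a count, can be sketched on its own. This only illustrates the buffer handling: `Slot` is a hypothetical stand-in for a zval, `get_gc` here is an ordinary method rather than a Zend handler, and nothing in the sketch reproduces the zval layout PHP's collector expects.

```rust
use std::cell::RefCell;

/// Stand-in for one zval slot in the trace table.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Slot(pub usize);

pub struct Ast {
    /// Addresses of the currently cached wrappers.
    cached: Vec<usize>,
    /// Scratch buffer owned by the AST and reused across calls.
    gc_trace: RefCell<Vec<Slot>>,
}

impl Ast {
    pub fn new(cached: Vec<usize>) -> Self {
        Ast { cached, gc_trace: RefCell::new(Vec::new()) }
    }

    /// Refill the scratch buffer and return it as (table, n).
    /// The pointer is only valid until the next call refills the buffer.
    pub fn get_gc(&self) -> (*const Slot, usize) {
        let mut trace = self.gc_trace.borrow_mut();
        trace.clear();
        trace.extend(self.cached.iter().map(|&addr| Slot(addr)));
        (trace.as_ptr(), trace.len())
    }
}
```

The caller reading `(table, n)` has to finish before the next refill, since `clear` and `extend` may reuse or reallocate the storage behind the pointer.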

Replacing the per-class handlers pointer with a copy broke ext-php-rs's
FromZval, which uses the handlers pointer as a class-identity key.
Patch the original struct in place instead — its memory is a heap
Box::leak the extension owns, so the write is safe, and every native_ast
lookup keeps working because the pointer identity is preserved.

Idempotent guard via a process-level boolean; only the first AST does
the write.
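
One way to express a process-level "only the first caller writes" guard in Rust is an atomic flag. This is a sketch of the idea, not the commit's code: `HANDLER_INSTALLED` and `install_once` are hypothetical names, and the closure stands in for the in-place handler patch.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Process-level flag: has the handler already been patched?
static HANDLER_INSTALLED: AtomicBool = AtomicBool::new(false);

/// Run `install` only for the first caller in the process.
/// Returns `true` if this call performed the install.
pub fn install_once(install: impl FnOnce()) -> bool {
    // `swap` returns the previous value, so exactly one caller sees `false`.
    if HANDLER_INSTALLED.swap(true, Ordering::AcqRel) {
        return false;
    }
    install();
    true
}
```

With a bare flag, a second thread can return before the first has finished installing; `std::sync::Once` closes that gap by blocking later callers until the first call completes.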

The custom get_gc approach worked for the Rust cache itself but the trace
construction segfaulted PHP inside its automatic cycle collector, and
five iterations of CI debugging didn't converge on a safe zval format
the collector accepts.

Walking back to the working state (Rust cache without gc_handler):
- Drop the gc_trace field, ast_get_gc handler, install_gc_handler_for
  install path, and the PHP_IS_OBJECT_EX constant.
- Mark the cycle tests as incomplete with a pointer to the limitation
  and a note that a future ext-php-rs version exposing the get_gc hooks
  would let us complete them.

The Rust cache still delivers the CPU wins documented in the PR
description (rewalk -65%, subtree -24%); the per-AST memory cycle
remains a known limitation for long-running processes — see the perf
comment for guidance.
@adamziel
Collaborator Author

Final summary on the gc_handler experiment

CI is green at 8b24b49: Rust extension PHPUnit, SQLite driver / Rust extension, and the static matrix all pass.

What stuck

The Rust-side identity cache itself — RefCell<HashMap<usize, ZBox<ZendObject>>> on WP_MySQL_Native_Ast plus the cached_node_zval accessor. CPU numbers from the hit-heavy comparison hold:

| scenario | baseline (no cache) | #391 (PHP cache) | this PR (Rust cache) |
|---|---|---|---|
| walk | 3.45s | 3.52s | 3.34s |
| rewalk × 10 | 21.09s | 18.37s | 7.48s (−65% vs baseline) |
| reread × 20 | 2.09s | 2.59s | 2.13s |
| subtree × 5 | 11.68s | 15.32s | 8.88s (−24% vs baseline) |

What didn't

The custom get_gc handler intended to break the per-AST cycle. After five CI iterations:

  1. Install path works. Patching zend_object_handlers->get_gc in place on the shared (per-class, ext-php-rs-owned) handlers struct — rather than swapping the handlers pointer to a copy — preserves ext-php-rs's FromZval identity check. Native_ast lookups continue to succeed for every existing accessor. Verified by a no-op handler that ran the full PHPUnit suite without crashing.
  2. zend_get_gc_buffer_* is not available. Ubuntu's PHP 8.2 build under shivammathur/setup-php doesn't export zend_get_gc_buffer_use, so we can't use PHP's standard scratch-buffer API. The handler has to own its own buffer.
  3. The Rust-owned trace-buffer approach segfaults PHP's automatic cycle collector. A RefCell<Vec<zval>> on the AST, refilled per get_gc call with manually-constructed object zvals (zv.value.obj = ptr; zv.u1.type_info = IS_OBJECT_EX), pointed at via the (table, n) out-params, crashes consistently at test 63 of PHPUnit during automatic GC. The crash is inside PHP's cycle collector, not in our Rust code, so the diagnosis is "the zval shape PHP expects isn't what we're producing." Without a debug PHP build with assertions enabled (and without zend_get_gc_buffer_* to delegate to), narrowing further would take far more iterations than the PR is worth.
  4. PHP 8.3 has the cleaner path via zend_class_entry.default_object_handlers plus the gc_buffer API; PHP 8.2 doesn't, and the project supports both.

Where the cycle tests live

The contract is captured in WP_MySQL_Native_Parser_Node_Cycle_Tests (six adversarial tests covering re-loop, drop+gc, orphaned-child, mutation-before-drop, overlapping ASTs, and subtree re-walks). They are marked incomplete via markTestIncomplete with a clear pointer to the limitation, so they document the desired behaviour without false-failing CI.

Recommendation

Land #391, not #392. #391 fixes the correctness regression with bounded memory and a +5–31% perf cost depending on workload shape. #392 has a strictly bigger CPU win on hit-heavy workloads but an unbounded memory cycle in long-running processes that we don't have a usable mechanism to break in ext-php-rs 0.15.

A follow-up worth opening once any of these change:

  • ext-php-rs exposes default_object_handlers configuration or a safer get_gc hook.
  • zend_get_gc_buffer_* becomes available across all supported PHP builds.
  • We move the AST handle off the wrappers (so wrappers don't pin the AST) and wire get_gc through a different access pattern.

I'd close #392 as research; the work and numbers stay in history as the rationale for any future Rust-cache attempt.

@adamziel
Collaborator Author

adamziel commented May 1, 2026

New Native AST walk perf numbers

Source: passing Native AST Walk Perf CI run on head 2d93be25f599c3c4482480a6ab644d61b9337b12.
Baseline: native no-cache path from codex/native-lazy-ast-facade.

| Scenario | This PR | Baseline | Duration delta | Peak memory delta |
|---|---|---|---|---|
| parse only | 1.2859s, 54,098 qps, 30.0MB | 1.2715s, 54,711 qps, 30.0MB | +1.1% | 0.0% |
| walk x1 | 3.3265s, 20,912 qps, 48.0MB | 3.4191s, 20,346 qps, 60.5MB | -2.7% | -20.7% |
| rewalk x10 | 8.6352s, 8,056 qps, 52.0MB | 16.9658s, 4,100 qps, 90.5MB | -49.1% | -42.5% |
| reread x20 | 1.8618s, 37,364 qps, 30.0MB | 2.4322s, 28,602 qps, 38.0MB | -23.5% | -21.1% |
| subtree x5 | 9.8274s, 7,078 qps, 48.0MB | 13.8921s, 5,007 qps, 64.5MB | -29.3% | -25.6% |

Pure PHP reference from the same run:

| Scenario | Pure PHP |
|---|---|
| parse only | 13.5292s, 5,142 qps, 68.0MB |
| walk x1 | 16.1332s, 4,312 qps, 70.0MB |

The repeated-access cases are the target shape here. The final pointer-registry cache is faster than the no-cache baseline there while also lowering peak memory. The parse-only path is effectively unchanged.

@adamziel adamziel merged commit b47b748 into adamziel/ast-child-identity May 1, 2026
21 checks passed
@adamziel adamziel deleted the adamziel/rust-ast-cache branch May 1, 2026 13:23