
Add native Rust-based MySQL parser extension#381

Merged
adamziel merged 41 commits into trunk from codex/native-lazy-ast-facade
May 1, 2026

Conversation

@adamziel
Collaborator

@adamziel adamziel commented Apr 28, 2026

What it does

Adds an optional Rust PHP extension for the MySQL lexer/parser. When the extension is loaded, the existing public PHP API stays the same, but WP_MySQL_Lexer and WP_MySQL_Parser delegate lexing/parsing to native Rust code.

php -d extension=/path/to/libwp_mysql_parser.so your-script.php
require 'packages/mysql-on-sqlite/src/load.php';

$driver = new WP_PDO_MySQL_On_SQLite( 'mysql-on-sqlite:path=:memory:;dbname=wp;' );
$parser = $driver->create_parser( 'SELECT ID, post_title FROM wp_posts WHERE ID IN (1, 2, 3)' );

$parser->next_query();
$ast = $parser->get_query_ast();

echo $ast->rule_name; // query

Without the extension, the same code uses the existing PHP parser.

Rationale

The PHP parser is correct but expensive on large query sets. On the MySQL corpus, the native path measured against trunk's PHP implementation was:

| Scenario | Trunk PHP | Native extension |
| --- | ---: | ---: |
| Parse only | 14.3114s / 4,862 QPS / 68.0MB | 1.1574s / 60,105 QPS / 30.0MB |
| Parse + walk | 20.0804s / 3,465 QPS / 70.0MB | 2.9751s / 23,383 QPS / 48.0MB |

That is about 12.37x faster for parse-only and 6.75x faster for parse+walk. Raw numbers are in #381 (comment).

Implementation

The extension lives in packages/php-ext-wp-mysql-parser/ and exports WP_MySQL_Native_Lexer, WP_MySQL_Native_Token_Stream, WP_MySQL_Native_Parser, and WP_MySQL_Native_Parser_Node.

The SQLite driver selects the native path only when the native lexer class is active:

$lexer = new WP_MySQL_Lexer( $sql );

if ( $lexer instanceof WP_MySQL_Native_Lexer ) {
    $tokens = $lexer->native_token_stream();
} else {
    $tokens = $lexer->remaining_tokens();
}

Native AST nodes are lazy PHP wrappers over a Rust-owned AST. Wrapper identity is stable through a per-AST cache, and Rust state is stored in a Rust-side registry keyed by the PHP wrapper object pointer. That avoids the previous PHP/Rust reference cycle while keeping repeated child reads referentially stable.

The pure-PHP parser remains the fallback and WP_MySQL_Parser remains an instanceof WP_Parser.

Testing instructions

Run the PHP suite normally:

cd packages/mysql-on-sqlite
php ./vendor/bin/phpunit -c ./phpunit.xml.dist

Run it against the extension:

cd packages/php-ext-wp-mysql-parser
cargo build

cd ../mysql-on-sqlite
WP_SQLITE_REQUIRE_NATIVE_PARSER_EXTENSION=1 \
php -d extension=../php-ext-wp-mysql-parser/target/debug/libwp_mysql_parser.so \
  ./vendor/bin/phpunit -c ./phpunit.xml.dist

CI currently passes on bde34d5 for PHP 7.2-8.5, including extension-loaded SQLite integration tests on PHP 8.0-8.5 and WordPress PHPUnit with the extension loaded.

@adamziel adamziel force-pushed the codex/native-lazy-ast-facade branch from bf50f10 to c2da5e4 Compare April 28, 2026 14:22
@adamziel adamziel changed the base branch from codex/native-token-stream-parser to trunk April 28, 2026 14:22
@adamziel adamziel force-pushed the codex/native-lazy-ast-facade branch from c219b31 to 2476729 Compare April 28, 2026 15:11
@adamziel adamziel changed the base branch from trunk to codex/native-parser-php-facade April 28, 2026 15:11
@adamziel adamziel force-pushed the codex/native-lazy-ast-facade branch 2 times, most recently from f06ecf6 to 48db7c5 Compare April 28, 2026 15:22
@adamziel adamziel changed the base branch from codex/native-parser-php-facade to codex/native-parser-node-facade April 28, 2026 15:22
@adamziel adamziel force-pushed the codex/native-lazy-ast-facade branch from 5fc4ca2 to e41fdaf Compare April 29, 2026 09:15
@adamziel adamziel force-pushed the codex/native-parser-node-facade branch from b20499f to 77a45df Compare April 30, 2026 11:52
@adamziel adamziel force-pushed the codex/native-lazy-ast-facade branch from e41fdaf to 9595995 Compare April 30, 2026 11:52
@adamziel adamziel force-pushed the codex/native-parser-node-facade branch from 77a45df to 830a9b2 Compare April 30, 2026 11:59
@adamziel adamziel force-pushed the codex/native-lazy-ast-facade branch from 9595995 to 039eb69 Compare April 30, 2026 12:00
@adamziel adamziel force-pushed the codex/native-parser-node-facade branch from 830a9b2 to e66bab3 Compare April 30, 2026 12:16
@adamziel adamziel force-pushed the codex/native-lazy-ast-facade branch from 039eb69 to 07a7777 Compare April 30, 2026 12:16
@adamziel adamziel changed the base branch from codex/native-parser-node-facade to trunk April 30, 2026 12:22
@adamziel adamziel force-pushed the codex/native-lazy-ast-facade branch from 07a7777 to 9153c2e Compare April 30, 2026 12:24
@adamziel adamziel changed the base branch from trunk to codex/native-parser-node-facade April 30, 2026 12:24
@adamziel adamziel force-pushed the codex/native-parser-node-facade branch from bb3b8e6 to 89bc6a4 Compare April 30, 2026 12:37
@adamziel adamziel force-pushed the codex/native-lazy-ast-facade branch from 9153c2e to d636f96 Compare April 30, 2026 12:37
@adamziel adamziel force-pushed the codex/native-parser-node-facade branch from 89bc6a4 to cd22199 Compare April 30, 2026 12:40
@adamziel adamziel force-pushed the codex/native-lazy-ast-facade branch from d636f96 to 6403031 Compare April 30, 2026 12:40
@adamziel adamziel force-pushed the codex/native-parser-node-facade branch from cd22199 to 23b1c02 Compare April 30, 2026 12:43
@adamziel adamziel force-pushed the codex/native-lazy-ast-facade branch from 6403031 to c8f5b10 Compare April 30, 2026 12:43
When a native parser is in use, expose query results through a node
class that defers child materialization until callers actually walk the
tree. The base WP_Parser_Node::$children visibility is loosened to
protected so the facade can populate it on demand.
@adamziel adamziel marked this pull request as ready for review April 30, 2026 13:40
@adamziel adamziel changed the title [codex] Add lazy native AST facade Add native Rust-based MySQL parser extension Apr 30, 2026
## Summary
- add one explicit `WP_Parser_Grammar::$native_grammar` cache slot
- store the compiled Rust grammar on the PHP grammar object instead of
in a content-hash cache
- remove the full exported-grammar hash walk from native parser
construction

## Why
The previous Rust-only content-key cache preserved a smaller PHP diff,
but every parser construction still exported and recursively hashed the
entire grammar before it could hit cache. In the SQLite smoke benchmark
that dropped the native path back to roughly 2x faster than PHP.

This restores the object-attached cache path we had before, but keeps
the PHP diff explicit and minimal: one new public cache property on
`WP_Parser_Grammar`.
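
A rough Rust sketch of the object-attached cache idea, assuming the grammar is immutable after construction as the notes below state. `Grammar`, `CompiledGrammar`, and `compile_calls` are hypothetical stand-ins for illustration, not the extension's actual types; the point is only that a cache slot on the grammar object makes a hit free of any export or hashing of the rules.

```rust
use std::cell::{Cell, OnceCell};

/// Hypothetical stand-in for the compiled native grammar.
struct CompiledGrammar {
    rule_count: usize,
}

/// Hypothetical stand-in for `WP_Parser_Grammar`. The `native_grammar` field
/// plays the role of the `$native_grammar` cache slot.
struct Grammar {
    rules: Vec<String>,
    native_grammar: OnceCell<CompiledGrammar>,
    compile_calls: Cell<usize>, // instrumentation for this example only
}

impl Grammar {
    fn new(rules: Vec<String>) -> Self {
        Grammar {
            rules,
            native_grammar: OnceCell::new(),
            compile_calls: Cell::new(0),
        }
    }

    /// Compiles on first use and returns the cached value afterwards.
    /// A cache hit never walks or hashes `rules`.
    fn native(&self) -> &CompiledGrammar {
        self.native_grammar.get_or_init(|| {
            self.compile_calls.set(self.compile_calls.get() + 1);
            CompiledGrammar {
                rule_count: self.rules.len(),
            }
        })
    }
}

fn main() {
    let grammar = Grammar::new(vec!["query".to_string(), "selectStatement".to_string()]);
    // Simulate many parser constructions sharing one grammar object.
    for _ in 0..1000 {
        assert_eq!(grammar.native().rule_count, 2);
    }
    println!("compile calls: {}", grammar.compile_calls.get());
}
```

The tradeoff is the one named in the notes: if the rules changed after the first call, the cached value would be stale, because nothing re-checks the content.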

## Measurements
Command:

```bash
TMP_TEST_NATIVE_QUERY_COUNT=250 ./tmp-test-native/run.sh
```

| Run | PHP parser | Rust parser | Speedup |
| ---: | ---: | ---: | ---: |
| 1 | 3.088s | 0.389s | 7.94x |
| 2 | 3.126s | 0.386s | 8.10x |
| 3 | 2.927s | 0.348s | 8.41x |

Default 2000-query smoke workload:

| Workload | PHP parser | Rust parser | Speedup |
| --- | ---: | ---: | ---: |
| 2000 generated queries, including 8 x 2000-row inserts | 24.082s | 3.008s | 8.01x |

## Testing
- `cargo fmt --check`
- `php -l packages/mysql-on-sqlite/src/parser/class-wp-parser-grammar.php`
- `git diff --check`
- `TMP_TEST_NATIVE_QUERY_COUNT=250 ./tmp-test-native/run.sh`
- `./tmp-test-native/run.sh`

## Notes
This assumes `WP_Parser_Grammar` is immutable after construction for
native parsing purposes. That matches current use, and the tradeoff is
isolated in this PR so it is visible in review.
## Summary
- reuse one `WP_MySQL_Parser` instance inside the SQLite driver and
reset its token stream per query
- add `reset_tokens()` to the PHP parser polyfill and the Rust native
parser
- restore native parser-node accessor fast paths in
`WP_MySQL_Native_Parser_Node`, while keeping PHP child materialization
for mutation
- fix the local native extension build helper for Nix/libclang bindgen
by undefining `__SSE2__` during binding generation

## Stack
This is the top PR in the native MySQL lexer/parser stack. The stack is
split so each GitHub diff shows one reviewable concern:

1. [#384 Extract MySQL lexer and parser polyfills](#384)
   - `trunk` -> `codex/native-parser-php-facade`
   - extraction-only PHP refactor
   - moves the existing PHP lexer/parser implementations into polyfill
     classes
   - keeps public `WP_MySQL_Lexer` and `WP_MySQL_Parser` as thin PHP
     subclasses

2. [#385 Add optional native parser routing](#385)
   - `codex/native-parser-php-facade` ->
     `codex/native-parser-class-routing`
   - adds fallback `WP_MySQL_Native_*` PHP classes
   - routes the public lexer/parser classes through native classes when
     the Rust extension provides them
   - adds the minimal PHP grammar-export bridge for the native parser

3. [#386 Add lazy native parser node facade](#386)
   - `codex/native-parser-class-routing` ->
     `codex/native-parser-node-facade`
   - keeps `WP_Parser_Node` as the plain PHP tree node
   - adds `WP_MySQL_Native_Parser_Node extends WP_Parser_Node` for
     native-backed lazy AST nodes
   - keeps native AST handles and native accessor delegation out of the
     base node class

4. [#381 Add lazy native AST facade](#381)
   - `codex/native-parser-node-facade` -> `codex/native-lazy-ast-facade`
   - implements the Rust lexer/parser extension and lazy native AST
     facade
   - makes the Rust extension instantiate `WP_MySQL_Native_Parser_Node`
   - adds native-extension CI coverage for the SQLite driver and
     WordPress PHPUnit tests
   - includes the local SQLite facade smoke benchmark

5. [#387 Cache native grammar on parser grammar object](#387)
   - `codex/native-lazy-ast-facade` ->
     `codex/native-parser-object-grammar-cache`
   - restores the object-attached native grammar cache
   - adds only `WP_Parser_Grammar::$native_grammar` on the PHP side
   - removes the Rust content-hash cache that walked the whole exported
     grammar on every parser construction

6. This PR, [#388 Speed up native AST materialization](#388)
   - `codex/native-parser-object-grammar-cache` ->
     `codex/native-parser-bulk-materialization`
   - optimizes native-to-PHP AST access after the grammar-cache
     performance restoration
   - reuses the SQLite driver's parser instance instead of constructing
     it per query

## Why
The native lexer/parser itself is fast, but the PHP-facing path can lose
that benefit if each query repeatedly rebuilds native parser state or
forces full PHP AST materialization. On the current stack, #387 already
removes the large grammar export/hash cost. This PR removes the
remaining per-query parser construction churn and restores the native
AST accessor path for descendant-heavy SQLite driver workloads.
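
To illustrate the reuse pattern, here is a minimal Rust sketch, assuming a parser whose per-query state is just its token list and cursor. `Token`, `Parser`, `Driver`, and the `constructions` counter are hypothetical; only the names `reset_tokens` and `reset_or_create_parser` echo identifiers that appear in this stack, and the bodies are not the driver's real code.

```rust
/// Hypothetical token type for the sketch.
#[derive(Debug, Clone, PartialEq)]
struct Token(String);

/// Stand-in for a reusable parser: setup happens once in `new`,
/// while `reset_tokens` only swaps the input and rewinds the cursor.
struct Parser {
    tokens: Vec<Token>,
    position: usize,
}

impl Parser {
    fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, position: 0 }
    }

    fn reset_tokens(&mut self, tokens: Vec<Token>) {
        self.tokens = tokens;
        self.position = 0;
    }

    /// Consumes the remaining tokens and returns how many were read.
    fn parse(&mut self) -> usize {
        let consumed = self.tokens.len() - self.position;
        self.position = self.tokens.len();
        consumed
    }
}

/// Driver-like holder that creates the parser lazily and reuses it afterwards.
struct Driver {
    parser: Option<Parser>,
    constructions: usize, // instrumentation for this example only
}

impl Driver {
    fn reset_or_create_parser(&mut self, tokens: Vec<Token>) -> &mut Parser {
        match self.parser.as_mut() {
            Some(parser) => parser.reset_tokens(tokens),
            None => {
                self.constructions += 1;
                self.parser = Some(Parser::new(tokens));
            }
        }
        self.parser.as_mut().expect("parser was just created or reset")
    }
}

fn main() {
    let mut driver = Driver { parser: None, constructions: 0 };
    let queries = [vec!["SELECT", "1"], vec!["SELECT", "ID", "FROM", "wp_posts"]];
    for query in queries.iter() {
        let tokens = query.iter().map(|t| Token(t.to_string())).collect();
        let consumed = driver.reset_or_create_parser(tokens).parse();
        println!("consumed {} tokens", consumed);
    }
    println!("parser constructions: {}", driver.constructions);
}
```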

## Measurements
Environment: local PHP 8.2 via the native build helper, release Rust
extension, current top of this PR.

Focused constructor/reset benchmark over 5000 unique SELECT queries:

| Phase | Time |
| --- | ---: |
| native tokenize | 22.62 us/query |
| fresh native parser constructor only | 2.31 us/query |
| reusable parser `reset_tokens()` only | 0.32 us/query |
| reusable parser reset + parse + `get_descendants()` | 157.06 us/query |
| constructor/reset ratio | 7.3x |

The previously reported ~622 us/query constructor cost does not
reproduce on this stack because #387 already caches the native grammar
on the PHP grammar object. Parser reuse still removes most of the
remaining constructor overhead.

SQLite facade smoke workload:

Command:

```bash
TMP_TEST_NATIVE_QUERY_COUNT=250 ./tmp-test-native/run.sh
```

| Workload | PHP fallback | Native extension | Speedup |
| --- | ---: | ---: | ---: |
| 250 generated queries, including 1 x 2000-row insert | 4.060s | 0.525s | 7.73x |

## Testing
- `cargo fmt --check`
- `git diff --check`
- `composer run check-cs`
- `composer run test` from `packages/mysql-on-sqlite`
- `php -d extension=packages/mysql-on-sqlite/ext/wp-mysql-parser/target/release/libwp_mysql_parser.so packages/mysql-on-sqlite/vendor/bin/phpunit -c packages/mysql-on-sqlite/phpunit.xml.dist`
- `TMP_TEST_NATIVE_QUERY_COUNT=250 ./tmp-test-native/run.sh`
Comment thread packages/php-ext-wp-mysql-parser/src/lib.rs
Comment thread .github/workflows/mysql-parser-extension-tests.yml
Comment thread .github/workflows/wp-tests-phpunit.yml Outdated
Comment thread .github/workflows/mysql-parser-extension-tests.yml
.ok_or_else(|| php_error("Native AST node index is out of range"))
}

fn child_to_zval(&self, native_ast_zval: &Zval, child: NativeAstChild) -> PhpResult<Zval> {
Member

The native AST accessors appear to create fresh PHP wrappers for child nodes/tokens on each read. That changes WP_Parser_Node semantics: child object identity is no longer stable, and mutations to a child returned by get_first_child_node() / get_children() are not visible when traversing from the parent again. Since WP_Parser_Node exposes public mutators and this PR aims to keep the public parser API unchanged, can we either cache/materialize child wrappers consistently or explicitly account for this compatibility change?

Collaborator Author

#391 and #392 address that. I hope :D I'll get the CI to run against all the same tests as the PHP driver and scrutinize once that works.

The native parser extension constructed a fresh WP_MySQL_Native_Parser_Node
on every accessor call, so two reads of the same logical node returned
different PHP objects and any state a caller attached to the first wrapper
was invisible through the second. WP_Parser_Node has always given callers
stable child identity, and the lazy native facade is meant to keep that
contract intact — this restores it.

A per-AST identity map is created lazily on the root and shared by every
interned wrapper. Each accessor that returns a node looks the index up
in the map and returns the canonical instance, discarding the freshly
constructed one. Materialization pulls children through the same map so
mutations a caller made through get_first_child_node() before triggering
append_child() survive into $this->children.

Adds a regression test that exercises identity across child, descendant,
and post-materialization reads, plus a walk benchmark and a CI workflow
that reports parse + walk time and peak memory for the PHP and native
paths so the cache cost is measurable on every PR.
intern_all hoists the cache lookup out of the loop and inlines what was
a per-item method call to intern(). For accessors whose Rust bridge
returns only nodes — get_child_nodes / get_descendant_nodes — a typed
intern_nodes() variant skips the instanceof check entirely. The walk
benchmark exercises the descendants accessors over ~4.8M nodes per run,
so even small per-item savings add up.
The walk benchmark we already had is cache-miss heavy (one walk per AST,
every node visited once), so the identity cache shows up there as a
small overhead rather than a win. The cache is supposed to pay back in
hit-heavy patterns: re-walks of the same tree, repeated child reads at
the root, and translator-style passes that re-enter visited subtrees.

Adds three modes (--mode=rewalk|reread|subtree) and runs each on both
the PR and the baseline so the comparison is apples-to-apples on the
same runner, same corpus.
Stacked on #381.

#381's review surfaced a real semantics regression in the lazy native
AST facade: every accessor on `WP_MySQL_Native_Parser_Node` calls into
Rust and returns a freshly constructed PHP wrapper, so
`get_first_child_node()` returns a different object every time.
`WP_Parser_Node` has always given callers stable child identity — attach
state to a child once, walk past it, walk back, the state is still there
— and the lazy native facade is meant to keep that contract intact. This
restores it.

A per-AST `WP_MySQL_Native_AST_Cache` is created lazily on the root and
shared by reference with every wrapper that gets interned through it.
Each accessor looks the returned wrapper's `native_node_index` up in the
cache and either returns the canonical instance or registers the new
one. `materialize_native_children()` pulls children through the same
cache so any mutation a caller made through `get_first_child_node()`
before the parent went through `append_child()` survives into
`$this->children` — same instance, same mutations.
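
The interning step can be sketched as follows. The cache described here lives in PHP, so this Rust version is only an analogue under stated assumptions: `Wrapper`, `AstCache`, `fresh_wrapper`, and the `note` field are hypothetical, and `Rc` stands in for a PHP object reference.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical wrapper around one native node. `note` stands in for any
/// state a caller attaches to the wrapper.
struct Wrapper {
    node_index: usize,
    note: RefCell<Option<String>>,
}

/// One cache per AST: node index -> canonical wrapper.
#[derive(Default)]
struct AstCache {
    by_index: HashMap<usize, Rc<Wrapper>>,
}

impl AstCache {
    /// Returns the canonical wrapper for `fresh.node_index`, registering
    /// `fresh` only if that index has not been seen before.
    fn intern(&mut self, fresh: Rc<Wrapper>) -> Rc<Wrapper> {
        Rc::clone(self.by_index.entry(fresh.node_index).or_insert(fresh))
    }
}

/// Simulates an accessor that builds a new wrapper on every call.
fn fresh_wrapper(node_index: usize) -> Rc<Wrapper> {
    Rc::new(Wrapper {
        node_index,
        note: RefCell::new(None),
    })
}

fn main() {
    let mut cache = AstCache::default();

    let first = cache.intern(fresh_wrapper(7));
    *first.note.borrow_mut() = Some("visited".to_string());

    // A second read of the same logical node yields the same instance,
    // so the attached state is still visible.
    let second = cache.intern(fresh_wrapper(7));
    assert!(Rc::ptr_eq(&first, &second));
    assert_eq!(second.note.borrow().as_deref(), Some("visited"));
    println!("identity stable: {}", Rc::ptr_eq(&first, &second));
}
```

Because the map holds a strong reference to every interned wrapper, each visited node stays alive for as long as the cache does, which matches the memory cost reported under Performance.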

Tokens are unchanged. The public token API has no mutators and no
callers in this repo rely on `WP_MySQL_Token` identity; if that becomes
a need we can extend the cache.

## What's in here

- `class-wp-mysql-native-ast-cache.php` — small holder, one per AST.
- Native node accessors run results through `intern()` / `intern_all()`
/ `intern_nodes()`.
- `materialize_native_children()` reuses the interned wrappers so prior
mutations don't get lost.
- Regression test covering same-instance reads, descendant/child
identity sharing, and the mutate-then-materialize edge case. Skips when
the native extension isn't loaded.
- `run-native-ast-walk-benchmark.php` parses the MySQL server suite,
walks each AST, and reports `parsed`, `walked_nodes`, `duration`,
`peak_mem`, and an identity-stability flag.
- `Native AST Walk Perf` workflow runs the benchmark on this PR and on
the PR base (`codex/native-lazy-ast-facade`) on the same runner, so the
identity-cache cost is measured apples-to-apples on every push.

## Performance

Benchmarked on CI against the no-cache baseline, same runner, same
corpus (69,567 queries, 4.8M walked nodes). Full numbers in PR comments
— links below.

| scenario | baseline (no cache) | this PR | delta |
|---|---|---|---|
| native parse only | 1.28s | 1.28s | 0% |
| native walk duration | 3.33s | 3.52s | **+5% (+0.19s)** |
| native walk qps | 20,884 | 19,766 | **−5%** |
| native walk peak memory | 50.0MB | 60.5MB | **+10.5MB (+21%)** |
| native walk identity stable | **FALSE** | true | regression fixed |

Hot-path optimization in `intern_all()` (cache reference hoisted out of
the loop, per-item logic inlined) plus a typed `intern_nodes()` fast
path for accessors that return only nodes brought the time penalty down
from an initial +17% to +5%. Memory delta is structural — one retained
PHP wrapper per visited node — and is the price of stable identity.

Native walk is still ~5× faster than the pure-PHP path while now
preserving `WP_Parser_Node` semantics.

- First measurement:
#391 (comment)
- After hot-path optimization:
#391 (comment)

## Test plan

- [x] PHPUnit suite passes on the pure-PHP path.
- [x] PHPUnit suite passes with the native extension loaded; the new
identity tests run instead of skipping.
- [x] `Native AST Walk Perf` workflow reports the cache cost vs. the PR
base on every push.
Stacked on #391.

#391 restored `WP_Parser_Node` identity semantics by interning native
child wrappers in PHP. That fixed correctness, but the PHP-side cache
formed a retention cycle and added measurable cost to hit-heavy
translator-style workloads.

This PR moves native wrapper identity out of PHP object properties
entirely:

- `WP_MySQL_Native_Parser_Node` no longer stores `$native_ast`,
`$native_node_index`, or a PHP-side identity-cache object.
- Native bridge calls now pass the wrapper itself, e.g.
`wp_sqlite_mysql_native_ast_get_children( $this )`.
- The Rust extension keeps a thread-local registry keyed by the PHP
wrapper object pointer.
- Registry entries map wrapper pointer -> `(NativeAstState, node_index,
is_materialized)`, and each AST keeps a node-index -> wrapper-pointer
cache.
- Cached wrapper hits return the existing PHP object by pointer with its
refcount bumped; the cache does not own a PHP reference.
- `__destruct()` releases a wrapper from the Rust registry.
Materialization marks the wrapper as detached from native reads while
leaving it discoverable from the parent cache as long as it is still
live.

That breaks the cycle that mattered here: PHP wrappers no longer
strongly reference a native AST object, and Rust no longer strongly
references PHP wrappers. PHP's cycle collector can collect wrapper
graphs normally; destructors then clean up the Rust registry entries.
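
A simplified Rust sketch of such a non-owning registry, under several assumptions: PHP object pointers are simulated as plain integers, `AstId` replaces the real `NativeAstState`, and the per-AST cache is folded into one map keyed by `(AstId, node_index)`. None of the function names below come from the extension.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

type WrapperPtr = usize; // assumption: stands in for a PHP object pointer
type AstId = usize; // assumption: stands in for the native AST state

struct Entry {
    ast: AstId,
    node_index: usize,
    is_materialized: bool,
}

#[derive(Default)]
struct Registry {
    entries: HashMap<WrapperPtr, Entry>,
    // Non-owning lookup: it stores pointers, never references to wrappers.
    by_node: HashMap<(AstId, usize), WrapperPtr>,
}

thread_local! {
    static REGISTRY: RefCell<Registry> = RefCell::new(Registry::default());
}

fn register(ptr: WrapperPtr, ast: AstId, node_index: usize) {
    REGISTRY.with(|r| {
        let mut reg = r.borrow_mut();
        reg.entries.insert(ptr, Entry { ast, node_index, is_materialized: false });
        reg.by_node.insert((ast, node_index), ptr);
    });
}

/// Finds the live wrapper for a node, if one is registered.
fn lookup(ast: AstId, node_index: usize) -> Option<WrapperPtr> {
    REGISTRY.with(|r| r.borrow().by_node.get(&(ast, node_index)).copied())
}

/// Flags a wrapper as materialized. It stays discoverable through `lookup`.
fn mark_materialized(ptr: WrapperPtr) -> bool {
    REGISTRY.with(|r| {
        let mut reg = r.borrow_mut();
        let found = match reg.entries.get_mut(&ptr) {
            Some(entry) => {
                entry.is_materialized = true;
                true
            }
            None => false,
        };
        found
    })
}

fn is_materialized(ptr: WrapperPtr) -> Option<bool> {
    REGISTRY.with(|r| r.borrow().entries.get(&ptr).map(|e| e.is_materialized))
}

/// What a wrapper destructor would trigger: drop both index entries.
fn release(ptr: WrapperPtr) {
    REGISTRY.with(|r| {
        let mut reg = r.borrow_mut();
        if let Some(entry) = reg.entries.remove(&ptr) {
            let key = (entry.ast, entry.node_index);
            // Only clear the slot if it still points at this wrapper.
            if reg.by_node.get(&key) == Some(&ptr) {
                reg.by_node.remove(&key);
            }
        }
    });
}

fn main() {
    register(0x1000, 1, 0);
    register(0x2000, 1, 5);
    assert_eq!(lookup(1, 5), Some(0x2000));

    mark_materialized(0x2000);
    assert_eq!(lookup(1, 5), Some(0x2000)); // still discoverable
    assert_eq!(is_materialized(0x2000), Some(true));

    release(0x2000);
    assert_eq!(lookup(1, 5), None);
    println!("root still registered: {}", lookup(1, 0).is_some());
}
```

Since neither map holds a reference to the wrapper, nothing in the registry keeps a wrapper alive. Correctness then depends on `release` running before a pointer value can be reused, which is the job the destructor has in the design above.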

Tokens remain un-interned. The public token API has no mutators, and no
caller in this repo relies on token object identity.

## Perf numbers

From the passing `Native AST Walk Perf` CI run on this head
(`2d93be25f599c3c4482480a6ab644d61b9337b12`), comparing this PR to the
native no-cache baseline (`codex/native-lazy-ast-facade`):

| Scenario | This PR | Baseline | Duration delta | Peak memory delta |
|---|---:|---:|---:|---:|
| parse only | 1.2859s, 54,098 qps, 30.0MB | 1.2715s, 54,711 qps, 30.0MB | +1.1% | 0.0% |
| walk x1 | 3.3265s, 20,912 qps, 48.0MB | 3.4191s, 20,346 qps, 60.5MB | -2.7% | -20.7% |
| rewalk x10 | 8.6352s, 8,056 qps, 52.0MB | 16.9658s, 4,100 qps, 90.5MB | -49.1% | -42.5% |
| reread x20 | 1.8618s, 37,364 qps, 30.0MB | 2.4322s, 28,602 qps, 38.0MB | -23.5% | -21.1% |
| subtree x5 | 9.8274s, 7,078 qps, 48.0MB | 13.8921s, 5,007 qps, 64.5MB | -29.3% | -25.6% |

The parse-only path is effectively unchanged. The repeated-access
workloads this cache is meant to help are materially faster and use less
peak memory.
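
For readers checking the delta columns, the arithmetic appears to be the relative change of this PR against the baseline. A small sketch using the table's own figures; the helper name `delta_percent` is made up for this example.

```rust
/// Relative change of `this_pr` versus `baseline`, in percent.
/// Negative means the PR value is lower (faster, or less memory).
fn delta_percent(this_pr: f64, baseline: f64) -> f64 {
    (this_pr - baseline) / baseline * 100.0
}

fn main() {
    // (scenario, PR seconds, baseline seconds, PR MB, baseline MB)
    let rows = [
        ("parse only", 1.2859, 1.2715, 30.0, 30.0),
        ("walk x1", 3.3265, 3.4191, 48.0, 60.5),
        ("rewalk x10", 8.6352, 16.9658, 52.0, 90.5),
        ("reread x20", 1.8618, 2.4322, 30.0, 38.0),
        ("subtree x5", 9.8274, 13.8921, 48.0, 64.5),
    ];
    for &(name, secs, base_secs, mb, base_mb) in rows.iter() {
        println!(
            "{}: {:+.1}% duration, {:+.1}% peak memory",
            name,
            delta_percent(secs, base_secs),
            delta_percent(mb, base_mb)
        );
    }
}
```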

For context, the same CI run measured the pure-PHP path at:

| Scenario | Pure PHP |
|---|---:|
| parse only | 13.5292s, 5,142 qps, 68.0MB |
| walk x1 | 16.1332s, 4,312 qps, 70.0MB |

## Safety coverage

The PR now includes native-extension tests for:

- stable wrapper identity across repeated child/descendant reads;
- no reflected `$native_ast` / `$native_node_index` properties on
wrappers;
- child mutations surviving repeat reads and parent materialization;
- materialized children remaining discoverable from a still-native
parent;
- repeated parse/walk/drop loops staying memory-bounded;
- dropping root and descendant wrappers reclaiming registry entries;
- child wrappers outliving root variables without use-after-free;
- overlapping AST lifetimes not corrupting each other;
- mutation-before-drop and rewalk loops staying memory-bounded.

CI smoke checks also assert the SQLite-driver and WordPress
test-container paths select the native wrapper model and do not regress
back to `$native_ast` storage.

## Test plan

- [x] `cargo check` for `packages/php-ext-wp-mysql-parser`
- [x] `cargo fmt --check` for `packages/php-ext-wp-mysql-parser`
- [x] PHP lint on changed PHP files
- [x] Focused native identity/cycle PHPUnit tests with the Rust
extension loaded
- [x] Full `packages/mysql-on-sqlite` PHPUnit suite with the Rust
extension loaded
- [x] Full GitHub Actions suite on PR head
- [x] Native AST Walk Perf workflow on PR head
adamziel added 7 commits May 1, 2026 15:54
Addresses Jan's note on #381: in native mode, `new WP_MySQL_Parser(...)
instanceof WP_Parser` returns false because the native-mode class
extends the Rust-registered `WP_MySQL_Native_Parser`, which has no
`WP_Parser` in its chain. Existing downstream code doing `if ($parser
instanceof WP_Parser)` silently skipped the parser whenever the
extension was loaded.

This restores the contract by always extending the pure-PHP `WP_Parser`
and pulling the native-mode behaviour in via a trait:

```php
class WP_MySQL_Parser extends WP_Parser {
    use WP_MySQL_Native_Parser_Impl;
}
```

`WP_MySQL_Native_Parser_Impl` owns the composed `WP_MySQL_Native_Parser`
instance and the four-method delegation surface (`parse`, `next_query`,
`get_query_ast`, `reset_tokens`). `WP_Parser`'s protected state
(`$grammar`, `$tokens`, `$position`) is initialised by
`parent::__construct` and stays inert — the trait's overrides never read
it.

Adding a public method later means adding it to the trait — the class
file itself is two lines and doesn't need touching.

## Why a trait, not a private property?

A bare property would also work, but the trait keeps the class file
expressing only the routing decision (`extends WP_Parser` + `use
WP_MySQL_Native_Parser_Impl;`). The implementation lives in one place, symmetric
to where a future PHP-mode trait could live if we ever want to mirror
the structure. Behaviour-wise the two are equivalent.

## Performance

The trait adds **one extra method-call frame per public-API call**. The
public API is `parse()`, `next_query()`, `get_query_ast()`,
`reset_tokens()` — called once per query. The actual parsing work
happens inside the native call, so the delegation overhead is a small
constant per query, not a multiplier on the parsing work.
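
The shape of that delegation can be sketched in Rust, with the caveat that Rust has neither class inheritance nor PHP-style traits. Here a Rust trait plays the role of the `WP_Parser` type contract, and a plain field holds the native parser. All names (`Parser`, `NativeParser`, `MySqlParser`, `run`) are hypothetical.

```rust
/// Contract that downstream code checks for (stand-in for `WP_Parser`).
trait Parser {
    fn next_query(&mut self) -> bool;
    fn get_query_ast(&self) -> Option<String>;
}

/// Stand-in for the extension-provided parser. It does not satisfy
/// the `Parser` contract on its own.
struct NativeParser {
    queries: Vec<String>,
    current: Option<String>,
}

impl NativeParser {
    fn advance(&mut self) -> bool {
        if self.queries.is_empty() {
            self.current = None;
            return false;
        }
        self.current = Some(self.queries.remove(0));
        true
    }
}

/// Public-class analogue: satisfies the contract and forwards each call,
/// which costs one extra call frame per public method.
struct MySqlParser {
    native: NativeParser,
}

impl Parser for MySqlParser {
    fn next_query(&mut self) -> bool {
        self.native.advance()
    }

    fn get_query_ast(&self) -> Option<String> {
        self.native.current.clone()
    }
}

/// Downstream code that only knows about the contract.
fn run(parser: &mut dyn Parser) -> Vec<String> {
    let mut seen = Vec::new();
    while parser.next_query() {
        if let Some(ast) = parser.get_query_ast() {
            seen.push(ast);
        }
    }
    seen
}

fn main() {
    let mut parser = MySqlParser {
        native: NativeParser {
            queries: vec!["SELECT 1".to_string(), "SELECT 2".to_string()],
            current: None,
        },
    };
    println!("{:?}", run(&mut parser));
}
```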

The `Parser Delegation Perf` workflow runs
`tests/tools/run-parser-benchmark.php` (parses the full MySQL
server-suite corpus, ~70k queries) three times on this PR and three
times on the PR base, on the same runner, with the extension loaded both
times. The comparison goes into the job summary on every push.

## Test plan

- [x] PHP-only PHPUnit suite passes.
- [ ] PHPUnit suite passes with the native extension loaded; the new
`WP_MySQL_Parser_Instanceof_Tests` confirm `instanceof WP_Parser` and
`instanceof WP_MySQL_Parser` both hold.
- [ ] `Parser Delegation Perf` workflow shows the delegation cost is
within noise.
@adamziel
Collaborator Author

adamziel commented May 1, 2026

Perf comparison against trunk's PHP implementation:

Trunk PHP was measured from a detached origin/trunk worktree at fa5a7ba with the native-AST benchmark helper staged only in /tmp for measurement. Runtime: local PHP 8.2.29, same mysql-server-tests-queries.csv corpus, three sequential runs.

| Scenario | Trunk PHP avg | This PR native | Speedup |
| --- | ---: | ---: | ---: |
| Parse only | 14.3114s / 4,862 QPS / 68.0MB | 1.1574s / 60,105 QPS / 30.0MB | 12.37x by duration |
| Parse + walk | 20.0804s / 3,465 QPS / 70.0MB | 2.9751s / 23,383 QPS / 48.0MB | 6.75x by duration |

Raw trunk PHP runs:

parse-only: 13.9360s / 4,991 QPS / 68.0MB
parse-only: 14.6215s / 4,757 QPS / 68.0MB
parse-only: 14.3767s / 4,838 QPS / 68.0MB
walk:       19.5978s / 3,549 QPS / 70.0MB
walk:       20.6002s / 3,377 QPS / 70.0MB
walk:       20.0432s / 3,470 QPS / 70.0MB

The native numbers are from the latest release-extension perf run on this PR (13f131b): https://github.com/WordPress/sqlite-database-integration/actions/runs/25219828811.

@adamziel
Collaborator Author

adamziel commented May 1, 2026

Let's land this! It shouldn't affect the PHP implementation on hosts without the native extension, so it's safe to land. There are a few more rough edges, e.g. comments similar to:

this is exactly the kind of state the reviewer worried

that make no sense after the PR is merged. Also, overly verbose code style such as:

		if ( $lexer instanceof WP_MySQL_Native_Lexer ) {
			$tokens = $lexer->native_token_stream();
			return $this->reset_or_create_parser( $tokens );
		}

		$tokens = $lexer->remaining_tokens();
		return $this->reset_or_create_parser( $tokens );

that could be just

		$tokens = $lexer instanceof WP_MySQL_Native_Lexer
			? $lexer->native_token_stream()
			: $lexer->remaining_tokens();
		return $this->reset_or_create_parser( $tokens );

Or even just this, if we reused the remaining_tokens() method to return the token stream.

		return $this->reset_or_create_parser( $lexer->remaining_tokens() );

So let's address those in a follow-up. Thank you @JanJakes!

@adamziel adamziel merged commit c43113d into trunk May 1, 2026
26 checks passed
@adamziel adamziel deleted the codex/native-lazy-ast-facade branch May 1, 2026 16:30
adamziel added a commit that referenced this pull request May 1, 2026
## What it does

Cleans up the native parser follow-up from #381 so the merged code reads
as permanent code, not review scaffolding.

It replaces duplicated inline verifier logic with one
`tests/tools/verify-native-parser-extension.php` entry point for
`mysql-on-sqlite`. The parser-extension workflow and PHPUnit bootstrap
both call the same verifier:

```bash
php -d extension=../php-ext-wp-mysql-parser/target/debug/libwp_mysql_parser.so \
  tests/tools/verify-native-parser-extension.php
```

It also collapses `WP_PDO_MySQL_On_SQLite::create_parser()` to one token
selection and one parser reset/create return, and rewrites native parser
test comments to describe behavior instead of PR review history.

## Rationale

#381 landed functional native parser support, but a few follow-up
surfaces still carried review-era wording and copied verifier blocks.
That makes future changes harder to read and easier to drift: native
parser routing, Rust AST handle storage, wrapper identity, and
materialized child behavior were being checked in multiple places.

The verifier now pins that runtime contract from one script: extension
loaded, `WP_MySQL_Lexer` resolves native, `WP_MySQL_Parser` delegates to
`WP_MySQL_Native_Parser`, the SQLite driver returns a native-backed AST,
native wrapper handle properties are absent, child identity is stable,
and materialized child mutations survive.

## Implementation

Added `wp_sqlite_verify_native_parser_extension()` with a shared
delegate check:

```php
function wp_sqlite_assert_native_parser_delegate( WP_MySQL_Parser $parser, string $context ): void {
    $reflection = new ReflectionObject( $parser );
    if ( ! $reflection->hasProperty( 'native' ) ) {
        wp_sqlite_native_parser_verification_fail( $context );
    }

    $native_property = $reflection->getProperty( 'native' );
    $native_property->setAccessible( true );
    if ( ! ( $native_property->getValue( $parser ) instanceof WP_MySQL_Native_Parser ) ) {
        wp_sqlite_native_parser_verification_fail( $context );
    }
}
```

`WP_SQLITE_REQUIRE_NATIVE_PARSER_EXTENSION=1` in the PHPUnit bootstrap
now loads that verifier instead of inlining the same checks.

`create_parser()` now selects tokens once:

```php
$tokens = $lexer instanceof WP_MySQL_Native_Lexer
    ? $lexer->native_token_stream()
    : $lexer->remaining_tokens();

return $this->reset_or_create_parser( $tokens );
```

The WordPress PHPUnit extension setup keeps its container-specific
verifier, but factors the repeated reflection checks into the same small
helper shape.

## Testing instructions

```bash
cargo fmt --check
bash -n .github/workflows/wp-tests-phpunit-native-extension-setup.sh
node --check .github/workflows/wp-tests-phpunit-run.js
php -l packages/mysql-on-sqlite/tests/tools/verify-native-parser-extension.php
php -l packages/mysql-on-sqlite/tests/bootstrap.php
php ./vendor/bin/phpcs .github/workflows/wp-tests-phpunit-native-extension-setup.sh packages/mysql-on-sqlite/tests/bootstrap.php packages/mysql-on-sqlite/tests/tools/verify-native-parser-extension.php packages/mysql-on-sqlite/src/sqlite/class-wp-pdo-mysql-on-sqlite.php packages/mysql-on-sqlite/tests/mysql/native/WP_MySQL_Native_Parser_Node_Identity_Tests.php packages/mysql-on-sqlite/tests/mysql/native/WP_MySQL_Parser_Instanceof_Tests.php

cd packages/mysql-on-sqlite
php -d extension=../php-ext-wp-mysql-parser/target/debug/libwp_mysql_parser.so tests/tools/verify-native-parser-extension.php
php ./vendor/bin/phpunit -c ./phpunit.xml.dist tests/mysql/native/WP_MySQL_Parser_Instanceof_Tests.php tests/mysql/native/WP_MySQL_Native_Parser_Node_Identity_Tests.php tests/mysql/native/WP_MySQL_Native_Parser_Node_Cycle_Tests.php
WP_SQLITE_REQUIRE_NATIVE_PARSER_EXTENSION=1 php -d extension=../php-ext-wp-mysql-parser/target/debug/libwp_mysql_parser.so ./vendor/bin/phpunit -c ./phpunit.xml.dist --filter 'WP_MySQL_(Native_Parser_Node_(Identity|Cycle)|Parser_Instanceof)_Tests'
php ./vendor/bin/phpunit -c ./phpunit.xml.dist tests/WP_SQLite_Driver_Query_Tests.php
WP_SQLITE_REQUIRE_NATIVE_PARSER_EXTENSION=1 php -d extension=../php-ext-wp-mysql-parser/target/debug/libwp_mysql_parser.so ./vendor/bin/phpunit -c ./phpunit.xml.dist tests/WP_SQLite_Driver_Query_Tests.php
```

CI is passing on `3f4153f`, including the PHP 8.0-8.5 Rust-extension
matrix and `WordPress PHPUnit Tests / Rust extension`.