refactor(hogql): redesign events predicate pushdown on the lowering architecture#62239
Conversation
|
Hey @aspicer! 👋 It looks like your git author email on this PR isn't your
You can fix it for this repo with: git config user.email "you@posthog.com"Or set it globally with |
|
Query snapshots: Backend query snapshots updatedChanges: 30 snapshots (30 modified, 0 added, 0 deleted) What this means:
Next steps:
|
ac0b20c to
4b8073f
Compare
Introduce the dialect-neutral JSONFieldAccess AST node (the logical form of a property read after lowering) plus a clean logical-lowering pass that rewrites JSON-blob `properties.X` value reads to it, so the printer renders a mechanical leaf instead of resolving the property to a physical column. Wired into the postgres/duckdb pipeline (no materialized columns there, so logical lowering is the whole story). Gated behind HogQLContext.lower_property_access (default off) per the behavior-preservation policy (PRINTER_REARCHITECTURE.md §12.8): off => byte-identical to master. ClickHouse lowering is deferred until the physical mat/skip-index passes land, else materialized columns would regress. Validated: lowering is byte-identical off-vs-on across all four dialects; blob properties no longer reach the printer for pg/duckdb (reachability oracle); full printer+transforms suite green (flag off).
Adds the unqualified printer hint on FieldType and honors it in visit_field_type, so the lightweight-DELETE path can render bare materialized columns the mutation analyzer accepts. Dormant until the within_non_hogql path routes through lowering.
…gQL printer The HogQL printer fell back to the base ClickHouse JSONExtractRaw renderer for the new logical node, so dialect="hogql" output of a lowered property became CH internals with a parameter placeholder baked in — corrupting re-parseable HogQL (e.g. the stored batch-export hogql_query) and double-counting the property-key value. Render it back as the property chain.
Re-derives the ClickHouse physical property optimizations onto the JSONFieldAccess logical node: the materialized-column / dmat / property-group value substitution and the 8 skip-index comparison rewrites, reproducing master's rendering as RESULT-equivalent AST (execution + EXPLAIN gated, not byte-identical, §8.7). Dormant: NOT wired into the pipeline yet — it lands coupled with the gated ClickHouse lowering flip (next), so materialized columns don't regress. Preserves master behavior incl. the is-set-over-materialized over-match (decision A, §12.7) and the §8.1 to_printed_clickhouse table-name resolution for person-joined columns. 50 master-equivalence tests. The within_non_hogql_query unqualified-column path (§8.4) and inline sentinels are deferred until FieldType.unqualified / Constant.inline land with the wiring.
…ated) Wire lower_property_access + clickhouse_physical_passes into the ClickHouse pipeline (after both PropertySwapper passes), gated by HogQLContext.lower_property_access (default off). Flag-on ClickHouse is byte-identical to master for the unmaterialized corpus and result-equivalent for materialized/optimized cases; flag-off is untouched. Node-typing fix that makes the printer fully bypassable: JSONFieldAccess now carries its value type (nullable String), not the PropertyType, and the lowering pass repoints the hidden Alias wrapper's FieldAliasType too — otherwise the printer's comparison optimizers recover the PropertyType via resolve_field_type and route the property straight back into the printer's decision code (the reachability oracle caught this). The physical pass now resolves the materialized source from node.expr/node.keys. With this, the ClickHouse oracle reached-set for the corpus is EMPTY — the printer decides nothing. Removed an obsolete structural test whose premise (the physical pass resolves person-joined reads) is false — PoE-joined person reads are resolved by the lazy-table machinery; the §8.1 table-name behavior is covered end-to-end by TestPersonPropertyPhysicalPass.test_person_joined_equivalent. The within_non_hogql DELETE path is unaffected (its callers don't set the gate). Suite-wide oracle-clean and the printer-property-code deletion are the next steps.
…filter The prefilter runs on the printed AST, where lowering has turned an unmaterialized `properties.$x` read into a JSONFieldAccess rather than a PropertyType. Its column collector only knew PropertyType, so it dropped `properties` from the wrapped subquery and ClickHouse failed to resolve `events.properties`. Teach the collector to read the lowered node.
… assertions The person-predicate pushdown is correct, but after lowering the extracted clause is a JSONFieldAccess rather than a properties.x Field chain, so the structural assertions no longer matched the un-lowered _expr() expected. Normalize the lowered node back to the chain for these comparisons (lowering is covered by the printer tests).
…the lowered form The lowered property read is leaner (no orphaned values), so the pushed-down predicate's key/value placeholders shift from hogql_val_15/16 to 5/6. Identical SQL otherwise.
Compile each query both ways — lower_property_access off (old printer path) and on (logical lowering + ClickHouse physical passes) — and compare, per PRINTER_REARCHITECTURE.md §13. SQL-equal is the common case (byte-identical for unmaterialized reads); a SQL diff escalates to result-comparison by the caller that has data (execution net in test, cost-guarded Celery worker in prod). - differential.py: compile-both-ways core, shadow-context derivation, the env-gated compile-boundary hook, and a process registry for the suite sweep. The new path must fail loud — a recompile error is captured, never swallowed. - utils.py: wire the hook into prepare_and_print_ast (snapshot the pristine input before the old path mutates it; recompile and compare after; off by default). - test_property_differential.py: byte-identity across the logical corpus (all dialects) + boundary-hook behavior. - run_shadow_differential.py: suite-wide sweep runner, the equivalence-gate analogue of the reachability-oracle runner. Sweep over test_printer.py: 267 compiles, 256 byte-identical, 0 fail-loud errors, 10 result-equivalent divergences (materialized scrub-literal parameterization, ai-property mat-column fallback, property-group ternary) to fix toward master.
The differential shadow-compare (§13) is the deletion gate now — it measures equivalence (does new == old?) over real traffic, not reachability (is old code reached?), and the fail-loud new path means there is no silent fallback for an oracle to detect (§13.3). Remove the oracle scaffolding: - delete reachability_oracle.py, run_reachability_oracle.py, test_reachability_oracle.py, and the baseline report. - the lone consumer (test_logical_property_lowering's reachability check) now asserts deletion-readiness by direct AST inspection — no blob-backed PropertyType survives lowering — extended to cover ClickHouse too. - refresh stale oracle mentions in the corpus/harness docstrings. Dropping the PR0b oracle commit from the Graphite stack history is a separate rewrite, left as a TODO in §13.7.
compile_hogql_predicate (the lightweight-DELETE predicate path) and legacy filters compile with within_non_hogql_query=True, which the ClickHouse printer renders with unqualified column references — the lightweight-delete mutation analyzer rejects table-qualified columns like sharded_events.mat_$browser. The lowering pass builds table-qualified synthetic fields, so it must not touch them. Decline lowering when within_non_hogql_query is set (in lower_property_access, covering all dialects; the ClickHouse physical block is skipped explicitly). The query stays byte-identical on the printer path. Adapted from prior art (#61546). Adds a gate-on test proving the predicate stays unqualified + mutation-safe over a materialized column when the lowering gate is enabled.
The ClickHouse physical pass scrubs a materialized column with nullIf(nullIf(col, ''), 'null') and trims quotes with a fixed regex. The printer hand-builds those literals inline; the pass parameterized them — a result-equivalent text diff (§13.8 bucket 1). Add an opt-in Constant.inline flag the ClickHouse printer honors for known-safe literal strings, set on the physical pass's fixed sentinels, so it emits the printer's inline form. This removes the scrub-literal difference where the physical pass produces the read; some materialized projections keep a separate result-equivalent divergence (a lowered projection's auto-alias) for later. Safe: 650 printer tests pass, 0 fail-loud errors. Adapted from prior art (#61546).
…nline Physical pass now reproduces the printer's property-group/materialized skip-index optimizations it was missing: detects a property through a toBool(transform(...)) cast (resolved-PropertyType fallback), resolves a select-alias back to its lowered read, routes JSONHas to the property group's keys index, and marks within_non_hogql synthetic columns unqualified. CloningVisitor now carries Constant.inline so the inline mat-scrub sentinels survive post-pass cloning.
The hook lived only in prepare_and_print_ast, but the main execution path bypasses it: execute_hogql_query -> HogQLQueryExecutor._generate_clickhouse_sql calls prepare_ast_for_printing + print_prepared_ast directly. A sweep over test_query.py showed 0 compiles — the execution path was verifying nothing. Add the snapshot -> recompile -> compare to the executor (gated to a no-op unless a sweep is active; runs after the served SQL is produced, so a shadow failure can never affect the query). A test_query.py sweep now compiles 406 real end-to-end queries, all byte-identical, 0 errors. Documents the real coverage in §13.9.
In test mode (settings.TEST) — not an opt-in env var, not a separate sweep — every executed HogQL query has the new lowering path rendered and compared, so CI tests both paths on every run. The two paths share resolution + the swappers and differ only in the lowering + physical passes, so check_query renders the new path FROM the resolved+swapped tree the old path already produced (prepared_ast), applying only those steps to a clone — it never re-resolves (resolution is path-independent, and a second pass would fight one-shot test fixtures). - check_query (printer/differential.py): SQL-equal => same results (free); SQL-differs => execute both and compare result ROWS, failing only on a real divergence. Result-equivalent churn passes; unexecutable cases skip. - Hooked at HogQLQueryExecutor.execute and prepare_and_print_ast. - Test query-capture excludes the differential's verification queries (is_in_differential) so snapshots reflect only the code under test. - Removes the opt-in HOGQL_SHADOW_DIFFERENTIAL env hook + the standalone sweep runner (the suite itself is the sweep).
A per-organization feature flag (hogql-property-lowering) sets a propertyLowering query modifier; prepare_ast_for_printing maps it onto HogQLContext. lower_property_access, so an enabled org serves the new lowering path and everyone else stays on the legacy printer path. Off leaves the modifier unset (not False) so it never overrides an explicitly-set gate. This is the production rollout switch: one flag flip per org switches it over, after the differential (in test) and the Celery shadow (in prod) confirm parity.
…caffolding Make the logical-lowering + ClickHouse physical passes the only path (remove the gate), and delete the temporary verification harness it was proven against: the differential, the golden corpus, the characterization net, and the old-vs-new comparison tests. The new path now serves every query; query snapshots will regenerate to the lowered SQL. (Old printer property code + the propertyLowering flag are removed in the following commits.)
…_access gate The new lowering path is now unconditional, so the per-org rollout flag and the HogQLContext gate field that selected between old/new are dead. Removes them and the schema modifier; behavior is unchanged (the new path already served).
The within_non_hogql lowering exclusion is now unconditional (no gate), so the test that pinned 'gate on stays unqualified' just verifies within_non_hogql + a materialized property still compiles to the unqualified printer form.
…h-15-clean-cutover # Conflicts: # posthog/hogql/printer/base.py # posthog/hogql/printer/clickhouse.py # posthog/hogql/printer/types.py
… resolution PR 61108 added typed (numeric/datetime) materialized columns with a printer-side bare-comparison rewrite and type observability counters. On the lowering architecture the printer no longer decides property access, so the rewrite and the usage/range counters move into the ClickHouse property resolution pass: typed nullable columns compare bare (with an isNotNull guard and toDateTime64 literal conversion), typed columns skip the string sentinel scrub on value reads, and string-only optimizers decline non-string columns.
The planner analyzes the resolved AST (it runs at swap time); the prepared AST is now lowered past the point where property reads are recognizable, so preparing fully made plan_property_comparison return None.
…ql-rearch-16-pushdown-redesign # Conflicts: # posthog/hogql/transforms/test/__snapshots__/test_events_predicate_pushdown.ambr
…ql-rearch-16-pushdown-redesign
|
⏭️ Skipped snapshot commit because branch advanced to The new commit will trigger its own snapshot update workflow. If you expected this workflow to succeed: This can happen due to concurrent commits. To get a fresh workflow run, either:
|
…h-15-clean-cutover
…ql-rearch-16-pushdown-redesign
The lowered property-read node is an abstract key-value access; whether the value is served from a JSON extract, a materialized column, or a property-group map is a later pass's decision, so the node name should not encode the JSON fallback.
…ql-rearch-16-pushdown-redesign
|
⏭️ Skipped snapshot commit because branch advanced to The new commit will trigger its own snapshot update workflow. If you expected this workflow to succeed: This can happen due to concurrent commits. To get a fresh workflow run, either:
|
Query snapshots: Backend query snapshots updatedChanges: 1 snapshots (1 modified, 0 added, 0 deleted) What this means:
Next steps:
|
…sign # Conflicts: # posthog/hogql/transforms/events_predicate_pushdown.py # posthog/hogql/transforms/test/test_events_predicate_pushdown.py
Folded from the stacked nullability branch: with maximal subexpression hoisting restored, a hoisted non-nullable join key types as a blanket-nullable subquery column without this, and the printer's ifNull() wrapping produces a JOIN ON expression ClickHouse rejects (cannot determine join keys). Reading the projected column's resolved constant type keeps hoisted join keys bare, matching real-table behavior.
Query snapshots: Backend query snapshots updatedChanges: 39 snapshots (39 modified, 0 added, 0 deleted) What this means:
Next steps:
|
The printer now reads a subquery column's nullability from its resolved type, so comparisons over session_replay_events' min()/sum() projections drop their redundant ifNull wrappers.
Query snapshots: Backend query snapshots updatedChanges: 6 snapshots (6 modified, 0 added, 0 deleted) What this means:
Next steps:
|
Guards both directions of the printer's subquery-nullability rule: a genuinely nullable projection keeps its ifNull comparison guard, a non-nullable one prints bare.
Problem
Events predicate pushdown is a query optimization. When a query reads
eventswith aJOINand aLIMIT, ClickHouse otherwise scans the whole events table, fans it out across the join, and applies theLIMITlast. Pushdown wraps the events scan in a subquery that filters andLIMITs first, so the join only ever sees the small pre-filtered slice:The original implementation predated the lowering rearchitecture: it decided which physical column the subquery had to expose by predicting the printer's materialized-column resolution — duplicating logic that now lives in the ClickHouse physical pass. This rewrites it on the new architecture so it never predicts physical columns. Stacked on the clean cutover (#61943).
Changes
Pushdown now runs between logical lowering and the ClickHouse physical pass, so it works on the dialect-neutral form — a
properties.xread is aJSONFieldAccess, not yet a materialized column. It does two things:WHERE, verbatim. They already reference the real events table, so the physical pass optimizes them (materialized columns, skip indexes) in place.SELECT. Each maximal subexpression that depends only on the events table — a column, a property read,JSONHas(properties, k),match(uuid, …),upper(properties.x)— is projected under a name and the outer reference rewritten to read that one column. One uniform rule: touches only events → project it; touches a joined table → leave it outside. No physical-column inspection, no per-function special-casing — the physical pass resolves eachJSONFieldAccessinside the subquery afterward.It is a pure optimization: every rewrite is result-equivalent, and any unexpected error leaves the query untouched (it just runs flat).
No carve-outs; one printer fix instead. The old version had bespoke join-key/value modes and a property-key resolver, all to work around a subquery column being typed blanket-nullable — which made the printer
ifNull-wrap a hoisted join key, so ClickHouse rejected the join ("Cannot determine join keys"). The root cause is fixed in the printer:BasePrinter._is_type_nullablenow reads a subquery column's real nullability through the subquery's type instead of defaulting to nullable. So a non-nullable value isn't wrapped, a join key hoists like any other column, and a property key wraps only when genuinely nullable. Every carve-out is deleted; everything hoists through one path.Removed
LazyTypeDetector. This guard bailed pushdown if it found an unresolved lazy join (person.*,session.*). It is dead: pushdown runs after lazy joins resolve; a query whose only "join" is an implicit lazy one isn't eligible (pushdown requires a realJOIN); and the uniform rule already leaves any lazy reference in the outer query (it isn't the events table, so it counts as foreign). Confirmed it never fires across the whole suite before removing it.How did you test this code?
I'm an agent (Claude Code). Automated tests I ran:
test_events_predicate_pushdown.py) — 135 pass, including the ClickHouse execution-equivalence tests that create events, run the query with pushdown on and off, and assert identical results (plain / materialized / dmat / property-group columns, plus self-join, fan-out, outer-JSONHas, and a materialized-property join key).LazyTypeDetectornever fires in the full pipeline before removing it.Automatic notifications
Docs update
No user-facing change — this is an internal query-compilation optimization, so no docs update is needed.
🤖 Agent context
Authored by Claude Code (Opus), driven by Sandy. Branch 16 of the HogQL lowering rearchitecture stack, stacked on the clean cutover (#61943).
Key decisions: converged on the single maximal events-only subexpression rule after noticing that
JSONHasandmatch()are no different from any pure function over events columns — they should hoist by one rule, not per-function cases. The nullable-join-key fix went through a couple of iterations (explicit join-key modes, then a property-key resolver) before landing on the printer's nullability check, which let every carve-out be deleted. TheLazyTypeDetectorremoval came from checking the guard against the pass's actual pipeline position: it protects against a precondition that can no longer occur once pushdown runs after lazy resolution._build_subquerystill hand-builds the subquery's container type rather than callingresolve_types, because the resolver runs before lowering and would null out the loweredJSONFieldAccesstypes. This is documented in the module docstring.