Optimizer fixes cte multi index #5512

Draft
jussisaurio wants to merge 7 commits into main from optimizer-fixes-cte-multi-index

@jussisaurio
Collaborator

  • enable multi-indexes in multiway (>=3) joins
  • fix poor query plans when CTEs are involved

yes TODO better description

jussisaurio and others added 6 commits February 21, 2026 22:31
…AND boundaries

Single-element Parenthesized expressions like ((a = 1 OR b = 2)) are
purely syntactic grouping, but break_predicate_at_and_boundaries didn't
see through them. This prevented the optimizer from recognizing OR
disjuncts inside parenthesized ON clauses, blocking multi-index OR
optimization for queries like `JOIN t ON (a = x OR b = x)`.
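
As a rough illustration of "seeing through" redundant parentheses, here is a minimal sketch over a toy expression tree. The `Expr` enum, `unwrap_parens`, and `split_conjuncts` are hypothetical names for this example only; the actual AST and the body of break_predicate_at_and_boundaries are not shown in this PR description and may differ.

```rust
// Toy expression tree. All types and helpers here are illustrative
// assumptions, not the project's real AST.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
    Eq(Box<Expr>, Box<Expr>),
    // A parenthesized list; with exactly one element it is pure grouping.
    Parenthesized(Vec<Expr>),
}

/// Build `a = b` from two column names (test convenience).
fn eq(a: &str, b: &str) -> Expr {
    Expr::Eq(
        Box::new(Expr::Column(a.to_string())),
        Box::new(Expr::Column(b.to_string())),
    )
}

/// Strip any number of nested single-element parenthesis wrappers.
fn unwrap_parens(mut expr: &Expr) -> &Expr {
    while let Expr::Parenthesized(items) = expr {
        if items.len() != 1 {
            break; // multi-element lists are not plain grouping
        }
        expr = &items[0];
    }
    expr
}

/// Collect the top-level AND conjuncts of `expr`, looking through
/// grouping parentheses before deciding whether a node is an AND.
fn split_conjuncts<'a>(expr: &'a Expr, out: &mut Vec<&'a Expr>) {
    match unwrap_parens(expr) {
        Expr::And(lhs, rhs) => {
            split_conjuncts(lhs, out);
            split_conjuncts(rhs, out);
        }
        other => out.push(other),
    }
}
```

With this shape, `((a = x OR b = x)) AND c = y` yields two conjuncts, and the first one is the bare OR node rather than a Parenthesized wrapper, so a later OR-detection pass can match on it directly.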

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Plumbing for CTE/subquery cardinality estimation. The field will be
populated by the optimizer after join ordering and used by outer queries
to estimate how many rows a subquery/CTE produces.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Previously multi-index OR (and AND-intersection) was only considered for
single-table access (lhs.is_none()). This meant queries like
`JOIN t ON (a = x OR b = x)` couldn't use the OR-by-union optimization
during joins, falling back to full scans instead.

Remove the lhs.is_none() guard and instead validate that each disjunct's
constraints only reference tables already available on the LHS, ensuring
cross-table disjuncts are properly handled.
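
The validation described above could look roughly like the following sketch. It assumes tables are tracked as a bitmask and that each constraint records which tables its comparison operand depends on; `TableMask`, `Constraint`, `Disjunct`, and both functions are hypothetical names, not the optimizer's actual types.

```rust
/// Bit i set means table i (in join-planning order). Hypothetical stand-in
/// for however the optimizer tracks table sets.
type TableMask = u64;

/// One indexable constraint inside a disjunct, e.g. `t.a = x`.
struct Constraint {
    /// Tables referenced by the operand the index column is compared to.
    /// Zero means the operand is a constant or bound parameter.
    operand_tables: TableMask,
}

/// One branch of an OR: all of its constraints must be evaluable.
struct Disjunct {
    constraints: Vec<Constraint>,
}

/// A disjunct is usable only if every constraint depends solely on
/// tables that are already on the left-hand side of the join.
fn disjunct_is_usable(d: &Disjunct, lhs_mask: TableMask) -> bool {
    !d.constraints.is_empty()
        && d.constraints
            .iter()
            .all(|c| c.operand_tables & !lhs_mask == 0)
}

/// The OR as a whole qualifies only if every disjunct does; one
/// unusable branch would force a scan anyway.
fn or_is_usable(disjuncts: &[Disjunct], lhs_mask: TableMask) -> bool {
    !disjuncts.is_empty() && disjuncts.iter().all(|d| disjunct_is_usable(d, lhs_mask))
}
```

Under this scheme the old single-table case is just `lhs_mask == 0`: only constant operands pass, which matches the behavior the removed guard allowed.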

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When MultiIndexScan was selected as the access method, the output
cardinality computation fell through to the generic full-scan formula,
producing wildly inflated row count estimates. This caused the DP
optimizer to avoid plans that use multi-index OR during joins.

Propagate estimated_rows from cost functions through the MultiIndexScan
params and use them directly in output cardinality computation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The join optimizer now stores its output cardinality in SelectPlan's
estimated_output_rows field (clamped to LIMIT when present). When an
outer query references a CTE or subquery as a joined table,
base_row_estimate reads this field instead of using the hardcoded
fallback, enabling correct join ordering for queries like:

  WITH small AS (SELECT id FROM t LIMIT 3)
  SELECT ... FROM big_table JOIN small ON ...

For CompoundSelect (UNION ALL, INTERSECT, etc.), branch estimates are
combined according to set operation semantics.
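
To make the estimate flow concrete, here is a small sketch of LIMIT clamping, the fallback lookup, and one plausible way to combine compound branches. The combination rules, the `FALLBACK_ROWS` value, and all function names are assumptions for illustration; the commit does not spell out the exact formulas.

```rust
/// Placeholder for the hardcoded fallback; the real value is not given
/// in this PR description.
const FALLBACK_ROWS: f64 = 1000.0;

#[derive(Clone, Copy)]
enum SetOp {
    UnionAll,
    Union,
    Intersect,
    Except,
}

/// A LIMIT can only lower the number of rows a plan emits.
fn clamp_to_limit(rows: f64, limit: Option<u64>) -> f64 {
    match limit {
        Some(n) => rows.min(n as f64),
        None => rows,
    }
}

/// Row estimate for a CTE/subquery used as a joined table: prefer the
/// stored estimate, fall back to the constant otherwise.
fn base_row_estimate(estimated_output_rows: Option<f64>) -> f64 {
    estimated_output_rows.unwrap_or(FALLBACK_ROWS)
}

/// Assumed combination rules (upper bounds where dedup is involved).
fn combine(left: f64, op: SetOp, right: f64) -> f64 {
    match op {
        SetOp::UnionAll => left + right,        // every row from both sides
        SetOp::Union => left + right,           // dedup can only shrink this
        SetOp::Intersect => left.min(right),    // cannot exceed either side
        SetOp::Except => left,                  // at most all left rows
    }
}
```

For the `small` CTE above, a 10,000-row scan clamped by `LIMIT 3` gives an estimate of 3, so the outer join can treat `small` as the cheap driving table instead of assuming the fallback size.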

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Anonymized version of a real customer workload: a knowledge graph with
TEXT UUID primary keys and CTE-driven neighbor lookups using OR-based
multi-index scans. Tests verify both correct join ordering (CTE outer,
multi-index OR on link and article tables) and result correctness with
2-seed and 5-seed variants.

Verified against sqlite3 for result equivalence.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@github-actions github-actions bot added the vdbe label Feb 21, 2026