Optimizer fixes cte multi index by jussisaurio · Pull Request #5512 · tursodatabase/turso

jussisaurio · 2026-02-21T20:43:15Z

enable multi-index scans in multiway (>=3) joins
fix bad query plans when CTEs are involved

yes TODO better description

…AND boundaries. Single-element `Parenthesized` expressions like `((a = 1 OR b = 2))` are purely syntactic grouping, but `break_predicate_at_and_boundaries` didn't see through them. This prevented the optimizer from recognizing OR disjuncts inside parenthesized ON clauses, which blocked the multi-index OR optimization for queries like `JOIN t ON (a = x OR b = x)`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
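
A minimal sketch of the fix described above, written against a hypothetical, heavily simplified expression AST (the `Expr` enum, the helper constructors, and `break_at_and_boundaries` are illustrative names, not turso's real types or function). It shows the idea of unwrapping single-element parentheses before splitting on AND, so a parenthesized OR is exposed as its own conjunct:

```rust
/// Hypothetical, simplified expression AST for this sketch only;
/// it is not turso's real parser AST.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(&'static str),
    Literal(i64),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
    /// `( ... )`: one element is pure grouping, several form a row value.
    Parenthesized(Vec<Expr>),
}

fn eq(col: &'static str, v: i64) -> Expr {
    Expr::Eq(Box::new(Expr::Column(col)), Box::new(Expr::Literal(v)))
}

fn or(l: Expr, r: Expr) -> Expr {
    Expr::Or(Box::new(l), Box::new(r))
}

fn and(l: Expr, r: Expr) -> Expr {
    Expr::And(Box::new(l), Box::new(r))
}

/// Collect the top-level AND conjuncts of `expr` into `out`, unwrapping
/// single-element parentheses first so that `((a = 1 OR b = 2))` surfaces
/// the OR itself (a multi-index OR candidate) instead of an opaque wrapper.
fn break_at_and_boundaries(expr: Expr, out: &mut Vec<Expr>) {
    match expr {
        Expr::And(l, r) => {
            break_at_and_boundaries(*l, out);
            break_at_and_boundaries(*r, out);
        }
        // Pure grouping: recurse into the single inner expression.
        Expr::Parenthesized(mut items) if items.len() == 1 => {
            let inner = items.pop().expect("length checked in guard");
            break_at_and_boundaries(inner, out);
        }
        // Anything else (including multi-element row values) is one conjunct.
        other => out.push(other),
    }
}
```

Only the single-element case is unwrapped because `(a, b)` is a row value rather than grouping, so looking through it would change the meaning of the expression.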

Plumbing for CTE/subquery cardinality estimation. The `estimated_output_rows` field on `SelectPlan` will be populated by the optimizer after join ordering and used by outer queries to estimate how many rows a subquery or CTE produces. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Previously multi-index OR (and AND-intersection) was only considered for single-table access (lhs.is_none()). This meant queries like `JOIN t ON (a = x OR b = x)` couldn't use the OR-by-union optimization during joins, falling back to full scans instead. Remove the lhs.is_none() guard and instead validate that each disjunct's constraints only reference tables already available on the LHS, ensuring cross-table disjuncts are properly handled. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
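
The validation that replaces the `lhs.is_none()` guard can be sketched as a table-set check. This is a hypothetical model, not turso's code: tables are bits in a `TableMask`, and each `Constraint` records only which tables the non-indexed side of a comparison references (0 for a constant). Multi-index OR is allowed only if every disjunct has at least one constraint and all of its constraints depend solely on tables already joined on the LHS:

```rust
/// Hypothetical bitmask of tables in the join order: bit `i` set means table `i`.
type TableMask = u64;

/// Hypothetical summary of one term in an OR disjunct, e.g. `t.a = x.id`:
/// `other_side` is the set of tables referenced by the expression compared
/// against the indexed column of the target table (0 for a constant).
#[derive(Debug, Clone, Copy)]
struct Constraint {
    other_side: TableMask,
}

/// Returns true if multi-index OR may be used to access the target table,
/// given the tables already joined on the LHS. Every disjunct must contribute
/// at least one constraint, and each constraint may only depend on LHS tables;
/// otherwise some branch of the union could not run as an index seek.
fn multi_index_or_allowed(disjuncts: &[Vec<Constraint>], lhs: TableMask) -> bool {
    !disjuncts.is_empty()
        && disjuncts.iter().all(|constraints| {
            !constraints.is_empty()
                && constraints.iter().all(|c| (c.other_side & !lhs) == 0)
        })
}
```

With `lhs` set to 0 this reduces to the old single-table behavior: only disjuncts whose constraints compare against constants qualify.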

When MultiIndexScan was selected as the access method, the output cardinality computation fell through to the generic full-scan formula, producing wildly inflated row count estimates. This caused the DP optimizer to avoid plans that use multi-index OR during joins. Propagate estimated_rows from cost functions through the MultiIndexScan params and use them directly in output cardinality computation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
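
A rough sketch of why propagating `estimated_rows` matters. The nested-loop cardinality model, the `AccessMethod` variants, and `output_cardinality` below are hypothetical simplifications, not turso's actual formulas; the point is only that a `MultiIndexScan` arm should use the estimate its cost function already produced instead of sharing the full-scan formula:

```rust
/// Hypothetical access methods, with only the fields this sketch needs.
enum AccessMethod {
    /// Scan every row of the table, then apply residual filters.
    FullScan { filter_selectivity: f64 },
    /// Union (OR) or intersection (AND) of several index lookups.
    /// `estimated_rows` is the per-probe estimate from the cost function,
    /// carried along with the chosen access method.
    MultiIndexScan { estimated_rows: f64 },
}

/// Estimated output rows when `lhs_rows` LHS rows each probe a table of
/// `table_rows` rows via `method` (hypothetical nested-loop model).
fn output_cardinality(method: &AccessMethod, lhs_rows: f64, table_rows: f64) -> f64 {
    match method {
        AccessMethod::FullScan { filter_selectivity } => {
            lhs_rows * table_rows * *filter_selectivity
        }
        // Before the fix, this case effectively fell through to the full-scan
        // formula above; reusing the cost function's estimate keeps it small.
        AccessMethod::MultiIndexScan { estimated_rows } => lhs_rows * *estimated_rows,
    }
}
```

For 10 LHS rows against a 1,000,000-row table, a multi-index scan estimated at 2 rows per probe yields 20 output rows, whereas the full-scan formula with selectivity 0.5 yields 5,000,000. An inflated figure like the latter is what pushed the DP optimizer away from multi-index OR plans.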

The join optimizer now stores its output cardinality in SelectPlan's estimated_output_rows field (clamped to LIMIT when present). When an outer query references a CTE or subquery as a joined table, base_row_estimate reads this field instead of using the hardcoded fallback, enabling correct join ordering for queries like: `WITH small AS (SELECT id FROM t LIMIT 3) SELECT ... FROM big_table JOIN small ON ...` For CompoundSelect (UNION ALL, INTERSECT, etc.), branch estimates are combined according to set operation semantics. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
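
The estimate flow can be sketched as three small functions. Everything here is hypothetical: the fallback constant is a placeholder (turso's real hardcoded value may differ), and the compound-select rules are one plausible reading of "set operation semantics" using upper bounds, not necessarily the exact combination the commit implements:

```rust
/// Hypothetical fallback used when a subquery/CTE carries no estimate;
/// turso's real hardcoded value may differ.
const FALLBACK_SUBQUERY_ROWS: f64 = 1_000_000.0;

#[derive(Debug, Clone, Copy)]
enum CompoundOp {
    UnionAll,
    Union,
    Intersect,
    Except,
}

/// Clamp the join optimizer's output estimate to the LIMIT, if any.
fn estimated_output_rows(join_estimate: f64, limit: Option<u64>) -> f64 {
    match limit {
        Some(n) => join_estimate.min(n as f64),
        None => join_estimate,
    }
}

/// Combine two compound-select branch estimates using upper bounds
/// implied by each set operation.
fn combine(op: CompoundOp, left: f64, right: f64) -> f64 {
    match op {
        // UNION ALL keeps every row; UNION's dedup can only shrink the sum.
        CompoundOp::UnionAll | CompoundOp::Union => left + right,
        // An intersection cannot exceed the smaller branch.
        CompoundOp::Intersect => left.min(right),
        // EXCEPT can only remove rows from the left branch.
        CompoundOp::Except => left,
    }
}

/// Row estimate an outer query uses for a CTE/subquery table reference.
fn base_row_estimate(subquery_estimate: Option<f64>) -> f64 {
    subquery_estimate.unwrap_or(FALLBACK_SUBQUERY_ROWS)
}
```

Under this model, the `small` CTE in the query above is estimated at 3 rows (its LIMIT) instead of the fallback, which lets the join optimizer place it as the outer loop and probe `big_table` by index.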

Anonymized version of a real customer workload: a knowledge graph with TEXT UUID primary keys and CTE-driven neighbor lookups using OR-based multi-index scans. Tests verify both correct join ordering (CTE outer, multi-index OR on link and article tables) and result correctness with 2-seed and 5-seed variants. Verified against sqlite3 for result equivalence. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jussisaurio and others added 6 commits February 21, 2026 22:31

github-actions bot added core optimizer translation/planning labels Feb 21, 2026

fix: MVCC cursor ignored left join null flag

2227552

github-actions bot added the vdbe label Feb 21, 2026

jussisaurio wants to merge 7 commits into main from optimizer-fixes-cte-multi-index
