Epic: Two-Phase Read Pushdown (predicate/projection/limit/order-by) #551

@ethe

Goal

Implement RFC 0010’s two-phase read path so scans push predicates, projection, limit, and PK ordering down to memtables and Parquet SSTs. Deliver a reusable ScanPlan with pruning/RowSets and an execution stream that filters before projection, prunes row-groups/pages, enforces limits early, and preserves PK order.
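The ordering of steps in the goal (filter first, then project, then stop at the limit, all without disturbing PK order) can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from the codebase: `Row`, `SimplePlan`, and `execute` are illustration-only names, and the predicate is reduced to a single `column >= value` test. The input is assumed to arrive already PK-ascending.

```rust
/// Hypothetical row: a primary key plus named integer columns.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub pk: u64,
    pub cols: Vec<(String, i64)>,
}

/// Hypothetical, heavily simplified plan (not the real ScanPlan):
/// one `pred_col >= min_inclusive` predicate, a projection, and a limit.
pub struct SimplePlan {
    pub pred_col: String,
    pub min_inclusive: i64,
    pub projection: Vec<String>,
    pub limit: Option<usize>,
}

/// Executes the plan lazily over PK-ascending input.
/// Filtering runs before projection, so the predicate column does not
/// need to be part of the projected output. `take` stops pulling from
/// the source as soon as the limit is satisfied.
pub fn execute<'a>(
    plan: &'a SimplePlan,
    rows: impl Iterator<Item = Row> + 'a,
) -> impl Iterator<Item = Row> + 'a {
    rows.filter(move |r| {
        // A row without the predicate column never matches in this sketch.
        r.cols
            .iter()
            .any(|(name, v)| *name == plan.pred_col && *v >= plan.min_inclusive)
    })
    .map(move |r| Row {
        pk: r.pk,
        // Keep only projected columns; input order is untouched.
        cols: r
            .cols
            .into_iter()
            .filter(|(name, _)| plan.projection.contains(name))
            .collect(),
    })
    .take(plan.limit.unwrap_or(usize::MAX))
}
```

Because every adapter in the chain is lazy and order-preserving, the output stays PK-ascending and the source is not read past the row that satisfies the limit.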

Current state to build on:

  • ScanPlan/projection_with_predicate exist; MergeStream/PackageStream already do PK-ordered MVCC merge + limit; SstableScan handles MVCC/delete sidecar.
  • Missing: RowSet abstraction, SST-level pruning (sst_entries currently includes every SST), row-group/page pruning, Parquet RowFilter pushdown, bloom filters.
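One possible shape for the missing RowSet abstraction is a sorted list of non-overlapping row-index ranges that can be intersected as successive pruning steps narrow the selection. This is only a sketch under that assumption: RFC 0010 may define RowSet differently, and the method names `from_ranges`, `intersect`, and `row_count` are invented for illustration.

```rust
use std::ops::Range;

/// Hypothetical RowSet: sorted, non-overlapping half-open row ranges.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct RowSet {
    ranges: Vec<Range<usize>>,
}

impl RowSet {
    /// Normalizes arbitrary ranges: drops empty ones, sorts by start,
    /// and merges ranges that overlap or touch.
    pub fn from_ranges(mut ranges: Vec<Range<usize>>) -> Self {
        ranges.retain(|r| r.start < r.end);
        ranges.sort_by_key(|r| r.start);
        let mut merged: Vec<Range<usize>> = Vec::new();
        for r in ranges {
            if let Some(last) = merged.last_mut() {
                if r.start <= last.end {
                    last.end = last.end.max(r.end);
                    continue;
                }
            }
            merged.push(r);
        }
        RowSet { ranges: merged }
    }

    /// Two-pointer intersection; the result stays sorted, so row order
    /// (and therefore PK order within a source) is preserved.
    pub fn intersect(&self, other: &RowSet) -> RowSet {
        let (mut i, mut j) = (0, 0);
        let mut out = Vec::new();
        while i < self.ranges.len() && j < other.ranges.len() {
            let (a, b) = (&self.ranges[i], &other.ranges[j]);
            let start = a.start.max(b.start);
            let end = a.end.min(b.end);
            if start < end {
                out.push(start..end);
            }
            // Advance whichever range finishes first.
            if a.end <= b.end {
                i += 1;
            } else {
                j += 1;
            }
        }
        RowSet { ranges: out }
    }

    /// Total number of selected rows.
    pub fn row_count(&self) -> usize {
        self.ranges.iter().map(|r| r.end - r.start).sum()
    }

    pub fn ranges(&self) -> &[Range<usize>] {
        &self.ranges
    }
}
```

A range-based representation keeps the set compact when pruning removes whole row groups or pages, which is the common case for min/max-based skipping.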

Acceptance criteria:

  • Planner builds a RowSet per source, extends scan schema with predicate columns, and filters SST/memtable inputs by key bounds and commit_ts stats.
  • Parquet scans fetch metadata asynchronously, prune row-groups/pages by min/max statistics (and read_ts), and push a predicate-derived RowFilter; only projected columns are read.
  • Execution preserves PK-ascending order through pruning, applies residuals (if any), and enforces limit early (stop once satisfied).
  • A predicate that references a missing column returns a clean error; NULL semantics match the RFC.
  • Tests cover SST/memtable pruning, row-group/page skip, projection+predicate, MVCC correctness, early limit, PK order contract, and missing-column errors.
  • Bloom filters are either implemented as a stretch goal (write and read paths) or explicitly declared out of scope.
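The pruning and missing-column criteria above can be sketched as a single function over per-row-group statistics. This does not use the Parquet crate's API; `GroupStats`, `RangePredicate`, `PruneError`, and `surviving_groups` are hypothetical names, the predicate is limited to an inclusive range on one integer column, and the visibility rule (a group whose smallest commit_ts is newer than read_ts holds no visible rows) is an assumption about the MVCC model.

```rust
use std::collections::HashMap;

/// Hypothetical per-row-group statistics.
#[derive(Debug, Clone)]
pub struct GroupStats {
    /// Column name -> (min, max) for that column within the group.
    pub col_min_max: HashMap<String, (i64, i64)>,
    /// Smallest commit timestamp of any row in the group.
    pub min_commit_ts: u64,
}

/// Hypothetical predicate: `lo <= column <= hi`.
pub struct RangePredicate {
    pub column: String,
    pub lo: i64,
    pub hi: i64,
}

#[derive(Debug, PartialEq)]
pub enum PruneError {
    MissingColumn(String),
}

/// Returns the indices of row groups that may contain matching, visible
/// rows, in ascending order so downstream PK ordering is unaffected.
pub fn surviving_groups(
    groups: &[GroupStats],
    schema: &[&str],
    pred: &RangePredicate,
    read_ts: u64,
) -> Result<Vec<usize>, PruneError> {
    // Fail up front instead of silently matching nothing.
    if !schema.contains(&pred.column.as_str()) {
        return Err(PruneError::MissingColumn(pred.column.clone()));
    }
    let mut keep = Vec::new();
    for (idx, g) in groups.iter().enumerate() {
        // Assumption: every row here was committed after the snapshot.
        if g.min_commit_ts > read_ts {
            continue;
        }
        match g.col_min_max.get(&pred.column) {
            // The group's value range cannot overlap the predicate range.
            Some(&(min, max)) if max < pred.lo || min > pred.hi => continue,
            // Overlap, or no statistics: keep the group conservatively.
            _ => keep.push(idx),
        }
    }
    Ok(keep)
}
```

Statistics can only prove that a group holds no match, so surviving groups still need the row-level filter; the same overlap test would apply to page-level and SST-level key bounds.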

Stories:

  1. Planner & pruning foundation
  2. Parquet pushdown execution
  3. Validation & tests
  4. Stretch: Bloom filters

Metadata

Labels: XL - Extra Large (system architecture overhaul, adding support for new platforms, large-scale dependency updates)
