
tracking: aggregate-stats scan optimization #8126

@discord9


Summary

Support aggregate-stats-based scan optimization: when COUNT/MIN/MAX aggregates can be answered from Parquet row-group or file statistics, skip the normal row scan for eligible files, and fall back safely to a row scan when statistics are missing or unsupported. This reduces scan cost for simple aggregate queries while preserving correctness through schema and type checks plus the row-scan fallback.
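As an illustration of the core idea, here is a minimal Rust sketch, assuming a single `i64` column and a hypothetical `RowGroupStats` type (not the actual store-api stats DTOs). It merges per-row-group statistics into COUNT(*), COUNT(col), MIN and MAX, and returns `None` whenever a needed statistic is missing, which is the signal to fall back to a normal row scan.

```rust
/// Hypothetical per-row-group statistics for one i64 column.
#[derive(Debug, Clone, Copy)]
pub struct RowGroupStats {
    pub num_rows: u64,
    pub null_count: Option<u64>,
    pub min: Option<i64>,
    pub max: Option<i64>,
}

/// Aggregate answer derived purely from statistics (illustrative type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StatsAggregate {
    pub count_star: u64,
    pub count_col: u64,
    pub min: Option<i64>,
    pub max: Option<i64>,
}

/// Returns `None` when any row group lacks a statistic needed by any of the
/// aggregates, meaning the caller must fall back to a row scan.
pub fn aggregate_from_stats(groups: &[RowGroupStats]) -> Option<StatsAggregate> {
    let mut acc = StatsAggregate { count_star: 0, count_col: 0, min: None, max: None };
    for g in groups {
        // Null counts are required to derive COUNT(col) = num_rows - nulls.
        let nulls = g.null_count?;
        acc.count_star += g.num_rows;
        acc.count_col += g.num_rows.checked_sub(nulls)?;
        if nulls == g.num_rows {
            // An all-null row group contributes nothing to MIN/MAX.
            continue;
        }
        // Non-empty row group without min/max: cannot answer, fall back.
        let (lo, hi) = (g.min?, g.max?);
        acc.min = Some(acc.min.map_or(lo, |m| m.min(lo)));
        acc.max = Some(acc.max.map_or(hi, |m| m.max(hi)));
    }
    Some(acc)
}
```

An all-null row group is skipped for MIN/MAX instead of forcing a fallback. Only integer statistics are handled here; Parquet writers may truncate min/max values for string or binary columns, so those would need an exactness check before they could be trusted.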

Related PRs

Component Breakdown

| Component | Description | Status |
|---|---|---|
| Foundation types / helpers | Adds store-api stats DTOs, common-query stats candidate evaluation, and table row-group pruning stats helpers | 🔄 |
| Scanner runtime integration | Adds RegionScanner::scan_stats, stats-aware scanner properties/request wiring, runtime decision state, and mito2 stats stream production | 🔜 |
| Optimizer rewrite | Adds the AggrStats optimizer rule and StatsScanExec to rewrite eligible aggregate scans to stats-backed execution | 🔜 |
| Validation coverage | Adds sqlness tests, integration tests, and partition/filter test adaptations for end-to-end behavior | 🔜 |
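The optimizer rewrite and the runtime decision can be pictured as two separate checks. The sketch below is hypothetical: `AggExpr`, `ColumnType`, `is_stats_candidate`, `FileMeta` and `split_files` are illustrative names, not the actual AggrStats rule or RegionScanner API. It shows a query-level check (only COUNT(*)/MIN/MAX over a supported column type, with no filter) and a per-file split into stats-backed files and files that still need a row scan.

```rust
use std::collections::HashMap;

/// Illustrative aggregate expressions seen by the rewrite rule.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AggExpr {
    CountStar,
    Min(String),
    Max(String),
    /// Anything else (SUM, AVG, DISTINCT, ...) is not stats-answerable here.
    Other,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColumnType {
    Int64,
    Float64,
    String,
}

/// Query-level check: can this aggregate list be answered from stats at all?
/// Assumption: conservatively allow only Int64 for MIN/MAX (float stats
/// interact with NaN, string stats may be truncated) and reject any filter.
pub fn is_stats_candidate(
    aggs: &[AggExpr],
    has_filter: bool,
    schema: &HashMap<String, ColumnType>,
) -> bool {
    if has_filter || aggs.is_empty() {
        return false;
    }
    aggs.iter().all(|a| match a {
        AggExpr::CountStar => true,
        AggExpr::Min(c) | AggExpr::Max(c) => matches!(schema.get(c), Some(ColumnType::Int64)),
        AggExpr::Other => false,
    })
}

/// Hypothetical per-file metadata used for the runtime decision.
#[derive(Debug, Clone)]
pub struct FileMeta {
    pub id: u32,
    pub has_complete_stats: bool,
}

/// Split files into (answered from stats, needs a normal row scan).
pub fn split_files(files: &[FileMeta]) -> (Vec<u32>, Vec<u32>) {
    let (stats, scan): (Vec<&FileMeta>, Vec<&FileMeta>) =
        files.iter().partition(|f| f.has_complete_stats);
    (
        stats.iter().map(|f| f.id).collect(),
        scan.iter().map(|f| f.id).collect(),
    )
}
```

Partial results from both groups would then be merged as usual: counts summed, MIN/MAX combined with min/max. Rejecting every filter and every non-integer column is deliberately conservative for the sketch.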
