Zone Map Pruning for Metrics #6363

Open
alexanderbianchi wants to merge 1 commit into main from bianchi/zonemap

@alexanderbianchi
Collaborator

@alexanderbianchi alexanderbianchi commented Apr 29, 2026

Summary

  • Adds conservative scan-time metadata pruning for metrics splits after the metastore split list is fetched.
  • Extracts string equality and IN predicates from DataFusion filters and evaluates them against split metadata before building Parquet file groups.
  • Extracts simple prefix LIKE predicates such as host LIKE 'ID-07%' for metrics tag metadata pruning.
  • Prunes with exact metric_name metadata, exact low-cardinality tag metadata, and per-column zonemap_regexes when present.
  • Evaluates prefix LIKE against zonemap regexes as a regex-language prefix intersection, not by matching the prefix string itself, so values like ID-0701 satisfy ID-07%.
  • Keeps splits when relevant metadata is missing, zonemap regexes are invalid, or zonemap prefix evaluation cannot be completed, so older or partially populated metadata cannot produce false negatives.
  • Marks metric and tag predicates, including non-negated LIKE, as inexact pushdown so DataFusion passes them into the TableProvider scan while still applying the row-level filter later.
  • Updates the metrics integration test helper to preserve writer-produced row-key and zonemap metadata, and adds end-to-end pruning regression tests for equality and prefix LIKE zonemap pruning.
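The difference between matching the prefix string against a zonemap regex and intersecting the prefix with the regex's language can be sketched as below. This is a minimal illustration, not the code in this PR: the `PrefixCheck` enum, the `tokenize` and `prefix_intersects` functions, and the tiny regex subset (anchored `^...$` patterns built from literal characters, `.`, and `.*`) are all assumptions made for the example. Anything outside that subset yields `Unknown`, which a caller would treat as "keep the split".

```rust
/// Outcome of checking a LIKE prefix against one zonemap regex (hypothetical type).
#[derive(Debug, PartialEq)]
enum PrefixCheck {
    /// Some string accepted by the regex may start with the prefix: keep the split.
    MayMatch,
    /// No string accepted by the regex starts with the prefix: safe to prune.
    CannotMatch,
    /// The regex is outside the supported subset: keep the split.
    Unknown,
}

/// Tokens of the assumed regex subset.
enum Token {
    Lit(char),
    Any,     // `.`
    AnyStar, // `.*`
}

/// Tokenizes the regex body, or returns None on any unsupported syntax.
fn tokenize(body: &str) -> Option<Vec<Token>> {
    let mut tokens = Vec::new();
    let mut chars = body.chars().peekable();
    while let Some(c) = chars.next() {
        let token = match c {
            '.' => {
                if chars.peek() == Some(&'*') {
                    chars.next();
                    Token::AnyStar
                } else {
                    Token::Any
                }
            }
            c if c.is_alphanumeric() || c == '-' || c == '_' => Token::Lit(c),
            _ => return None, // classes, groups, alternation, escapes, ...
        };
        tokens.push(token);
    }
    Some(tokens)
}

/// Asks whether any string in the regex's language starts with `prefix`.
fn prefix_intersects(zonemap_regex: &str, prefix: &str) -> PrefixCheck {
    let body = match zonemap_regex
        .strip_prefix('^')
        .and_then(|rest| rest.strip_suffix('$'))
    {
        Some(body) => body,
        None => return PrefixCheck::Unknown, // unanchored: do not guess
    };
    let tokens = match tokenize(body) {
        Some(tokens) => tokens,
        None => return PrefixCheck::Unknown,
    };
    let mut tokens = tokens.iter();
    for ch in prefix.chars() {
        match tokens.next() {
            // Every accepted string is shorter than the prefix.
            None => return PrefixCheck::CannotMatch,
            // `.*` can absorb the rest of the prefix.
            Some(Token::AnyStar) => return PrefixCheck::MayMatch,
            Some(Token::Any) => {}
            Some(Token::Lit(lit)) => {
                if *lit != ch {
                    return PrefixCheck::CannotMatch;
                }
            }
        }
    }
    // Prefix consumed; the remaining tokens can always be completed.
    PrefixCheck::MayMatch
}
```

Under these assumptions, `prefix_intersects("^ID-0701$", "ID-07")` yields `MayMatch` even though the regex does not match the string `ID-07` itself, which is the case the summary calls out.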

Notes

This PR intentionally does not push tag predicates into the metastore query yet. Metric name and time range still drive metastore-side pruning; tag and zonemap pruning runs locally on the returned split metadata before any Parquet files are read.
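A rough sketch of that local pass, assuming a simplified metadata shape, might look like the following. `SplitMeta`, `TagPredicate`, `may_match`, and `prune_splits` are hypothetical names for illustration and are not taken from quickwit-datafusion. The key property is that a split with no metadata for a column is always kept.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical per-split metadata: exact value sets for low-cardinality tags.
/// A column absent from the map means "no metadata recorded".
struct SplitMeta {
    split_id: String,
    tag_values: HashMap<String, HashSet<String>>,
}

/// Hypothetical extracted predicates: string equality and IN lists.
enum TagPredicate {
    Eq { column: String, value: String },
    In { column: String, values: Vec<String> },
}

/// Returns false only when the metadata proves the predicate cannot match.
fn may_match(split: &SplitMeta, predicate: &TagPredicate) -> bool {
    let (column, wanted): (&str, &[String]) = match predicate {
        TagPredicate::Eq { column, value } => {
            (column.as_str(), std::slice::from_ref(value))
        }
        TagPredicate::In { column, values } => (column.as_str(), values.as_slice()),
    };
    match split.tag_values.get(column) {
        // Missing metadata: keep the split so older splits never cause false negatives.
        None => true,
        Some(known) => wanted.iter().any(|value| known.contains(value)),
    }
}

/// Keeps the splits that may satisfy every predicate (predicates are ANDed).
fn prune_splits(splits: Vec<SplitMeta>, predicates: &[TagPredicate]) -> Vec<SplitMeta> {
    splits
        .into_iter()
        .filter(|split| predicates.iter().all(|p| may_match(split, p)))
        .collect()
}
```

Because the row-level filter still runs afterwards, this pass only needs to be safe, not exact: keeping an extra split costs I/O, while dropping a matching split would lose rows.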

Prefix LIKE support is deliberately narrow: only literal trailing-percent patterns are extracted, for example host LIKE 'ID-07%'. More general LIKE patterns, escaped wildcards, negated LIKE, and ILIKE remain row-level filters.
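The narrow extraction rule described above could be expressed roughly as follows. `extract_like_prefix` is a hypothetical helper written for illustration; it assumes backslash is the escape character and declines a bare `%` since that pattern gives nothing to prune on. Any pattern it declines would simply stay a row-level filter.

```rust
/// Returns the literal prefix of a LIKE pattern of the form `literal%`,
/// or None when the pattern is anything more general.
fn extract_like_prefix(pattern: &str) -> Option<&str> {
    // The pattern must end with exactly one trailing wildcard.
    let prefix = pattern.strip_suffix('%')?;
    // A bare `%` matches everything, so there is nothing to prune with.
    if prefix.is_empty() {
        return None;
    }
    // Reject inner wildcards and escapes instead of trying to interpret them.
    if prefix.chars().any(|c| matches!(c, '%' | '_' | '\\')) {
        return None;
    }
    Some(prefix)
}
```

For example, `extract_like_prefix("ID-07%")` returns `Some("ID-07")`, while `"%07"`, `"ID_07%"`, and `"100\%%"` all return `None` under this rule.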

Testing

  • cargo fmt --package quickwit-datafusion
  • cargo test -p quickwit-datafusion

@alexanderbianchi
Collaborator Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 🎉

