Open
Labels
enhancement: Any new improvement worthy of an entry in the changelog
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Some databases, Grafana Tempo being one example, use column dictionaries as makeshift column indexes to speed up ad-hoc filtering. Checking whether a low-cardinality value is present in a column chunk's dictionary makes it possible to pre-filter data by skipping whole row groups.
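To make the idea concrete, here is a minimal sketch of the pruning decision for an equality predicate. `ColumnChunkSummary`, `can_skip`, and `row_groups_to_scan` are hypothetical names made up for illustration and are not part of the parquet crate. The sketch also assumes the caller knows whether every data page in the chunk is dictionary-encoded, because a writer may fall back to plain encoding partway through a chunk, in which case the dictionary no longer lists every value.

```rust
use std::collections::HashSet;

/// Hypothetical summary of one column chunk (one per row group),
/// used for illustration only.
struct ColumnChunkSummary {
    /// Distinct values from the chunk's dictionary page, if it has one.
    dictionary: Option<HashSet<String>>,
    /// True only if every data page in the chunk is dictionary-encoded.
    fully_dictionary_encoded: bool,
}

/// Returns true if the row group cannot contain `needle` and can be skipped.
fn can_skip(chunk: &ColumnChunkSummary, needle: &str) -> bool {
    match &chunk.dictionary {
        // The dictionary is authoritative only when no page fell back
        // to a non-dictionary encoding.
        Some(dict) if chunk.fully_dictionary_encoded => !dict.contains(needle),
        // No dictionary, or a partial one: the row group must be scanned.
        _ => false,
    }
}

/// Indices of the row groups that still have to be scanned.
fn row_groups_to_scan(chunks: &[ColumnChunkSummary], needle: &str) -> Vec<usize> {
    chunks
        .iter()
        .enumerate()
        .filter(|(_, chunk)| !can_skip(chunk, needle))
        .map(|(index, _)| index)
        .collect()
}
```

The check is conservative: a row group is skipped only when the dictionary is known to be complete and does not contain the value.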
Describe the solution you'd like
Add an API to ParquetRecordBatchStreamBuilder that allows inspecting the contents of a column's dictionary
Describe alternatives you've considered
Column indexes have a high expected size cost and are not always available (e.g. for legacy data)
Additional context
It is already possible to access this information through SerializedFileReader by using a "peekable" page iterator
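Once the raw dictionary page has been obtained this way, turning it into values is straightforward for string-like columns. The sketch below assumes the page body is already decompressed and PLAIN-encoded for a BYTE_ARRAY column, where each value is stored as a 4-byte little-endian length followed by that many bytes. `decode_byte_array_dictionary` is a hypothetical helper, not an existing function in the parquet crate.

```rust
/// Hypothetical helper: decodes the body of a PLAIN-encoded BYTE_ARRAY
/// dictionary page into its values, borrowing from the input buffer.
/// Assumes `buf` is already decompressed. Returns None if `buf` is
/// too short to hold `num_values` entries.
fn decode_byte_array_dictionary(buf: &[u8], num_values: usize) -> Option<Vec<&[u8]>> {
    let mut values = Vec::new();
    let mut pos = 0usize;
    for _ in 0..num_values {
        // Each entry starts with a 4-byte little-endian length prefix.
        let prefix = buf.get(pos..pos + 4)?;
        let len = u32::from_le_bytes([prefix[0], prefix[1], prefix[2], prefix[3]]) as usize;
        pos += 4;
        // The value bytes follow immediately after the prefix.
        let end = pos.checked_add(len)?;
        values.push(buf.get(pos..end)?);
        pos = end;
    }
    Some(values)
}
```

A caller could then test a predicate value for membership in the returned list before deciding whether to read the rest of the row group.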