Exposing `ColumnChunkMetaData.statistics` and (opt-in) `ColumnIndexMetaData` to JavaScript #863

Currently, there is no way to determine from JavaScript whether a row group can be pruned when trying to limit the work done to find a particular record or set of records. Coarse-grained column chunk statistics (the minimum and maximum value per chunk) can rule out row groups that cannot contain a particular value, which works best on a sorted column. This avoids a full scan of the file, and with small enough row groups it works well on its own. The statistics just aren't visible in the JavaScript API of [`ColumnChunkMetaData`](https://kylebarron.dev/parquet-wasm/classes/bundler_parquet_wasm.ColumnChunkMetaData.html).
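
To make the pruning idea concrete, here is a minimal sketch of how per-row-group min/max statistics could be used once they are readable from JavaScript. The `ChunkStatistics` shape and the `candidateRowGroups` helper are hypothetical names for illustration only; they are not part of the parquet-wasm API today.

```typescript
// Hypothetical shape for illustration; parquet-wasm does not expose this today.
interface ChunkStatistics {
  min: number;
  max: number;
}

// Returns the indices of the row groups whose [min, max] range could contain `target`.
function candidateRowGroups(
  stats: (ChunkStatistics | undefined)[],
  target: number,
): number[] {
  const keep: number[] = [];
  stats.forEach((s, i) => {
    // Missing statistics prove nothing, so that row group still has to be scanned.
    if (s === undefined || (s.min <= target && target <= s.max)) {
      keep.push(i);
    }
  });
  return keep;
}

// One entry per row group, for a single column.
const perRowGroup: (ChunkStatistics | undefined)[] = [
  { min: 0, max: 99 },
  { min: 100, max: 199 },
  undefined,
  { min: 300, max: 399 },
];

const selected = candidateRowGroups(perRowGroup, 150);
console.log(selected);
```

On a sorted column the ranges do not overlap, so at most one row group with statistics matches; on an unsorted column the same check is still safe, it just prunes less.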

The page index is more granular, and would let you choose the `limit` and `offset` options on [`ReaderOptions`](https://kylebarron.dev/parquet-wasm/types/esm_parquet_wasm.ReaderOptions.html) without doing more explicit scanning of the file from JavaScript. Reading the page index makes initially opening the file more expensive, so the current implementation does not read it automatically, and it is not visible in the JavaScript API either.
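
As a rough sketch of how a page index could drive `offset` and `limit`, the snippet below turns per-page min/max values and first-row positions into a row window. The `PageStats` shape and the `rowWindow` helper are hypothetical, and the sketch assumes the resulting window is relative to the start of a single row group; how that maps onto the `offset` and `limit` that `ReaderOptions` expects is left open.

```typescript
// Hypothetical per-page entry combining column index (min/max) and
// offset index (first row of the page) information.
interface PageStats {
  firstRowIndex: number;
  min: number;
  max: number;
}

// Returns the smallest row window covering every page that could contain
// `target`, or null when no page can contain it.
function rowWindow(
  pages: PageStats[],
  rowGroupRows: number,
  target: number,
): { offset: number; limit: number } | null {
  let start = -1;
  let end = -1;
  pages.forEach((p, i) => {
    if (p.min <= target && target <= p.max) {
      // A page ends where the next one begins, or at the end of the row group.
      const next = pages[i + 1];
      const pageEnd = next !== undefined ? next.firstRowIndex : rowGroupRows;
      if (start < 0) {
        start = p.firstRowIndex;
      }
      end = pageEnd;
    }
  });
  return start < 0 ? null : { offset: start, limit: end - start };
}

const pages: PageStats[] = [
  { firstRowIndex: 0, min: 0, max: 49 },
  { firstRowIndex: 1000, min: 50, max: 99 },
  { firstRowIndex: 2000, min: 100, max: 149 },
];

const hit = rowWindow(pages, 3000, 75);
const miss = rowWindow(pages, 3000, 500);
console.log(hit, miss);
```

If matching pages are not adjacent, the window also covers the pages between them, so the result stays conservative rather than exact.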

Supporting `ColumnChunkMetaData.statistics` is fairly simple: add a new method to the existing `ColumnChunkMetaData` WASM type, mirroring the native one. Supporting the page index involves adding a new method that loads the indices for one or more columns and packages them so they can be used from JavaScript, which could be much more work.
