Exposing ColumnChunkMetadata.statistics and (opt-in) ColumnIndexMetaData to JavaScript #863

Description

@mobiusklein

Currently, there is no way to determine whether a row group can be pruned when trying to limit the work needed to find a particular record or set of records. Coarse-grained column chunk statistics can tell you whether a row group could contain a particular value on a sorted column. This is useful for avoiding a full scan of the file, and with small enough row groups it works well on its own. The statistics just aren't exposed on the JavaScript API of ColumnChunkMetaData.
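To make the use case concrete, here is a minimal sketch of the pruning that min/max statistics would allow from JavaScript. The `ChunkStats` and `RowGroupSummary` shapes and the `candidateRowGroups` helper are hypothetical and only illustrate the idea; they are not part of the current API, and the real statistics type may look different.

```typescript
// Hypothetical shape of per-column-chunk statistics; NOT the library's actual API.
interface ChunkStats {
  min: number;
  max: number;
}

// Hypothetical per-row-group summary for a single column of interest.
interface RowGroupSummary {
  index: number;
  numRows: number;
  stats: ChunkStats | null; // null when the writer recorded no statistics
}

// Return the indices of row groups whose [min, max] range could contain `target`.
// Row groups without statistics are kept, because they cannot be ruled out.
function candidateRowGroups(groups: RowGroupSummary[], target: number): number[] {
  return groups
    .filter((g) => g.stats === null || (g.stats.min <= target && target <= g.stats.max))
    .map((g) => g.index);
}

// Made-up example data: four row groups over a column sorted in ascending order.
const groups: RowGroupSummary[] = [
  { index: 0, numRows: 1000, stats: { min: 0, max: 99 } },
  { index: 1, numRows: 1000, stats: { min: 100, max: 199 } },
  { index: 2, numRows: 1000, stats: null },
  { index: 3, numRows: 1000, stats: { min: 300, max: 399 } },
];

console.log(candidateRowGroups(groups, 150)); // [ 1, 2 ]
```

Only the surviving row groups would then need to be read, instead of scanning the whole file.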

The page index is more granular, and lets you use the limit and offset options on ReaderOptions without doing more explicit scanning of the file from JavaScript. Reading it makes initially opening the file more expensive, so it isn't read automatically as currently implemented, and it isn't visible in the JavaScript API either.
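As a rough illustration of how page-level information could feed the limit and offset options, the sketch below turns per-page min/max values into a single row window. The `PageSummary` shape and the `rowWindow` helper are hypothetical, and the sketch assumes `firstRow` is already expressed as a position within the whole file, which may not match how the page index actually stores row positions.

```typescript
// Hypothetical per-page summary combining row position and min/max values;
// NOT the library's actual API.
interface PageSummary {
  firstRow: number; // assumed to be a file-level row position
  numRows: number;
  min: number;
  max: number;
}

// Compute the smallest contiguous row window covering every page whose
// [min, max] range overlaps the query range [lo, hi]. Returns null when
// no page can match, so nothing needs to be read at all.
function rowWindow(
  pages: PageSummary[],
  lo: number,
  hi: number,
): { offset: number; limit: number } | null {
  const hits = pages.filter((p) => p.max >= lo && p.min <= hi);
  if (hits.length === 0) return null;
  const start = Math.min(...hits.map((p) => p.firstRow));
  const end = Math.max(...hits.map((p) => p.firstRow + p.numRows));
  return { offset: start, limit: end - start };
}

// Made-up example data: four pages of 100 rows each over a sorted column.
const pages: PageSummary[] = [
  { firstRow: 0, numRows: 100, min: 0, max: 49 },
  { firstRow: 100, numRows: 100, min: 50, max: 99 },
  { firstRow: 200, numRows: 100, min: 100, max: 149 },
  { firstRow: 300, numRows: 100, min: 150, max: 199 },
];

console.log(rowWindow(pages, 60, 120)); // { offset: 100, limit: 200 }
```

On a sorted column the matching pages are contiguous, so the window is tight; on an unsorted column the same window would still be correct but could include pages that do not match.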

Supporting ColumnChunkMetaData.statistics is pretty simple: add a new method to the existing ColumnChunkMetaData WASM type that mirrors the native one. Supporting the page index involves adding a new method for loading the indices of one or more columns and packing them up so they can be used from JavaScript, which could be much more work.
