Expose function to allow reading individual columns #758

Description

@keller-mark

Thank you very much for this library, it has already been a huge help.

I am wondering if it would be possible to expose a function to load a single column (or a subset of columns).

I have been able to use HTTP Range requests to load only the footer bytes and pass them to readSchema (then pass the .intoIPCStream() result to tableFromIPC) to get the schema without loading the full table bytes.
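
For reference, here is a minimal sketch of how the footer byte range can be derived. It relies only on the Parquet file layout, where a file ends with the footer metadata, a 4-byte little-endian footer length, and the magic bytes PAR1. The names footerRange, toRangeHeader, and ByteRange are hypothetical helpers for illustration, not part of parquet-wasm, and whether readSchema needs exactly this span is an assumption.

```typescript
// Layout of the end of a Parquet file:
//   [footer metadata][4-byte little-endian footer length]["PAR1"]
const PARQUET_MAGIC = [0x50, 0x41, 0x52, 0x31]; // ASCII "PAR1"
const TRAILER_SIZE = 8; // footer length (4 bytes) + magic (4 bytes)

// Inclusive byte range, matching HTTP Range semantics.
interface ByteRange {
  start: number;
  end: number;
}

// Given the trailing bytes of the file (at least the last 8) and the total
// file size, compute the range covering the footer metadata plus the trailer.
function footerRange(tail: Uint8Array, fileSize: number): ByteRange {
  const n = tail.length;
  if (n < TRAILER_SIZE) {
    throw new Error("need at least the last 8 bytes of the file");
  }
  const magicOk = PARQUET_MAGIC.every((b, i) => tail[n - 4 + i] === b);
  if (!magicOk) {
    throw new Error("trailing magic bytes are not PAR1");
  }
  const view = new DataView(tail.buffer, tail.byteOffset, tail.byteLength);
  const footerLength = view.getUint32(n - TRAILER_SIZE, true); // little-endian
  const start = fileSize - TRAILER_SIZE - footerLength;
  if (start < 4) {
    // The file also begins with a 4-byte magic, so the footer cannot start earlier.
    throw new Error("footer length is larger than the file");
  }
  return { start, end: fileSize - 1 };
}

// Format a range as the value of an HTTP Range request header.
function toRangeHeader(range: ByteRange): string {
  return `bytes=${range.start}-${range.end}`;
}
```

The trailing 8 bytes can themselves be requested with a suffix range (Range: bytes=-8), and the total file size can be taken from the Content-Range response header or from a HEAD request, assuming the server exposes those.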

I can then load the full table bytes and, using the columns option of readParquet, read a table with a subset of columns.

However, this requires me to first load all table bytes, including for columns which I may never use.

As I understand it, the Parquet footer contains enough information to locate the bytes of each column within each row group.
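
To make this concrete, here is a rough sketch of how the per-column byte ranges could be computed once the footer has been decoded. The ColumnChunkMeta and RowGroupMeta types are hypothetical, simplified mirrors of the Parquet ColumnMetaData fields (dictionary_page_offset, data_page_offset, total_compressed_size); parquet-wasm does not expose them in this shape as far as I know. It assumes a column chunk starts at its dictionary page when one exists, otherwise at its first data page.

```typescript
// Hypothetical, simplified view of the per-column-chunk footer fields.
interface ColumnChunkMeta {
  pathInSchema: string;          // column name (flat schema assumed)
  dataPageOffset: number;        // file offset of the first data page
  dictionaryPageOffset?: number; // file offset of the dictionary page, if any
  totalCompressedSize: number;   // compressed size of all pages in the chunk
}

interface RowGroupMeta {
  columns: ColumnChunkMeta[];
}

// Inclusive byte range, matching HTTP Range semantics.
interface ChunkRange {
  start: number;
  end: number;
}

// Collect one byte range per row group for the requested column.
function columnChunkRanges(rowGroups: RowGroupMeta[], column: string): ChunkRange[] {
  const ranges: ChunkRange[] = [];
  for (const rowGroup of rowGroups) {
    for (const chunk of rowGroup.columns) {
      if (chunk.pathInSchema !== column) continue;
      const dict = chunk.dictionaryPageOffset;
      // Defensive check: some writers store 0 instead of omitting the field,
      // so only trust a dictionary offset that precedes the data pages.
      const hasDict = dict !== undefined && dict > 0 && dict < chunk.dataPageOffset;
      const start = hasDict ? dict : chunk.dataPageOffset;
      ranges.push({ start, end: start + chunk.totalCompressedSize - 1 });
    }
  }
  return ranges;
}

// Format a chunk range as the value of an HTTP Range request header.
function chunkRangeHeader(range: ChunkRange): string {
  return `bytes=${range.start}-${range.end}`;
}
```

Each returned range could be fetched with its own Range request. This only covers locating the bytes; how the fetched chunks would then be handed to the reader is the open question.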

If I were to somehow concatenate the resulting bytes, I could pass them to parquet-wasm to convert into an IPC stream, resulting in, for example, a one-column Arrow table. I am not sure how the concatenation of the Range request results would need to work, or how much complexity this would involve on the Rust side.
