
Conditional read on a fst file #30

Open
@MarcusKlik

By specifying a condition on one or more columns of the stored table, data can be read using far less memory than a full read followed by a row selection. Related to issue #15 and issue #16: data can be read through a stream object, and the selection can be applied to chunks of data rather than to the complete data set. Restrictions:

  • The condition cannot contain aggregate expressions that depend on the whole data set, e.g. median(ColA) or sum(ColA).
  • The size of the result is not known in advance, so the smaller per-chunk result sets have to be bound together (like data.table's rbindlist). This will have an effect on performance.
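
To make the idea concrete, here is a minimal sketch of a chunked conditional read. It is written in plain Python rather than against the fst API, and every name in it (`iter_chunks`, `conditional_read`, the dict-of-lists table layout) is a hypothetical stand-in for illustration only, not an existing or proposed interface. It assumes the condition can be evaluated one row at a time, which is exactly why whole-set aggregates are excluded.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Assumption: a table is modelled as a dict mapping column name -> list of values.
Table = Dict[str, List]


def iter_chunks(table: Table, chunk_size: int) -> Iterator[Table]:
    """Hypothetical stand-in for a stream object: yields the table in row chunks."""
    n_rows = len(next(iter(table.values()))) if table else 0
    for start in range(0, n_rows, chunk_size):
        yield {name: col[start:start + chunk_size] for name, col in table.items()}


def conditional_read(
    chunks: Iterable[Table],
    predicate: Callable[[dict], bool],
    columns: List[str],
) -> Table:
    """Filter each chunk on its own, then bind the partial results together."""
    parts: List[Table] = []
    for chunk in chunks:
        n_rows = len(next(iter(chunk.values())))
        # Row-wise condition: only values from the current row are visible,
        # so aggregates over the whole data set cannot be expressed here.
        keep = [
            i for i in range(n_rows)
            if predicate({name: col[i] for name, col in chunk.items()})
        ]
        parts.append({name: [chunk[name][i] for i in keep] for name in columns})
    # The result size is unknown until every chunk has been seen, so the
    # partial results are concatenated at the end (the rbindlist-like step).
    return {name: [v for part in parts for v in part[name]] for name in columns}
```

Only one chunk plus the rows selected so far need to be held in memory at any time. A call such as `conditional_read(iter_chunks(stored, 2), lambda row: row["ColA"] > 2, ["ColA", "ColB"])` would return the matching rows of both columns; the final concatenation is where the binding cost mentioned above shows up.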
