Fast factorization of character columns

Character columns are by far the slowest data type to process. For character columns that are not completely random, we can address this by first converting the vector into a factor. Factors can be serialized efficiently, provided the number of levels is significantly smaller than the number of rows. Random access will suffer, because all levels have to be loaded even when only a small subset of the data is read. This can be partly solved by reading through a streaming object that caches the levels after the first read, so that subsequent reads are faster.
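
To make the idea concrete, here is a minimal sketch in Python of the two pieces described above: encoding a character column as integer codes plus a table of levels, and a reader that caches the levels after the first read. It is an illustration only, not the package's implementation. The names `factorize`, `CachedFactorReader`, and `level_loads` are hypothetical, in-memory lists stand in for the on-disk data, and levels are kept in first-seen order (an assumption; R sorts factor levels by default).

```python
from typing import Dict, List, Optional, Sequence, Tuple


def factorize(values: Sequence[str]) -> Tuple[List[int], List[str]]:
    """Dictionary-encode strings: return (codes, levels) in first-seen order."""
    index: Dict[str, int] = {}   # level -> integer code
    levels: List[str] = []       # code -> level
    codes: List[int] = []
    for value in values:
        code = index.get(value)
        if code is None:
            # First time this string is seen: assign the next free code.
            code = len(levels)
            index[value] = code
            levels.append(value)
        codes.append(code)
    return codes, levels


class CachedFactorReader:
    """Hypothetical reader that loads the level table once and reuses it."""

    def __init__(self, codes: List[int], levels: List[str]) -> None:
        self._codes = codes            # stands in for the stored integer codes
        self._stored_levels = levels   # stands in for the stored level table
        self._cache: Optional[List[str]] = None
        self.level_loads = 0           # counts how often the levels were loaded

    def _levels(self) -> List[str]:
        if self._cache is None:
            # The expensive step: only performed on the first read.
            self._cache = list(self._stored_levels)
            self.level_loads += 1
        return self._cache

    def read(self, start: int, stop: int) -> List[str]:
        """Return rows start..stop-1 decoded back to strings."""
        levels = self._levels()
        return [levels[code] for code in self._codes[start:stop]]


codes, levels = factorize(["NL", "US", "NL", "DE", "US", "NL"])
reader = CachedFactorReader(codes, levels)
first = reader.read(0, 2)    # loads the levels
second = reader.read(3, 6)   # reuses the cached levels
```

In this sketch only the first `read` pays the cost of loading the levels; later reads of any row range decode against the cached table. The benefit depends on the number of levels being small relative to the number of rows, as noted above.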

Fast factorization of character columns #38

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Fast factorization of character columns #38

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions