Open
Description
Character columns are by far the slowest data type to process. For character columns whose values are not completely random, we can address this by first converting the vector into a factor. Factors serialize efficiently, provided the number of levels is significantly smaller than the number of rows. Random access will suffer, however, because all levels have to be loaded even when reading a small subset of the data. This can be partly mitigated by reading through a streaming object that caches the levels after the first read, so that subsequent reads are faster.
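
To make the idea concrete, here is a minimal, language-neutral sketch in Python of the factor encoding and the level cache described above. It is an illustration only, not the package's implementation: `encode_factor`, `CachedFactorReader`, and the `level_loads` counter are hypothetical names, and `pickle` merely stands in for whatever on-disk format would hold the levels.

```python
import pickle


def encode_factor(values):
    """Split a string column into (levels, codes).

    levels: unique values in first-seen order.
    codes:  one integer per row, indexing into levels.
    """
    index = {}
    codes = []
    for value in values:
        # setdefault assigns the next free code the first time a value is seen
        codes.append(index.setdefault(value, len(index)))
    return list(index), codes


class CachedFactorReader:
    """Hypothetical streaming reader that deserializes the levels only once."""

    def __init__(self, levels_blob, codes):
        self._levels_blob = levels_blob  # serialized levels (stand-in for disk)
        self._codes = codes              # integer codes, cheap to slice
        self._levels = None              # filled on the first read
        self.level_loads = 0             # counts how often levels were decoded

    def read(self, start, stop):
        """Return rows [start, stop) as strings."""
        if self._levels is None:
            # The first read pays the full cost of loading every level.
            self._levels = pickle.loads(self._levels_blob)
            self.level_loads += 1
        return [self._levels[code] for code in self._codes[start:stop]]


column = ["red", "green", "red", "blue", "green", "red"]
levels, codes = encode_factor(column)

reader = CachedFactorReader(pickle.dumps(levels), codes)
first = reader.read(0, 2)    # loads and caches the levels
second = reader.read(3, 6)   # reuses the cached levels
```

Only the first `read` call decodes the levels; later calls touch just the requested slice of integer codes, which is where the speed-up for repeated random access would come from.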