Record Readers
Record readers read data from files in DFS, converting the data into a series of value vectors. A reader is associated with a FormatPlugin and defined by a FormatPluginConfig. Each format plugin is associated with a StoragePlugin, which provides access to the file system that stores the data to be read. Each format plugin can also define a RecordWriter to support CTAS operations.
Each record reader instance is associated with a single file (or portion of a file) in the file system defined by the storage plugin.
The actual RecordReader API is quite simple:
void setup(OperatorContext context, OutputMutator output) throws ExecutionSetupException;
void allocate(Map<String, ValueVector> vectorMap) throws OutOfMemoryException;
int next();
The setup() method ...
The allocate() method ...
The next() method reads a fixed number of records into a previously allocated record batch (a set of value vectors). Each call to next() returns a new schema, uses the existing schema, or signals EOF (by returning 0). Note that a schema change may occur only at a record batch boundary.
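The batch-at-a-time contract of next() can be illustrated with a small, self-contained sketch. It does not use Drill's classes: SimpleReader, ListReader, and drain() are hypothetical stand-ins invented for this example, and a plain List<String> takes the place of the value vectors. The only behavior carried over from the description above is that each call fills a batch with up to a fixed number of records and that a return value of 0 signals EOF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchReaderSketch {

    /** Hypothetical stand-in for a record reader; this is NOT Drill's RecordReader. */
    interface SimpleReader {
        /** Fills the batch with the next records and returns the record count; 0 means EOF. */
        int next(List<String> batch);
    }

    /** Toy reader over an in-memory list, returning at most batchSize records per call. */
    static class ListReader implements SimpleReader {
        private final List<String> rows;
        private final int batchSize;
        private int position = 0;

        ListReader(List<String> rows, int batchSize) {
            this.rows = rows;
            this.batchSize = batchSize;
        }

        @Override
        public int next(List<String> batch) {
            // Reuse the caller's batch, much as a reader reuses allocated vectors.
            batch.clear();
            int end = Math.min(position + batchSize, rows.size());
            while (position < end) {
                batch.add(rows.get(position++));
            }
            return batch.size();
        }
    }

    /** Calls next() until it returns 0 and records the size of each batch. */
    static List<Integer> drain(SimpleReader reader) {
        List<Integer> batchSizes = new ArrayList<>();
        List<String> batch = new ArrayList<>();
        int count;
        while ((count = reader.next(batch)) > 0) {
            batchSizes.add(count);
        }
        return batchSizes;
    }

    public static void main(String[] args) {
        SimpleReader reader = new ListReader(Arrays.asList("a", "b", "c", "d", "e"), 2);
        System.out.println(drain(reader)); // prints [2, 2, 1]
    }
}
```

The caller's loop is the point of the sketch: it keeps requesting batches and stops on the first 0, so the final, partially filled batch is still delivered before EOF is reported.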