Description
Not a focus now, just raising issue here for tracking
Currently in progress.
Initial support
Tracked by initial-write-support branch
- Merged to main; further development is done directly on main
Checklist:
- High level ArrowWriter synchronous interface (accepts RecordBatches to write)
- Basic configuration via builder
- Stripe writer
- Metadata writer
- Value encoding
  - Integer RLEv2
    - Short repeat
    - Direct
    - Delta
    - Patched base
  - Base 128 varint
  - Byte RLE
- Integer RLEv2
- Encode nullability
- Float/Double array
- Short/Int/Long array
- String/Binary array
- Boolean array
- Byte array
- Basic struct array support (for root)
Once complete, I will raise a PR for all of the above to provide a complete and usable writer (though lacking in features; see below).
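For reference, the "Base 128 varint" item in the checklist above can be sketched with the standard library alone. This is a minimal illustration of the general technique (7 payload bits per byte, least significant group first, high bit set on every byte except the last, with zigzag mapping for signed values), not the code from the branch. All function names here are hypothetical.

```rust
/// Zigzag-map a signed value so that small magnitudes (positive or negative)
/// become small unsigned values: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
fn zigzag_encode(value: i64) -> u64 {
    ((value << 1) ^ (value >> 63)) as u64
}

/// Inverse of `zigzag_encode`.
fn zigzag_decode(value: u64) -> i64 {
    ((value >> 1) as i64) ^ -((value & 1) as i64)
}

/// Append `value` to `out` as a base 128 varint.
/// Each byte carries 7 bits; the high bit marks "more bytes follow".
fn write_varint_u64(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7f) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Signed variant: zigzag first, then the unsigned varint.
fn write_varint_i64(value: i64, out: &mut Vec<u8>) {
    write_varint_u64(zigzag_encode(value), out);
}

/// Decode one varint from the front of `bytes`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input ends early or runs past the 10 bytes a u64 can need.
fn read_varint_u64(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut result = 0u64;
    for (i, &byte) in bytes.iter().enumerate() {
        let shift = 7 * i as u32;
        if shift >= 64 {
            return None;
        }
        result |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    None
}
```

With these helpers, writing 128 yields the two bytes `0x80 0x01`, and `read_varint_u64` on that buffer returns the value together with a consumed length of 2.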
Subsequent features
The following items will be added in smaller PRs once the base code of the writer is merged to main.
- Asynchronous interface
- Compression
  - Zlib
  - Snappy
  - Lzo
  - Lz4
  - Zstd
- Statistics
  - Int
  - Double
  - String
  - Bucket
  - Decimal
  - Date
  - Binary
  - Timestamp
- Dictionary array
- Run length array
- Decimal array
- Date array
- Timestamp array
- Compound array
  - Union array
  - Map array
  - List array
  - Struct array
- Index streams
  - Row group index
  - Bloom filters
- Extension configuration (see Java config for examples)
- User metadata
- Arrow type hint (when writing with this Arrow -> ORC writer, encode the original Arrow type in metadata so when reading, we can recreate original Arrow array)
- TODO: other Arrow types
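
To illustrate the Arrow type hint idea from the list above: ORC has no unsigned integer types and a single string type, so for example an Arrow UInt32 or LargeUtf8 column cannot be told apart from Int64 or Utf8 on read unless the writer records the original type somewhere. One possible shape, sketched below, stores a per-column hint in the file's user metadata (name to bytes pairs). The key prefix, the `ArrowTypeHint` enum, and both functions are hypothetical and only cover two types; the real design may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical key prefix; not defined by the ORC specification.
const HINT_PREFIX: &str = "arrow.type_hint.";

/// Hypothetical subset of Arrow types that have no exact ORC equivalent.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ArrowTypeHint {
    UInt32,
    LargeUtf8,
}

impl ArrowTypeHint {
    fn as_str(self) -> &'static str {
        match self {
            ArrowTypeHint::UInt32 => "UInt32",
            ArrowTypeHint::LargeUtf8 => "LargeUtf8",
        }
    }

    fn parse(s: &str) -> Option<Self> {
        match s {
            "UInt32" => Some(ArrowTypeHint::UInt32),
            "LargeUtf8" => Some(ArrowTypeHint::LargeUtf8),
            _ => None,
        }
    }
}

/// Stand-in for the ORC user metadata section (name -> raw bytes).
type UserMetadata = BTreeMap<String, Vec<u8>>;

/// Writer side: remember the original Arrow type of `column`.
fn record_hint(meta: &mut UserMetadata, column: &str, hint: ArrowTypeHint) {
    let key = format!("{}{}", HINT_PREFIX, column);
    meta.insert(key, hint.as_str().as_bytes().to_vec());
}

/// Reader side: recover the hint, if one was written and is recognised.
fn lookup_hint(meta: &UserMetadata, column: &str) -> Option<ArrowTypeHint> {
    let key = format!("{}{}", HINT_PREFIX, column);
    let bytes = meta.get(&key)?;
    ArrowTypeHint::parse(std::str::from_utf8(bytes).ok()?)
}
```

A reader that finds no hint, or an unrecognised one, would fall back to the default ORC to Arrow type mapping, so files stay readable by tools that ignore the extra metadata.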