This is a pure Julia implementation of the Apache Arrow data standard. This package provides Julia AbstractVector objects for
referencing data that conforms to the Arrow standard. This allows users to seamlessly interface Arrow-formatted data with a great deal of existing Julia code.
Please see this document for a description of the Arrow memory layout.
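As a minimal sketch of what "AbstractVector objects referencing Arrow data" means in practice, the snippet below writes a small table to an in-memory buffer and reads it back. The column names `id` and `name` are made up for illustration; `Arrow.write` and `Arrow.Table` are the package's standard entry points.

```julia
using Arrow

# A NamedTuple of vectors is a valid Tables.jl column table.
data = (id = [1, 2, 3], name = ["a", "b", "c"])

io = IOBuffer()
Arrow.write(io, data)   # serialize to the Arrow IPC stream format
seekstart(io)

tbl = Arrow.Table(io)   # columns are views over the Arrow-formatted bytes
ids = tbl.id

# Columns behave like ordinary Julia vectors, so generic code works on them.
println(ids isa AbstractVector)
println(sum(ids))
```

Because the columns are plain `AbstractVector`s, functions such as `sum`, `map`, or indexing should work without any Arrow-specific handling.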
The package can be installed by running the following in a Julia REPL:
julia> using Pkg; Pkg.add("Arrow")
Arrow.jl currently requires Julia 1.12+.
When developing on Arrow.jl, it is recommended that you run the following to ensure that any changes to ArrowTypes.jl are immediately available to Arrow.jl without requiring a release:
julia --project -e 'using Pkg; Pkg.develop(path="src/ArrowTypes")'
Current write-path notes:
- Arrow.tobuffer includes a direct single-partition fast path for eligible inputs
- Arrow.tobuffer(Tables.partitioner(...)) also includes a targeted direct multi-record-batch path for single-column top-level strings and single-column non-missing binary/code-units columns
- Arrow.write(io, Tables.partitioner(...)) now reuses that same targeted direct multi-record-batch path instead of always going through the legacy Writer orchestration
- Multi-column partitions, dictionary-encoded top-level columns, map-heavy inputs, and missing-binary partitions retain the existing writer path
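The partitioned write described in these notes can be sketched as follows. This is an illustration only: the single-column string table and the column name `name` are assumptions chosen to resemble an input the notes call eligible, and nothing in the snippet can tell which internal path was taken. It only shows that each partition arrives as its own record batch when read back with `Arrow.Stream`.

```julia
using Arrow, Tables

# Two partitions of a single-column string table (illustrative data).
parts = Tables.partitioner([(name = ["a", "b"],), (name = ["c"],)])

# Serialize all partitions into one in-memory IPC stream.
buf = Arrow.tobuffer(parts)

# Arrow.Stream yields one Arrow.Table per record batch.
batch_lengths = [length(batch.name) for batch in Arrow.Stream(buf)]
println(batch_lengths)
```

The same partitioned source can be passed to `Arrow.write(io, parts)` when writing to an existing IO handle instead of a fresh buffer.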
This implementation supports the 1.0 version of the specification, including support for:
- All primitive data types
- All nested data types
- Dictionary encodings and messages
- Dictionary-encoded CategoricalArray interop, including missing-value roundtrips through Arrow.Table, copy, and DataFrame(...; copycols=true)
- Extension types
- Lightweight schema/field metadata overlays via Arrow.withmetadata(...) for Tables.jl-compatible sources before serialization
- Base Julia Enum logical types via the JuliaLang.Enum extension label, with native Julia roundtrips back to the original enum type while convert=false and non-Julia consumers still see the primitive storage type
- View-backed Utf8/Binary columns, including recovery from under-reported variadic buffer counts by inferring the required external buffers from valid view elements
- Streaming, file, record batch, and replacement and isdelta dictionary messages
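To make the dictionary-encoding item concrete, here is a small sketch that requests dictionary encoding on write through the `dictencode` keyword and inspects the column type on read. The column name `color` and its values are illustrative; the assumption is that keyword arguments given to `Arrow.tobuffer` are forwarded to `Arrow.write`.

```julia
using Arrow

tbl = (color = ["red", "blue", "red", "red"],)

# dictencode=true asks the writer to dictionary-encode the columns.
buf = Arrow.tobuffer(tbl; dictencode = true)

col = Arrow.Table(buf).color

# The column reads back as a dictionary-encoded vector with the same values.
println(col isa Arrow.DictEncoded)
println(collect(col))
```

For per-column control, wrapping an individual column in `Arrow.DictEncode(...)` before writing is an alternative to the table-wide keyword.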
It currently doesn't include support for:
- Tensor or sparse tensor IPC payload semantics; Arrow.jl now recognizes those message headers explicitly and rejects them with precise errors instead of falling through to a generic unsupported-message path
- C data interface
- Writing Run-End Encoded arrays; Arrow.jl now reads REE arrays and exposes them as read-only vectors, but still rejects REE on write paths
Flight RPC status:
- Experimental Arrow.Flight support is available in-tree
- Requires Julia 1.12+
- Includes generated protocol bindings and complete client constructors for the FlightService RPC surface
- Keeps the top-level Flight module shell thin, with exports and generated-protocol setup split out of src/flight/Flight.jl
- Includes high-level FlightData <-> Arrow IPC helpers for Arrow.Table, Arrow.Stream, and DoPut/DoExchange payload generation, Arrow.Flight.pathdescriptor(...) for PATH descriptors without manual proto assembly, opt-in app_metadata surfacing through include_app_metadata=true on Arrow.Flight.stream(...)/Arrow.Flight.table(...), explicit batch-wise app_metadata=... emission on Arrow.Flight.flightdata(...), Arrow.Flight.putflightdata!(...), and source-based Arrow.Flight.doexchange(...), and a reusable Arrow.Flight.withappmetadata(...) wrapper so source-level batch metadata can stay attached without manual keyword threading
- Keeps the Flight IPC conversion layer modular under src/flight/convert/, with src/flight/convert.jl retained as a thin entrypoint
- Includes client helpers for request headers, binary metadata, handshake token reuse, and TLS configuration via withheaders, withtoken, and authenticate
- Keeps the Flight client implementation modular under src/flight/client/, with thin entrypoints at src/flight/client.jl and src/flight/client/rpc_methods.jl
- Includes a transport-agnostic server core (Service, ServerCallContext, ServiceDescriptor, MethodDescriptor) for local Flight method dispatch, path lookup, handler testing, high-level DoExchange assembly through Arrow.Flight.exchangeservice(...), Arrow.Flight.tableservice(...), and Arrow.Flight.streamservice(...), and source-based local invocation through Arrow.Flight.doexchange(service, context, source; ...), Arrow.Flight.table(service, context, source; ...), and Arrow.Flight.stream(service, context, source; ...)
- Keeps the transport-agnostic server core modular under src/flight/server/, with src/flight/server.jl retained as a thin entrypoint
- Includes an optional gRPCServer.jl package extension that maps Arrow.Flight.Service into gRPCServer.ServiceDescriptor and registers Flight proto types with the external server package when it is present
- Keeps the optional gRPCServer.jl bridge modular under ext/arrowgrpcserverext/, with ext/ArrowgRPCServerExt.jl retained as a thin entrypoint
- Includes optional live interoperability coverage for Handshake, authenticated token propagation, PollFlightInfo, and TLS via dedicated Python reference servers
- Includes optional live pyarrow.flight interoperability coverage for ListFlights, GetFlightInfo, GetSchema, DoGet, DoPut, DoExchange, ListActions, and DoAction
- Keeps targeted Flight verification modular under test/flight/, with test/flight.jl retained as a thin entrypoint for local and CI invocation stability, the client-constructor/protocol-wrapper checks decomposed under test/flight/client_surface/, the optional gRPCServer extension scenarios decomposed under test/flight/grpcserver_extension/, the pyarrow.flight interop scenarios decomposed under test/flight/pyarrow_interop/, and the transport-agnostic server-core checks decomposed under test/flight/server_core/
- Includes test/flight_grpcserver.jl as a temporary-environment runner for optional native gRPCServer coverage without mutating test/Project.toml
- Dedicated CI jobs now exercise the Flight interop suite on stable and nightly Linux; native Julia server transport remains optional/experimental and is not part of the default Flight suite
Third-party data formats:
- CSV, Parquet, and Avro support via the existing CSV.jl, Parquet.jl, and Avro.jl packages
- Other Tables.jl-compatible packages automatically supported (DataFrames.jl, JSONTables.jl, JuliaDB.jl, SQLite.jl, MySQL.jl, JDBC.jl, ODBC.jl, XLSX.jl, etc.)
- No current Julia packages support ORC
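Interop with other formats goes through the Tables.jl interface, so converting is usually a matter of handing one package's table to `Arrow.write` or `Arrow.tobuffer`. The sketch below assumes CSV.jl is installed and uses an inline CSV string (made-up data) in place of a file on disk.

```julia
using Arrow, CSV

# Inline CSV text stands in for a file; the data is illustrative.
csv_text = "id,name\n1,alice\n2,bob\n"

# CSV.File is a Tables.jl source, so it can be passed straight to Arrow.
rows = CSV.File(IOBuffer(csv_text))
buf = Arrow.tobuffer(rows)

tbl = Arrow.Table(buf)
println(collect(tbl.id))
println(collect(tbl.name))
```

With files on disk, the same pattern would be `Arrow.write("data.arrow", CSV.File("data.csv"))`, where both paths are placeholders.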
Canonical extension highlights:
- UUID now writes the canonical arrow.uuid extension name by default while retaining reader compatibility with legacy JuliaLang.UUID metadata
- Arrow.TimestampWithOffset{U} provides a canonical arrow.timestamp_with_offset logical type without conflating offset-only semantics with ZonedDateTime
- Arrow.Bool8 provides an explicit opt-in writer/reader surface for the canonical arrow.bool8 extension without changing the default packed-bit Bool path
- Arrow.JSONText{String} provides a text-backed logical type for the canonical arrow.json extension without parsing payloads during read or write
- arrow.opaque now reads as the underlying storage type without warning, and explicit writer metadata can be generated with Arrow.opaquemetadata(type_name, vendor_name)
- Arrow.variantmetadata(), Arrow.fixedshapetensormetadata(...), and Arrow.variableshapetensormetadata(...) generate canonical metadata strings for advanced canonical extensions
- arrow.fixed_shape_tensor and arrow.variable_shape_tensor are recognized on read as canonical passthrough extensions over their storage types, and Arrow.jl now validates their canonical metadata plus top-level storage shape before accepting them
- arrow.parquet.variant is recognized on read as a canonical passthrough extension over its storage type; Arrow.jl currently validates that its canonical metadata is the required empty string, but does not yet implement deeper variant semantics or an automatic writer surface
- Legacy JuliaLang.ZonedDateTime-UTC and JuliaLang.ZonedDateTime files remain readable for backward compatibility
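The UUID item above can be exercised with a simple roundtrip. This sketch only checks that values survive a write and read as `UUID`; it does not inspect which extension name was written, and the column name `id` is illustrative.

```julia
using Arrow, UUIDs

ids = [uuid4(), uuid4()]

# Write a single UUID column to an in-memory buffer and read it back.
buf = Arrow.tobuffer((id = ids,))
back = Arrow.Table(buf).id

# Values come back as native Julia UUIDs.
println(first(back) isa UUID)
println(collect(back) == ids)
```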
See the full documentation for details on reading and writing Arrow data.