Add Arrow Native (ADBC) Server Protocol for High-Performance Data Access #10296

@borodark

Is your feature request related to a problem? Please describe.

Analytics backends and data science tools increasingly demand high-performance, binary data transfer protocols. The current REST HTTP API, while flexible and widely compatible, introduces significant overhead for data-intensive workloads:

  • JSON serialization/deserialization adds latency
  • Text-based protocols are inefficient for large result sets
  • No standard binary protocol means each client must implement custom optimizations
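To make the overhead argument concrete, here is a minimal, standard-library-only sketch comparing a row-oriented JSON payload with a packed columnar binary payload for the same data. It is an illustration of the general idea, not Arrow IPC itself; the column names (`id`, `amount`) and helper functions are hypothetical.

```python
import json
from array import array


def encode_json_rows(ids, amounts):
    """Row-oriented text encoding, similar in spirit to a JSON REST payload."""
    rows = [{"id": i, "amount": a} for i, a in zip(ids, amounts)]
    return json.dumps(rows).encode("utf-8")


def encode_columnar(ids, amounts):
    """Column-oriented binary encoding: one contiguous typed buffer per column."""
    # "q" = signed 64-bit integer, "d" = 64-bit float (8 bytes per value each).
    return array("q", ids).tobytes() + array("d", amounts).tobytes()


def decode_columnar(payload, n_rows):
    """Rebuild both columns from the raw buffers without parsing any text."""
    ids = array("q")
    ids.frombytes(payload[: 8 * n_rows])
    amounts = array("d")
    amounts.frombytes(payload[8 * n_rows:])
    return ids.tolist(), amounts.tolist()


# Hypothetical 20K-row result set with two columns.
n = 20_000
ids = list(range(n))
amounts = [i * 0.25 for i in ids]

json_payload = encode_json_rows(ids, amounts)
binary_payload = encode_columnar(ids, amounts)

print(f"JSON bytes:     {len(json_payload)}")
print(f"columnar bytes: {len(binary_payload)}")
```

The binary payload is a fixed 16 bytes per row, and decoding it is a buffer copy rather than a text parse. Arrow adds schema metadata, null bitmaps, and variable-length types on top of this basic layout.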

Modern analytics ecosystems (Python/pandas, R, Julia, Elixir/Livebook) are converging on Arrow as the standard in-memory columnar format, and ADBC (Arrow Database Connectivity) as the standard database access API. Users expect databases and semantic layers to support these standards natively.

Describe the solution you'd like

Add an Arrow Native server to CubeSQL that:

  1. Speaks the Arrow IPC protocol on a dedicated port (default: 8120)
  2. Returns Arrow RecordBatches directly - no JSON serialization overhead
  3. Works with ADBC clients (for example, from Python)
  4. Optionally caches query results for repeated queries

This enables 8-15x faster data transfer compared to the REST API for typical analytics workloads.
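As a rough illustration of the optional result caching in item 4, the sketch below keeps serialized results in memory, keyed by the SQL text, with a time-to-live. This is an assumption about how such a cache could behave, not the proposed CubeSQL implementation; `QueryResultCache`, `get_or_compute`, and `fake_backend` are hypothetical names.

```python
import time


class QueryResultCache:
    """In-memory cache of query results, keyed by SQL text, with a TTL."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # sql -> (stored_at, result)

    def get_or_compute(self, sql, run_query):
        """Return a fresh cached result, or run the query and store it."""
        key = sql.strip()
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        result = run_query(sql)
        self._entries[key] = (now, result)
        return result


# Hypothetical backend that records how often it is actually invoked.
calls = []


def fake_backend(sql):
    calls.append(sql)
    return b"serialized-record-batches"


cache = QueryResultCache(ttl_seconds=30.0)
first = cache.get_or_compute("SELECT 1", fake_backend)
second = cache.get_or_compute("SELECT 1", fake_backend)
print(f"backend calls: {len(calls)}")
```

A real server would also need eviction and invalidation when the underlying data changes; both are left out here for brevity.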

Describe alternatives you've considered

  1. Arrow Flight SQL - More complex protocol, requires gRPC. ADBC is simpler and sufficient for CubeSQL's use case.
  2. Optimizing REST API - JSON will always have serialization overhead. Binary protocols are fundamentally faster for columnar data.
  3. Custom binary protocol - Would require custom clients. ADBC is an emerging standard with growing ecosystem support.

Additional context

The ADBC ecosystem is maturing rapidly.

Having options is good, especially when one option is significantly faster. BI tools connecting via the PostgreSQL protocol still work. REST API calls still work. But users who need maximum performance now have a path: ADBC on port 8120.

Performance comparison (cached, 20K rows):

  • REST HTTP API: 2133ms
  • Arrow Native: 8ms (about 267x faster)
