title: Arrow Data Accelerator Deployment Guide
sidebar_label: Deployment Guide
description: Operating guide for the Arrow (in-memory) data accelerator in production: memory sizing, indexes, and observability.
sidebar_position: 10
pagination_prev:
pagination_next:
tags: [data-accelerators, arrow, observability]

Production operating guide for the Arrow in-memory data accelerator covering memory sizing, optional hash indexes, and observability.

Authentication & Secrets

The Arrow accelerator is an in-process, in-memory engine. It uses no external storage, so no authentication or secret management is required.

Resilience & Durability

The Arrow accelerator is not durable. Data is held in RAM and is lost on process restart; every restart re-materializes the dataset from the source connector.

  • Crash recovery: None — on restart, the dataset is refreshed from scratch.
  • File modes: File-mode acceleration is rejected at startup; Arrow is memory-only. Use DuckDB, SQLite, PostgreSQL, or Cayenne when durability or spill to disk is required.
  • Concurrency: Arrow reads are lock-free. Refresh cadence is controlled by the runtime refresh semaphore, not by the accelerator itself.

Capacity & Sizing

  • Memory: Plan for 1.0–1.5× the raw row-oriented size of the source data, plus overhead for string dictionaries. Use the source connector's schema and row count to estimate.
  • Hash index: Optional, disabled by default. When enabled via hash_index: enabled, a hash map is built over the primary-key columns. Build time scales linearly with rows; memory overhead is approximately 24–48 bytes per row plus the key size.
  • Startup cost: Full-dataset materialization happens on startup. For tables larger than ~1 GB, consider a durable accelerator to avoid repeated full refresh on every restart.
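
The sizing guidance above can be turned into a rough back-of-the-envelope calculation. The sketch below is illustrative only: the function name, its parameters, and the reading of "~1 GB" as 10^9 bytes are assumptions, not part of the runtime. String-dictionary overhead is not modeled, so treat the result as a lower bound on the planning range.

```python
from dataclasses import dataclass

# Ranges taken from the sizing guidance in this section.
DATA_FACTOR_RANGE = (1.0, 1.5)       # multiple of the raw row-oriented size
INDEX_OVERHEAD_RANGE = (24, 48)      # bytes per row, excluding the key itself
DURABLE_HINT_BYTES = 1_000_000_000   # "~1 GB", read as decimal GB (assumption)


@dataclass
class MemoryEstimate:
    low_bytes: int
    high_bytes: int
    consider_durable: bool


def estimate_arrow_memory(rows, avg_row_bytes, key_bytes=0, hash_index=False):
    """Return a rough low/high RAM estimate for an Arrow-accelerated dataset.

    Hypothetical helper: string-dictionary overhead is NOT included.
    """
    raw = rows * avg_row_bytes
    low = raw * DATA_FACTOR_RANGE[0]
    high = raw * DATA_FACTOR_RANGE[1]
    if hash_index:
        # Per-row index overhead plus the size of the primary-key value.
        low += rows * (INDEX_OVERHEAD_RANGE[0] + key_bytes)
        high += rows * (INDEX_OVERHEAD_RANGE[1] + key_bytes)
    return MemoryEstimate(int(low), int(high), raw > DURABLE_HINT_BYTES)


# Example: 10 million rows of ~120 bytes each, 8-byte primary key, index on.
est = estimate_arrow_memory(
    rows=10_000_000, avg_row_bytes=120, key_bytes=8, hash_index=True
)
print(
    f"{est.low_bytes / 1e9:.2f}-{est.high_bytes / 1e9:.2f} GB, "
    f"consider durable: {est.consider_durable}"
)
# prints "1.52-2.36 GB, consider durable: True"
```

Take the row count and average row width from the source connector's schema, and leave headroom above the high estimate for query execution and for the refresh itself.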

Metrics

Generic acceleration metrics are available with the dataset_acceleration_ prefix. Hash-index operations emit dedicated metrics when the index is enabled:

| Metric | Type | Description |
| --- | --- | --- |
| hash_index_builds | Counter | Total hash-index builds (one per refresh). |
| hash_index_build_duration_ms | Histogram | Time to build the hash index. |
| hash_index_entries | Gauge | Number of entries in the index. |
| hash_index_memory_bytes | Gauge | Approximate memory footprint of the index. |
| hash_index_lookups | Counter | Total hash-index lookups performed by queries. |
| hash_index_lookup_rows | Counter | Total rows returned via hash-index lookups. |

See Component Metrics for enabling and exporting metrics. Refresh metrics are described in Acceleration.
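
The hash-index metrics can be combined into a couple of derived figures, for example to compare the observed per-entry footprint against the 24–48 bytes per row (plus key size) quoted under Capacity & Sizing. The sketch below assumes the metric values have already been sampled by whatever exporter is in use; the function and its parameter names are hypothetical.

```python
def index_health(entries, memory_bytes, lookups, lookup_rows, key_bytes):
    """Derive per-entry overhead and rows-per-lookup from sampled metric values.

    entries      -> hash_index_entries
    memory_bytes -> hash_index_memory_bytes
    lookups      -> hash_index_lookups
    lookup_rows  -> hash_index_lookup_rows
    key_bytes    -> size of the primary-key value (supplied by the operator)
    """
    bytes_per_entry = memory_bytes / entries if entries else 0.0
    overhead = bytes_per_entry - key_bytes if entries else 0.0
    rows_per_lookup = lookup_rows / lookups if lookups else 0.0
    return {
        "bytes_per_entry": bytes_per_entry,
        "overhead_per_entry": overhead,
        # Expected range from the sizing guidance: 24-48 bytes per row.
        "within_expected_overhead": 24 <= overhead <= 48,
        "rows_per_lookup": rows_per_lookup,
    }


# Example with made-up sample values.
report = index_health(
    entries=1_000_000,
    memory_bytes=40_000_000,
    lookups=5_000,
    lookup_rows=4_500,
    key_bytes=8,
)
print(report)
```

With a primary-key index, a rows-per-lookup figure well below 1 suggests that many lookups miss; whether that matters depends on the workload.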

Task History

Arrow acceleration operations (refresh, query) participate in task history through the shared acceleration spans (accelerated_table_refresh, sql_query). No Arrow-specific spans are emitted — the accelerator is a thin wrapper over Arrow memory.

Known Limitations

  • No persistence: Every restart refreshes from the source.
  • No traditional indexes: Arrow does not support B-tree indexes. The hash index accelerates point lookups but does not help with range scans or sort-order optimization.
  • Only primary-key hash index: The hash index requires a primary_key constraint; unique constraints alone do not enable the index.
  • Memory pressure: If the dataset exceeds available RAM, the runtime will OOM; no spill-to-disk mechanism exists in the Arrow accelerator itself.
  • partition_by: Not applicable; the Arrow accelerator holds a single in-memory representation.

Troubleshooting

| Symptom | Likely cause | Resolution |
| --- | --- | --- |
| OOM on refresh | Source dataset larger than RAM. | Switch to a durable accelerator (DuckDB / SQLite / Cayenne) that supports spill to disk. |
| Long startup time | Full-dataset refresh runs on boot. | Switch to a durable accelerator so refresh is incremental, not full, on restart. |
| hash_index ignored | No primary-key constraint on the dataset. | Add primary_key: to the dataset definition; with hash_index: enabled set, the index is then built on the next refresh. |
| Query slow for point lookups | Hash index disabled or wrong key column. | Enable hash_index: enabled; ensure the query filter matches the primary-key columns. |
| Accelerator refuses to start with file mode | Arrow rejects file-mode acceleration. | Switch engine: to duckdb, sqlite, postgres, or cayenne. |
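
Two of the rows above (file mode rejected, hash_index ignored) are configuration mistakes that can be caught before deployment. The following is a minimal preflight sketch, not part of the runtime. It assumes the acceleration settings have been loaded into a flat Python dict; the key `mode` and the flat layout are assumptions made for illustration, while `engine`, `hash_index`, and `primary_key` are the names used in this guide.

```python
DURABLE_ENGINES = ("duckdb", "sqlite", "postgres", "cayenne")


def check_arrow_acceleration(cfg):
    """Return a list of problems for a hypothetical flat acceleration config."""
    problems = []
    if cfg.get("engine") != "arrow":
        return problems  # the checks below only apply to the Arrow engine

    # Arrow is memory-only; file-mode acceleration is rejected at startup.
    if cfg.get("mode") == "file":
        problems.append(
            "file mode is rejected by Arrow; switch engine to one of: "
            + ", ".join(DURABLE_ENGINES)
        )

    # The hash index requires a primary key; a unique constraint is not enough.
    if cfg.get("hash_index") == "enabled" and not cfg.get("primary_key"):
        problems.append("hash_index is ignored without primary_key")

    return problems


# Example: both mistakes in one definition.
bad = {"engine": "arrow", "mode": "file", "hash_index": "enabled"}
for problem in check_arrow_acceleration(bad):
    print("warning:", problem)
```

A check like this could run in CI against dataset definitions, provided the dict is populated from the real configuration layout rather than the simplified one assumed here.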