Add a new reference section for `acceleration.storage_profile` covering
the `auto` / `local_ssd` / `ebs` / `tmpfs` values and their aliases. This
field tunes connection pool sizing, checkpoint thresholds, and per-file
size defaults for file-mode accelerators (DuckDB, SQLite, Turso, and
Spice Cayenne) based on the underlying storage medium.
Source: spiceai/spiceai#10913
website/docs/reference/spicepod/datasets.md: 25 additions & 0 deletions
@@ -365,6 +365,31 @@ Optional. The mode of acceleration. The following values are supported:
- `file_create` - Always create a new acceleration file on startup, removing any existing file. When [snapshots](../../features/data-acceleration/snapshots) are enabled, the existing file is snapshotted before deletion. Supported for Spice Cayenne (`cayenne`), `duckdb`, `sqlite`, and `turso` acceleration engines.
- `file_update` - Open an existing acceleration file if it exists, then check schema compatibility on refresh. If the source schema change is additive (new columns only), the existing file is kept. If the schema change is incompatible (columns removed, renamed, or type changed), the file is snapshotted (if [snapshots](../../features/data-acceleration/snapshots) are enabled) and recreated from scratch. Supported for Spice Cayenne (`cayenne`), `duckdb`, `sqlite`, and `turso` acceleration engines.

## `acceleration.storage_profile`

Optional. The storage profile for file-backed acceleration. The runtime uses this hint to tune connection-pool sizing, checkpoint thresholds, and file-size defaults for the underlying medium. Only applies to file-mode accelerators (`duckdb`, `sqlite`, `turso`, and Spice Cayenne); memory-mode accelerators ignore this setting.

Supported values:

- `auto` (default) – Detect the storage profile from the resolved acceleration file path. On Linux, detection reads `/proc/self/mountinfo` and inspects block-device metadata to recognize Amazon EBS, Azure Managed Disks, Amazon EC2 NVMe instance storage, `tmpfs`/`ramfs`, and generic NVMe/SSD. On other platforms, detection returns unknown and the engine defaults apply.
- `local_ssd` (aliases: `ssd`, `nvme`) – Treat the acceleration file location as local SSD/NVMe (for example, EC2 instance store or Azure temporary/NVMe local storage). Uses the engine defaults for connection pool size and checkpoint thresholds.
- `ebs` (aliases: `azure_disk`, `managed_disk`, `network_disk`) – Treat the acceleration file location as network-attached block storage (for example, Amazon EBS or Azure Managed Disks). Reduces connection-pool size and raises DuckDB's checkpoint threshold so per-IO latency is amortized across larger flushes. Spice Cayenne uses smaller per-file targets to reduce write amplification.
- `tmpfs` (aliases: `ram`, `ramdisk`, `ramfs`, `memory`) – Treat the acceleration file location as RAM-backed storage. Raises DuckDB's checkpoint threshold so steady-state workloads don't pay checkpoint cost on small amounts of dirty data; Spice Cayenne uses larger per-file targets to improve scan throughput.
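
The alias lists above amount to a lookup from each accepted spelling to one of four canonical profiles. The following Python sketch illustrates that mapping only; the function name, the exact-match behaviour, and the error handling are assumptions made for illustration and are not taken from the runtime's parser.

```python
# Hypothetical alias table built from the values documented above.
# The runtime's real parsing code may differ (for example in case handling).
_STORAGE_PROFILE_ALIASES = {
    "auto": "auto",
    "local_ssd": "local_ssd",
    "ssd": "local_ssd",
    "nvme": "local_ssd",
    "ebs": "ebs",
    "azure_disk": "ebs",
    "managed_disk": "ebs",
    "network_disk": "ebs",
    "tmpfs": "tmpfs",
    "ram": "tmpfs",
    "ramdisk": "tmpfs",
    "ramfs": "tmpfs",
    "memory": "tmpfs",
}


def normalize_storage_profile(value=None):
    """Return the canonical profile name for a documented value or alias."""
    if value is None:
        return "auto"  # the field is optional and `auto` is the documented default
    try:
        return _STORAGE_PROFILE_ALIASES[value]
    except KeyError:
        # Rejecting unknown values is an assumption of this sketch.
        raise ValueError(f"unknown storage_profile: {value!r}") from None
```

Under these assumptions, `normalize_storage_profile("managed_disk")` resolves to `ebs`, so a Spicepod written against Azure Managed Disks and one written against Amazon EBS would be tuned the same way.
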

Example:

```yaml
datasets:
  - from: s3://bucket/data/
    name: analytics
    acceleration:
      engine: duckdb
      mode: file
      storage_profile: ebs
      params:
        duckdb_file: /mnt/ebs/analytics.db
```
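
The example pins the profile explicitly. With `auto`, the description above says Linux detection starts from `/proc/self/mountinfo`. The Python sketch below shows one way that first step could look: parse the mount table, then pick the mount that contains the acceleration file. It is illustrative only. The helper names and the sample mount table are made up, the block-device inspection needed to tell EBS, Azure Managed Disks, and local NVMe apart is not covered, and the runtime's actual detection logic may differ.

```python
import posixpath

# Made-up mount table in /proc/self/mountinfo format, used only for illustration.
SAMPLE_MOUNTINFO = """\
22 1 259:1 / / rw,relatime shared:1 - ext4 /dev/nvme0n1p1 rw
35 22 0:31 / /dev/shm rw,nosuid,nodev shared:4 - tmpfs tmpfs rw
41 22 259:3 / /mnt/ebs rw,noatime shared:9 - xfs /dev/nvme1n1 rw
"""


def parse_mountinfo(text):
    """Return (mount_point, fs_type) pairs from mountinfo-formatted text."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        try:
            # Optional fields start at index 6 and end at a lone "-" separator.
            sep = fields.index("-", 6)
        except ValueError:
            continue  # skip lines that do not look like mountinfo entries
        if sep + 1 < len(fields):
            mounts.append((fields[4], fields[sep + 1]))
    return mounts


def fs_type_for_path(path, mounts):
    """Return the file-system type of the longest mount point containing path."""
    path = posixpath.normpath(path) + "/"
    best_point, best_type = "", None
    for mount_point, fs_type in mounts:
        prefix = mount_point.rstrip("/") + "/"
        # ">=" lets a later mount on the same mount point shadow an earlier one.
        if path.startswith(prefix) and len(mount_point) >= len(best_point):
            best_point, best_type = mount_point, fs_type
    return best_type


def profile_hint(fs_type):
    """Map RAM-backed file systems to `tmpfs`; anything else needs device metadata."""
    return "tmpfs" if fs_type in ("tmpfs", "ramfs") else None
```

For the sample table, the file from the YAML example (`/mnt/ebs/analytics.db`) resolves to the `xfs` mount at `/mnt/ebs`, so `profile_hint` returns `None`. Distinguishing network-attached from local storage at that point would require the block-device metadata the description mentions.
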
## `acceleration.snapshots`
Optional. Controls how this dataset participates in managed acceleration snapshots. Requires the Spicepod to configure the top-level [`snapshots` block](.#snapshots), the acceleration engine to be `duckdb`, `sqlite`, `cayenne`, or `turso`, and `mode: file` with a dataset-specific file path (for example `acceleration.params.duckdb_file: /nvme/my_dataset.db`).