Commit fc9ec71

Add localpod docs (#565)
* Add localpod docs
* docs: Add localpod to connectors table
* docs: Re-order localpod position in table
* wip

---------

Co-authored-by: peasee <98815791+peasee@users.noreply.github.com>
1 parent 690aa8b commit fc9ec71

2 files changed

Lines changed: 40 additions & 0 deletions

spiceaidocs/docs/components/data-connectors/index.md

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ Currently supported Data Connectors include:
| `ftp`, `sftp` | FTP/SFTP | Alpha | Parquet, CSV | `append`, `full` |||
| `graphql` | GraphQL | Alpha | GraphQL | `append`, `full` |||
| `http`, `https` | HTTP(s) | Alpha | Parquet, CSV | `append`, `full` |||
| `localpod` | Local dataset replication | Alpha | | `append`, `full` |||
| `mssql` | MS SQL Server | Alpha | Tabular Data Stream (TDS) | `append`, `full` |||
| `sharepoint` | SharePoint | Alpha | | `append`, `full` |||
| `snowflake` | Snowflake | Alpha | Arrow | `append`, `full` | Roadmap ||
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
---
title: 'Localpod Data Connector'
sidebar_label: 'Localpod Data Connector'
description: 'Localpod Data Connector Documentation'
pagination_prev: null
---

The Localpod Data Connector sets up a parent/child relationship between datasets in the current Spicepod. This makes it possible to configure multiple or tiered accelerations for a single dataset while ensuring that the data is downloaded from the remote source only once. For example, you can use the `localpod` connector to create a child dataset that is accelerated in-memory, while the parent dataset is accelerated to a file.

The dataset created by the `localpod` connector logically contains the same data as the parent dataset.

## Synchronized Refreshes

The `localpod` connector supports synchronized refreshes, which ensure that the child dataset is refreshed from the same data as the parent dataset. Synchronized refreshes require that both the parent and child datasets are accelerated with `refresh_mode: full` (which is the default).

When synchronization is enabled, the runtime emits a log line like the following:

```bash
2024-10-28T15:45:24.220665Z INFO runtime::datafusion: Localpod dataset test_local synchronizing refreshes with parent table test
```

### Examples

```yaml
datasets:
  - from: postgres:cleaned_sales_data
    name: test
    params:
      ...
    acceleration:
      enabled: true # This dataset will be accelerated into a DuckDB file
      engine: duckdb
      mode: file
      refresh_check_interval: 10s
  - from: localpod:test
    name: test_local
    acceleration:
      enabled: true # This dataset accelerates the parent `test` dataset into in-memory Arrow records and is synchronized with the parent
```
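
For reference, the same pairing can be sketched with the refresh settings written out. This is an illustrative variant and not part of the commit: `refresh_mode: full` is stated explicitly on both datasets (the text above describes it as the default and as a requirement for synchronized refreshes), and `engine: arrow` with `mode: memory` on the child are assumptions chosen to match the in-memory Arrow behavior described in the example comment. Connection `params` are omitted here.

```yaml
datasets:
  # Parent: fetched from the remote source and accelerated to a DuckDB file.
  - from: postgres:cleaned_sales_data
    name: test
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      refresh_mode: full # explicit; described above as the default
      refresh_check_interval: 10s
  # Child: reads from the parent `test` dataset rather than the remote source.
  - from: localpod:test
    name: test_local
    acceleration:
      enabled: true
      engine: arrow # assumption: in-memory Arrow engine
      mode: memory # assumption: in-memory mode
      refresh_mode: full # explicit; both datasets need `full` for synchronized refreshes
```

Writing the defaults out is optional; it mainly makes the synchronization requirement visible to anyone who later changes the refresh mode on one of the two datasets.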
