Commit 574eb90

data-refresh: add combined incremental ingestion example (overlap, retention, soft deletes)
1 parent 594adeb commit 574eb90

1 file changed

Lines changed: 33 additions & 0 deletions

website/docs/features/data-acceleration/data-refresh.md

@@ -544,6 +544,39 @@ datasets:

:::

### End-to-End Incremental Ingestion Example

The following example combines the pieces above into a single configuration that keeps an accelerated dataset incrementally up to date from a source that supports soft deletes:

- `refresh_mode: append` with a `time_column` for incremental queries
- `refresh_check_interval` to poll for new or changed rows
- `refresh_append_overlap` to tolerate clock skew and late-arriving rows without missing data
- `primary_key` + `on_conflict: upsert` so rows updated in the source overwrite the accelerated copy instead of duplicating it
- `retention_period` to bound the working set by time
- `retention_sql` to evict soft-deleted rows (`deleted_at IS NOT NULL`)

```yaml
datasets:
  - from: postgres:public.orders
    name: orders
    time_column: updated_at
    acceleration:
      enabled: true
      engine: duckdb
      refresh_mode: append
      refresh_check_interval: 1m
      refresh_append_overlap: 5m
      primary_key: id
      on_conflict:
        id: upsert
      retention_check_enabled: true
      retention_check_interval: 10m
      retention_period: 90d
      retention_sql: DELETE FROM orders WHERE deleted_at IS NOT NULL
```

With this configuration, Spice bootstraps from the source, then every minute fetches rows where `updated_at > max(updated_at) - 5m` and upserts them on `id`. Every 10 minutes, the retention check evicts rows older than 90 days and rows the source has soft-deleted.
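
The refresh and retention behavior described above can be sketched as a small in-memory simulation. This is an illustrative model only, not how Spice implements it: the `refresh` and `retention_check` helpers, the dict-based store, and the sample rows are all hypothetical, and the bootstrap is assumed to load every source row.

```python
from datetime import datetime, timedelta

OVERLAP = timedelta(minutes=5)   # mirrors refresh_append_overlap: 5m
RETENTION = timedelta(days=90)   # mirrors retention_period: 90d


def refresh(source, accelerated):
    """Fetch rows newer than (max(updated_at) - OVERLAP) and upsert them on id."""
    if accelerated:
        watermark = max(r["updated_at"] for r in accelerated.values()) - OVERLAP
        changed = [r for r in source if r["updated_at"] > watermark]
    else:
        changed = list(source)  # first run: assumed to bootstrap with every row
    for row in changed:
        accelerated[row["id"]] = dict(row)  # upsert: the newer copy replaces the old one
    return len(changed)


def retention_check(accelerated, now):
    """Evict rows past the retention period and rows flagged as soft-deleted."""
    cutoff = now - RETENTION
    expired = [k for k, r in accelerated.items()
               if r["updated_at"] < cutoff or r["deleted_at"] is not None]
    for key in expired:
        del accelerated[key]


now = datetime(2024, 6, 1, 12, 0)
source = [
    {"id": 1, "updated_at": now - timedelta(days=120), "deleted_at": None},
    {"id": 2, "updated_at": now - timedelta(minutes=10), "deleted_at": None},
    {"id": 3, "updated_at": now - timedelta(minutes=3), "deleted_at": None},
]
accelerated = {}
refresh(source, accelerated)  # bootstrap: loads ids 1, 2, and 3

# The source soft-deletes order 2, and a late row (id 4) arrives with a
# timestamp older than the newest row already accelerated.
source[1] = {"id": 2, "updated_at": now - timedelta(minutes=1), "deleted_at": now}
source.append({"id": 4, "updated_at": now - timedelta(minutes=4), "deleted_at": None})

fetched = refresh(source, accelerated)  # picks up ids 2, 3, and 4
retention_check(accelerated, now)       # drops id 1 (too old) and id 2 (soft-deleted)
remaining = sorted(accelerated)         # ids still accelerated: 3 and 4
```

Row 4 is the case the overlap exists for: its `updated_at` is one minute older than the newest accelerated row, so a strict `updated_at > max(updated_at)` filter would skip it. The cost is that row 3 is fetched again, which is why the upsert on `id` matters.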

## Refresh Jitter

| | |
