|`full`| Replace/overwrite the entire dataset on each refresh | A table of users |
17
+
|`append`| Append/add data to the dataset on each refresh | Append-only, immutable datasets, such as time-series or log data |
18
+
|`changes`| Apply incremental changes | Customer order lifecycle table |
19
19
20
-
E.g.
20
+
Example:
21
21
22
22
```yaml
23
23
datasets:
@@ -28,7 +28,9 @@ datasets:
28
28
refresh_check_interval: 10m
29
29
```
30
30
31
-
If the dataset definition includes a `time_column` and the refresh mode is `append`, data will be refreshed for data where the `time_column` value in the remote source is greater-than (gt) the `max(time_column)` value in the local acceleration.
31
+
### Append
32
+
33
+
If the dataset definition includes a `time_column` and the refresh mode is `append`, each refresh incrementally fetches only the rows whose `time_column` value in the remote source is greater than the `max(time_column)` value in the local acceleration.
32
34
33
35
E.g.
34
36
@@ -42,7 +44,9 @@ datasets:
42
44
refresh_check_interval: 10m
43
45
```
44
46
45
-
## Changes
47
+
When using `mode: append`, if late arriving data or clock-skew needs to be accounted for, an optional overlap can also be specified. See [`acceleration.refresh_append_overlap`](/reference/spicepod/datasets#accelerationrefresh_append_overlap).
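As a minimal sketch of an append-mode dataset with an overlap, the configuration below may help. The source (`databricks:my_dataset`), the dataset name, the `created_at` column, and the `5m` overlap value are illustrative assumptions; the linked reference is authoritative for the exact format that `refresh_append_overlap` accepts.

```yaml
datasets:
  # Hypothetical source and dataset name, for illustration only
  - from: databricks:my_dataset
    name: accelerated_dataset
    # Assumed timestamp column used to find rows newer than the local max
    time_column: created_at
    acceleration:
      enabled: true
      refresh_mode: append
      refresh_check_interval: 10m
      # Assumed value: re-check a short window before max(time_column)
      # to pick up late-arriving rows or clock skew
      refresh_append_overlap: 5m
```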
48
+
49
+
### Changes (CDC)
46
50
47
51
Datasets configured with acceleration `refresh_mode: changes` require a data connector that supports [Change Data Capture (CDC)](/features/cdc/index.md). Initial CDC support in Spice is provided by the [Debezium data connector](/components/data-connectors/debezium.md).
48
52
@@ -57,7 +61,7 @@ Typically only a working subset of an entire dataset is used in an application o
57
61
58
62
Specify filters for data accelerated from the connected source using arbitrary SQL. Supported for `full` and `append` refresh modes.
59
63
60
-
Filters will be pushed down to the remote source, and only the requested data will be transferred over the network.
64
+
Filters will be pushed down to the remote source when possible, so only the requested data will be transferred over the network.
61
65
62
66
Example:
63
67
@@ -73,7 +77,7 @@ datasets:
73
77
SELECT * FROM accelerated_dataset WHERE city = 'Seattle'
74
78
```
75
79
76
-
The `refresh_sql` parameter can be updated at runtime on-demand using `PATCH /v1/datasets/:name/acceleration`. This change is temporary and will revert at the next runtime restart.
80
+
The `refresh_sql` parameter can be updated at runtime on-demand using `PATCH /v1/datasets/:name/acceleration`. This change is temporary and will revert to the `spicepod.yml` definition at the next runtime restart.
77
81
78
82
Example:
79
83
@@ -90,20 +94,20 @@ For the complete reference, view the `refresh_sql` section of [datasets](/refere
90
94
91
95
:::warning[Limitations]
92
96
93
-
- The refresh SQL only supports filtering data from the current dataset - joining across other datasets is not supported.
97
+
- Refresh SQL only supports filtering data from the current dataset - joining across other datasets is not supported.
94
98
- Selecting a subset of columns isn't supported - the refresh SQL needs to start with `SELECT * FROM {name}`.
95
-
- Queries for data that have been filtered out will not fall back to querying against the federated table.
99
+
- Queries for data that have been filtered out will not fall back to querying the federated table.
96
100
- Refresh SQL modifications made via API are temporary and will revert after a runtime restart.
97
101
98
102
:::
99
103
100
104
### Refresh Data Window
101
105
102
-
Filters data from the federated source outside than the specified window. The only supported window is a lookback starting from `now() - refresh_data_window` to `now()`. This flag is only supported for datasets configured with a `full` refresh mode (the default).
106
+
Filters out data from the federated source that falls outside the specified time window. The only supported window is a lookback period from `now() - refresh_data_window` to `now()`. This flag is only supported for datasets configured with the default `full` refresh mode.
103
107
104
-
Used in combination with the [`time_column`](/reference/spicepod/datasets.md#time_column) to identify the column that contains the timestamps to filter on. The [`time_format`](/reference/spicepod/datasets.md#time_format) column (optional) can be used to instruct the Spice runtime how to interpret the timestamps in the `time_column`.
108
+
This filter uses the `time_column` to identify the column containing the timestamps to filter on. Optionally, the `time_format` can be specified to instruct the Spice runtime how to interpret timestamps in the `time_column`.
105
109
106
-
Can also be combined with `refresh_sql` to further filter the data based on the temporal dimension.
110
+
It can also be used alongside `refresh_sql` to apply additional filtering based on time-related criteria.
107
111
108
112
Example:
109
113
@@ -243,9 +247,11 @@ Retention policies apply to `full` and `append` refresh modes (not `changes`).
243
247
244
248
The policy is set using the [`acceleration.retention_check_enabled`](/reference/spicepod/datasets#accelerationretention_check_enabled), [`acceleration.retention_period`](/reference/spicepod/datasets#accelerationretention_period) and [`acceleration.retention_check_interval`](/reference/spicepod/datasets#accelerationretention_check_interval) parameters, along with the [`time_column`](/reference/spicepod/datasets#time_column) and [`time_format`](/reference/spicepod/datasets#time_format) dataset parameters.
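A minimal sketch of how these parameters might fit together is shown below. The source, dataset name, column name, and all duration values are illustrative assumptions rather than recommended settings; the linked reference pages define the accepted values.

```yaml
datasets:
  # Hypothetical source and dataset name, for illustration only
  - from: databricks:my_dataset
    name: accelerated_dataset
    # Assumed timestamp column that the retention check evaluates
    time_column: created_at
    acceleration:
      enabled: true
      # Turn on the periodic retention check
      retention_check_enabled: true
      # Assumed value: keep roughly the last 24 hours of data
      retention_period: 24h
      # Assumed value: run the retention check once an hour
      retention_check_interval: 1h
```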
245
249
246
-
247
250
## Refresh Jitter
248
-
Accelerated datasets can be configured to add a random jitter to the refresh interval. This can be useful to avoid a thundering herd problem where multiple datasets are refreshed at the same time. The jitter is added or subtracted from the refresh interval and is between 0 and `refresh_jitter_max`.
251
+
252
+
Accelerated datasets can add a random jitter to the refresh interval to avoid the [thundering herd problem](https://en.wikipedia.org/wiki/Thundering_herd_problem), where multiple datasets refresh simultaneously. The jitter, a random value between `0` and `refresh_jitter_max`, is added to or subtracted from the refresh interval.
253
+
254
+
Refresh jitter also applies to the first dataset load, so when multiple similarly configured Spice instances restart at once, each loads with a jitter of `0` to `refresh_jitter_max`.
249
255
250
256
Example:
251
257
@@ -262,5 +268,6 @@ datasets:
262
268
In this example, the refresh interval will be between 9s and 11s.
263
269
264
270
Refresh jitter can be configured using the following parameters: