website/versioned_docs/version-1.11.x/components/data-connectors/index.md (85 additions, 0 deletions)

@@ -177,6 +177,91 @@ SELECT * FROM partitioned_data WHERE year = '2024' AND month = '01';

Partition pruning improves query performance by reading only the relevant files.

### Metadata Columns

File-based connectors can expose per-file object store metadata as virtual columns in the dataset schema. These columns are not stored in the data files; they are derived from object store file metadata at query time.

The available metadata columns are:

| Column          | Type                   | Description                     |
| --------------- | ---------------------- | ------------------------------- |
| `location`      | `Utf8`                 | Full URI of the source file     |
| `last_modified` | `Timestamp(µs, "UTC")` | When the file was last modified |
| `size`          | `UInt64`               | File size in bytes              |

#### Enabling Metadata Columns

Metadata columns are enabled by adding a `metadata` section to the dataset definition, with each desired column set to `enabled`:
```yaml
datasets:
  - from: s3://bucket/data/
    name: my_data
    params:
      file_format: parquet
    metadata:
      location: enabled
      last_modified: enabled
      size: enabled
```

Each column can be individually enabled or omitted:

```yaml
metadata:
  location: enabled # Only add the location column
```

:::note
If the data files already contain a column with the same name as a metadata column (e.g., a Parquet file with a `size` column), the metadata column is not added to avoid conflicts.
:::

#### Querying Metadata Columns

Once enabled, metadata columns appear alongside the regular data columns in query results.
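The queries below are a minimal sketch. They assume the `my_data` dataset from the configuration above, with all three metadata columns enabled. The first query returns the source file details for a few rows. The second lists the distinct files larger than 1 MiB. That threshold is an arbitrary illustrative value, and the second query assumes metadata columns can be used in `WHERE` clauses like any other column.

```sql
-- Each row carries the metadata of the file it was read from.
SELECT location, last_modified, size
FROM my_data
LIMIT 5;

-- Assumption: metadata columns can be filtered like regular columns.
-- 1048576 bytes = 1 MiB (illustrative threshold only).
SELECT DISTINCT location, size
FROM my_data
WHERE size > 1048576;
```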
Spice infers the schema for each dataset from its data source at startup. The inferred schema defines the column names, data types, and nullability used by the dataset for the lifetime of that runtime process.

website/versioned_docs/version-1.11.x/components/data-connectors/s3.md (83 additions, 0 deletions)

@@ -262,6 +262,89 @@ Use `schema_source_path` to speed up dataset registration by specifying a URL to

schema_source_path: s3://spiceai-demo-datasets/taxi_trips/2014/1/trips_01.parquet # or s3://spiceai-demo-datasets/taxi_trips/2014/1/
```
### Metadata Columns Example
Metadata columns expose per-file S3 object metadata (`location`, `last_modified`, `size`) as virtual columns in query results. See [Metadata Columns](./#metadata-columns) for full details.
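The query below is a sketch of a per-file summary. It assumes an S3-backed dataset with the hypothetical name `my_data`, with `location` and `size` enabled under `metadata`. Each row carries the metadata of the file it came from. Grouping by `location` therefore yields one result row per S3 object, along with the number of rows read from that object.

```sql
-- Hypothetical dataset `my_data` backed by an S3 prefix,
-- with the location and size metadata columns enabled.
-- One result row per S3 object: its URI, its size, and how many rows it contributed.
SELECT location, size, COUNT(*) AS row_count
FROM my_data
GROUP BY location, size
ORDER BY size DESC;
```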
Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](../secret-stores/). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](../secret-stores/#using-secrets).