Commit d1814cd

Update Glue and Iceberg documentation to include new parameters and permissions (#1383)
* Update Glue and Iceberg documentation to include new parameters and permissions
* Remove Hybrid Deployment option from deployment architectures documentation
* Apply suggestion from @lukekim
* Add @docusaurus/theme-mermaid dependency to package.json
* Apply suggestion from @lukekim
* Remove Mermaid docs and plugin changes from lukim/iceberg
1 parent 34fb8f0 commit d1814cd

5 files changed

Lines changed: 41 additions & 34 deletions

website/docs/components/catalogs/glue.md

Lines changed: 17 additions & 15 deletions
@@ -44,12 +44,13 @@ Use the `include` field to specify which tables to include from the catalog. The
 
 The following parameters are supported for configuring the connection to the Glue Data Catalog:
 
-| Parameter Name | Definition |
-| -------------------- | --------------------------------------------------------------------------- |
-| `glue_region` | The AWS region for the Glue Data Catalog. E.g. `us-west-2`. |
-| `glue_key` | Access key (e.g. AWS_ACCESS_KEY_ID for AWS). If not provided, credentials will be loaded from environment variables or IAM roles. |
-| `glue_secret` | Secret key (e.g. AWS_SECRET_ACCESS_KEY for AWS). If not provided, credentials will be loaded from environment variables or IAM roles. |
-| `glue_session_token` | Session token (e.g. AWS_SESSION_TOKEN for AWS) for temporary credentials |
+| Parameter Name | Definition |
+| -------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `glue_region` | The AWS region for the Glue Data Catalog. E.g. `us-west-2`. |
+| `glue_catalog_id` | The Glue catalog ID. For Amazon S3 Tables, use the format `<account_id>:s3tablescatalog/<table_bucket_name>`. If not provided, the default catalog for the account is used. |
+| `glue_key` | Access key (e.g. AWS_ACCESS_KEY_ID for AWS). If not provided, credentials will be loaded from environment variables or IAM roles. |
+| `glue_secret` | Secret key (e.g. AWS_SECRET_ACCESS_KEY for AWS). If not provided, credentials will be loaded from environment variables or IAM roles. |
+| `glue_session_token` | Session token (e.g. AWS_SESSION_TOKEN for AWS) for temporary credentials |
 
 ## Authentication
 
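The `glue_catalog_id` format for Amazon S3 Tables, `<account_id>:s3tablescatalog/<table_bucket_name>`, can be assembled programmatically. The following is a minimal Python sketch; the helper name `s3_tables_catalog_id`, the empty-value check, and the account and bucket values are illustrative assumptions and do not come from the documentation.

```python
def s3_tables_catalog_id(account_id: str, table_bucket_name: str) -> str:
    """Build a Glue catalog ID in the documented S3 Tables format.

    Documented format: <account_id>:s3tablescatalog/<table_bucket_name>
    The helper name and the validation below are assumptions for illustration.
    """
    if not account_id or not table_bucket_name:
        raise ValueError("account_id and table_bucket_name must be non-empty")
    return f"{account_id}:s3tablescatalog/{table_bucket_name}"


# Placeholder values, not real identifiers.
catalog_id = s3_tables_catalog_id("123456789012", "my-table-bucket")
print(catalog_id)
```

The resulting string would be supplied as the value of `glue_catalog_id`; when the parameter is omitted, the default catalog for the account is used.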
@@ -124,7 +125,7 @@ The IAM role or user needs the following permissions to access Iceberg tables in
 },
 {
 "Effect": "Allow",
-"Action": ["s3:GetObject"],
+"Action": ["s3:GetObject", "s3:PutObject"],
 "Resource": "arn:aws:s3:::company-bucketname-datasets/*"
 },
 {
{
@@ -144,15 +145,16 @@ The IAM role or user needs the following permissions to access Iceberg tables in
 
 ### Permission Details
 
-| Permission | Purpose |
-|------------|---------|
-| `s3:ListBucket` | Required. Allows scanning all objects from the bucket |
-| `s3:GetObject` | Required. Allows fetching objects |
-| `glue:GetCatalog` | Required. Retrieve metadata about the specified catalog. |
+| Permission | Purpose |
+| ------------------- | -------------------------------------------------------------- |
+| `s3:ListBucket` | Required. Allows scanning all objects from the bucket |
+| `s3:GetObject` | Required. Allows fetching objects |
+| `s3:PutObject` | Required for write operations. Allows writing objects |
+| `glue:GetCatalog` | Required. Retrieve metadata about the specified catalog. |
 | `glue:GetDatabases` | Required. List the databases available in the current catalog. |
-| `glue:GetDatabase` | Required. Retrieve metadata about the specified database. |
-| `glue:GetTable` | Required. Retrieve metadata about the specified table. |
-| `glue:GetTables` | Required. List the tables available in the current database. |
+| `glue:GetDatabase` | Required. Retrieve metadata about the specified database. |
+| `glue:GetTable` | Required. Retrieve metadata about the specified table. |
+| `glue:GetTables` | Required. List the tables available in the current database. |
 
 ## Limitations
 
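Because `s3:PutObject` is only required for write operations, a read-only deployment could omit it. The Python sketch below generates the policy document either way. The function name `build_policy` and the `allow_writes` flag are hypothetical, and the `s3:ListBucket` statement scoped to the bucket ARN and the `"Resource": "*"` on the Glue statement are assumptions, since this hunk shows only the object-level statement.

```python
import json

# Glue permissions listed as required in the Permission Details table.
GLUE_READ_ACTIONS = [
    "glue:GetCatalog",
    "glue:GetDatabases",
    "glue:GetDatabase",
    "glue:GetTable",
    "glue:GetTables",
]


def build_policy(bucket: str, allow_writes: bool = False) -> dict:
    """Return an IAM policy document as a dict (illustrative sketch only)."""
    object_actions = ["s3:GetObject"]
    if allow_writes:
        # Only needed when data is written back to the bucket.
        object_actions.append("s3:PutObject")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Assumption: ListBucket is granted on the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # Assumption: Glue actions are granted on all resources.
                "Effect": "Allow",
                "Action": GLUE_READ_ACTIONS,
                "Resource": "*",
            },
        ],
    }


print(json.dumps(build_policy("company-bucketname-datasets", allow_writes=True), indent=2))
```

Generating the document from one function keeps the read-only and read-write variants from drifting apart, and `json.dumps` always emits quoted keys.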
website/docs/components/data-connectors/glue.md

Lines changed: 2 additions & 1 deletion
@@ -160,7 +160,7 @@ The IAM role or user needs the following permissions to access Iceberg tables in
 },
 {
 "Effect": "Allow",
-"Action": ["s3:GetObject"],
+"Action": ["s3:GetObject", "s3:PutObject"],
 "Resource": "arn:aws:s3:::company-bucketname-datasets/*"
 },
 {
{
@@ -184,6 +184,7 @@ The IAM role or user needs the following permissions to access Iceberg tables in
 | ------------------- | -------------------------------------------------------------- |
 | `s3:ListBucket` | Required. Allows scanning all objects from the bucket |
 | `s3:GetObject` | Required. Allows fetching objects |
+| `s3:PutObject` | Required for write operations. Allows writing objects |
 | `glue:GetCatalog` | Required. Retrieve metadata about the specified catalog. |
 | `glue:GetDatabases` | Required. List the databases available in the current catalog. |
 | `glue:GetDatabase` | Required. Retrieve metadata about the specified database. |

website/docs/components/data-connectors/iceberg.md

Lines changed: 4 additions & 2 deletions
@@ -105,6 +105,7 @@ SELECT COUNT(*) FROM transactions;
 | `iceberg_token` | Bearer token value to use for Authorization header. |
 | `iceberg_oauth2_credential` | Credential to use for OAuth2 client credential flow when connecting to the table. Format: `<client_id>:<client_secret>` |
 | `iceberg_oauth2_scope` | Scope to use for OAuth2 client credential flow when connecting to the table. Default: `catalog` |
+| `iceberg_oauth2_token_url` | The URL to use for OAuth2 token endpoint. |
 | `iceberg_oauth2_server_url` | URL of the OAuth2 server tokens endpoint for the client credential flow. |
 | `iceberg_s3_endpoint` | S3-compatible endpoint where the Iceberg table data is stored. |
 | `iceberg_s3_region` | Region of the S3-compatible endpoint. |
@@ -201,7 +202,7 @@ The IAM role or user needs the following permissions to access Iceberg tables in
 },
 {
 "Effect": "Allow",
-"Action": ["s3:GetObject"],
+"Action": ["s3:GetObject", "s3:PutObject"],
 "Resource": "arn:aws:s3:::company-bucketname-datasets/*"
 },
 {
{
@@ -213,7 +214,7 @@ The IAM role or user needs the following permissions to access Iceberg tables in
 "glue:GetTable",
 "glue:GetTables"
 ],
-Resource: "*"
+"Resource": "*"
 }
 ]
 }
@@ -225,6 +226,7 @@ The IAM role or user needs the following permissions to access Iceberg tables in
 | ------------------- | -------------------------------------------------------------- |
 | `s3:ListBucket` | Required. Allows scanning all objects from the bucket |
 | `s3:GetObject` | Required. Allows fetching objects |
+| `s3:PutObject` | Required for write operations. Allows writing objects |
 | `glue:GetCatalog` | Required. Retrieve metadata about the specified catalog. |
 | `glue:GetDatabases` | Required. List the databases available in the current catalog. |
 | `glue:GetDatabase` | Required. Retrieve metadata about the specified database. |

website/docs/components/data-connectors/index.md

Lines changed: 17 additions & 15 deletions
@@ -90,17 +90,18 @@ datasets:
 
 ### Supported Formats
 
-| Name | Parameter | Status | Description |
-| --------------------------------------------- | ---------------------- | ------- | --------------------------------------------- |
-| [Apache Parquet](https://parquet.apache.org/) | `file_format: parquet` | Stable | Columnar format optimized for analytics |
-| [CSV](../../reference/file_format#csv) | `file_format: csv` | Stable | Comma-separated values |
-| JSON | `file_format: json` | Roadmap | JavaScript Object Notation |
-| [Apache Iceberg](https://iceberg.apache.org/) | `file_format: iceberg` | Roadmap | Open table format for large analytic datasets |
-| Microsoft Excel | `file_format: xlsx` | Roadmap | Excel spreadsheet format |
-| Markdown | `file_format: md` | Stable | Plain text with formatting (document format) |
-| Text | `file_format: txt` | Stable | Plain text files (document format) |
-| PDF | `file_format: pdf` | Alpha | Portable Document Format (document format) |
-| Microsoft Word | `file_format: docx` | Alpha | Word document format (document format) |
+| Name | Parameter | Status | Description |
+| --------------------------------------------- | ---------------------- | ------- | -------------------------------------------------------------------------------------------------------------- |
+| [Apache Parquet](https://parquet.apache.org/) | `file_format: parquet` | Stable | Columnar format optimized for analytics |
+| [CSV](../reference/file_format#csv) | `file_format: csv` | Stable | Comma-separated values |
+| JSON | `file_format: json` | Stable | JavaScript Object Notation |
+| [Delta Lake](https://delta.io/) | `file_format: delta` | Stable | Open table format with ACID transactions. Object stores only. |
+| [Apache Iceberg](https://iceberg.apache.org/) | `file_format: iceberg` | Beta | Open table format for large analytic datasets. Object stores only. Requires a [catalog](../catalogs/index.md). |
+| Microsoft Excel | `file_format: xlsx` | Roadmap | Excel spreadsheet format |
+| Markdown | `file_format: md` | Stable | Plain text with formatting (document format) |
+| Text | `file_format: txt` | Stable | Plain text files (document format) |
+| PDF | `file_format: pdf` | Alpha | Portable Document Format (document format) |
+| Microsoft Word | `file_format: docx` | Alpha | Word document format (document format) |
 
 ### Format-Specific Parameters
 
@@ -112,7 +113,7 @@ File formats support additional parameters for fine-grained control. Common exam
 | `csv_delimiter` | CSV | Field delimiter character (default: `,`) |
 | `csv_quote` | CSV | Quote character for fields containing delimiters |
 
-For complete format options, see [File Formats Reference](../../reference/file_format).
+For complete format options, see [File Formats Reference](../reference/file_format).
 
 ### Applicable Connectors {#object-store-file-formats}
 

@@ -160,9 +161,10 @@ Partition pruning improves query performance by reading only the relevant files.
 | Name | Parameter | Supported | Is Document Format |
 | --------------------------------------------- | ---------------------- | --------- | ------------------ |
 | [Apache Parquet](https://parquet.apache.org/) | `file_format: parquet` | ✅ | ❌ |
-| [CSV](../../reference/file_format#csv) | `file_format: csv` | ✅ | ❌ |
-| [Apache Iceberg](https://iceberg.apache.org/) | `file_format: iceberg` | Roadmap | ❌ |
-| JSON | `file_format: json` | Roadmap | ❌ |
+| [CSV](../reference/file_format#csv) | `file_format: csv` | ✅ | ❌ |
+| [Delta Lake](https://delta.io/) | `file_format: delta` | ✅ | ❌ |
+| [Apache Iceberg](https://iceberg.apache.org/) | `file_format: iceberg` | Beta | ❌ |
+| JSON | `file_format: json` | ✅ | ❌ |
 | Microsoft Excel | `file_format: xlsx` | Roadmap | ❌ |
 | Markdown | `file_format: md` | ✅ | ✅ |
 | Text | `file_format: txt` | ✅ | ✅ |

website/docs/reference/file_format.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ pagination_prev: 'reference/index'
 pagination_next: null
 ---
 
-Spice currently supports CSV, JSON, and Parquet data file-formats for data connectors that can read files from a file system or cloud object storage (i.e. [`s3://`](../components/data-connectors/s3), [`abfs://`](../components/data-connectors/abfs), [`file://`](../components/data-connectors/file), etc.). Support for Iceberg and other file-formats are on the roadmap.
+Spice supports CSV, JSON, Parquet, Delta Lake, and Iceberg data file-formats for data connectors that can read files from a file system or cloud object storage (i.e. [`s3://`](../components/data-connectors/s3), [`abfs://`](../components/data-connectors/abfs), [`file://`](../components/data-connectors/file), etc.). Delta Lake and Iceberg are supported for object store connectors. Iceberg requires a catalog to be configured.
 
 The parameters supported for specific file-formats are detailed on this page.
 
