spiceai
diff --git a/website/docs/components/data-connectors/index.md b/website/docs/components/data-connectors/index.md
Lines changed: 16 additions & 16 deletions
diff --git a/website/docs/reference/file_format.md b/website/docs/reference/file_format.md
Lines changed: 57 additions & 11 deletions
@@ -55,7 +55,7 @@ Supported Data Connectors include:
 | `odbc`                             | ODBC                                  | Beta              | ODBC                         |
 | `snowflake`                        | Snowflake                             | Beta              | Arrow                        |
 | `spark`                            | Spark                                 | Beta              | [Spark Connect][spark]       |
-| `iceberg`                          | [Apache Iceberg][iceberg]             | Beta              | Parquet                      |
+| `iceberg`                          | [Apache Iceberg][iceberg]             | Stable            | Parquet                      |
 | `abfs`                             | Azure BlobFS                          | Alpha             | Parquet, CSV, JSON           |
 | `ftp`, `sftp`                      | FTP/SFTP                              | Alpha             | Parquet, CSV, JSON           |
 | `smb`                              | SMB                                   | Alpha             | Parquet, CSV, JSON           |
@@ -117,11 +117,11 @@ datasets:
 | [CSV](../reference/file_format#csv)           | `file_format: csv`     | Stable  | Comma-separated values                                                                                         |
 | JSON                                          | `file_format: json`    | Stable  | JavaScript Object Notation                                                                                     |
 | [Delta Lake](https://delta.io/)               | `file_format: delta`   | Stable  | Open table format with ACID transactions. Object stores only.                                                  |
-| [Apache Iceberg](https://iceberg.apache.org/) | `file_format: iceberg` | Beta    | Open table format for large analytic datasets. Object stores only. Requires a [catalog](../catalogs/index.md). |
+| [Apache Iceberg](https://iceberg.apache.org/) | `file_format: iceberg` | Stable  | Open table format for large analytic datasets. Object stores only. Requires a [catalog](../catalogs/index.md). |
 | Microsoft Excel                               | `file_format: xlsx`    | Roadmap | Excel spreadsheet format                                                                                       |
 | Markdown                                      | `file_format: md`      | Stable  | Plain text with formatting (document format)                                                                   |
 | Text                                          | `file_format: txt`     | Stable  | Plain text files (document format)                                                                             |
-| PDF                                           | `file_format: pdf`     | Alpha   | Portable Document Format (document format)                                                                     |
+| PDF                                           | `file_format: pdf`     | Beta    | Portable Document Format (document format)                                                                     |
 | Microsoft Word                                | `file_format: docx`    | Alpha   | Word document format (document format)                                                                         |
 
 ### Format-Specific Parameters
@@ -186,11 +186,11 @@ File-based connectors can expose per-file object store metadata as virtual colum
 
 #### Available Columns
 
-| Column           | Type                   | Description                        |
-| ---------------- | ---------------------- | ---------------------------------- |
-| `_location`      | `Utf8`                 | Full URI of the source file        |
-| `_last_modified` | `Timestamp(µs, "UTC")` | When the file was last modified    |
-| `_size`          | `UInt64`               | File size in bytes                 |
+| Column           | Type                   | Description                     |
+| ---------------- | ---------------------- | ------------------------------- |
+| `_location`      | `Utf8`                 | Full URI of the source file     |
+| `_last_modified` | `Timestamp(µs, "UTC")` | When the file was last modified |
+| `_size`          | `UInt64`               | File size in bytes              |
 
 #### Enabling Metadata Columns
 
@@ -259,11 +259,11 @@ ORDER BY _location;
 
 Metadata columns are supported by all file-based connectors:
 
-| Connector Type               | Connectors                             |
-| ---------------------------- | -------------------------------------- |
-| **Object Stores**            | S3, Azure Blob (ABFS), HTTP/HTTPS      |
-| **Network-Attached Storage** | FTP, SFTP, SMB, NFS                    |
-| **Local Storage**            | File                                   |
+| Connector Type               | Connectors                        |
+| ---------------------------- | --------------------------------- |
+| **Object Stores**            | S3, Azure Blob (ABFS), HTTP/HTTPS |
+| **Network-Attached Storage** | FTP, SFTP, SMB, NFS               |
+| **Local Storage**            | File                              |
 
 ## Schema Inference
 
@@ -304,12 +304,12 @@ Runtime schema evolution controls are planned for a future release. When availab
 | [Apache Parquet](https://parquet.apache.org/) | `file_format: parquet` | ✅         | ❌                  |
 | [CSV](../reference/file_format#csv)           | `file_format: csv`     | ✅         | ❌                  |
 | [Delta Lake](https://delta.io/)               | `file_format: delta`   | ✅         | ❌                  |
-| [Apache Iceberg](https://iceberg.apache.org/) | `file_format: iceberg` | Beta      | ❌                  |
+| [Apache Iceberg](https://iceberg.apache.org/) | `file_format: iceberg` | ✅         | ❌                  |
 | JSON                                          | `file_format: json`    | ✅         | ❌                  |
 | Microsoft Excel                               | `file_format: xlsx`    | Roadmap   | ❌                  |
 | Markdown                                      | `file_format: md`      | ✅         | ✅                  |
 | Text                                          | `file_format: txt`     | ✅         | ✅                  |
-| PDF                                           | `file_format: pdf`     | Alpha     | ✅                  |
+| PDF                                           | `file_format: pdf`     | Beta      | ✅                  |
 | Microsoft Word                                | `file_format: docx`    | Alpha     | ✅                  |
 
 ### Document Formats {#document-formats}
@@ -320,7 +320,7 @@ Runtime schema evolution controls are planned for a future release. When availab
 Document formats (Markdown, Text, PDF, Word) are handled differently from structured data formats. Each file becomes a row in the resulting table, with the file contents stored in a `content` column.
 
 :::warning[Note]
-Document formats in Alpha (PDF, DOCX) may not parse all structure or text from the underlying documents correctly.
+Document formats in Alpha (DOCX) may not parse all structure or text from the underlying documents correctly.
 :::
 
 #### Document Table Schema
 
@@ -6,15 +6,39 @@ pagination_prev: 'reference/index'
 pagination_next: null
 ---
 
-Spice supports CSV, JSON, Parquet, Delta Lake, and Iceberg data file-formats for data connectors that can read files from a file system or cloud object storage (i.e. [`s3://`](../components/data-connectors/s3), [`abfs://`](../components/data-connectors/abfs), [`file://`](../components/data-connectors/file), etc.). Delta Lake and Iceberg are supported for object store connectors. Iceberg requires a catalog to be configured.
+File-based [data connectors](../components/data-connectors/index.md) — including [`s3://`](../components/data-connectors/s3), [`abfs://`](../components/data-connectors/abfs), [`file://`](../components/data-connectors/file), [`ftp://`](../components/data-connectors/ftp), [`sftp://`](../components/data-connectors/ftp), and others — support multiple structured and document file formats. This page details the format-specific parameters available for each.
 
-The parameters supported for specific file-formats are detailed on this page.
+## Common Parameters
+
+These parameters apply across multiple file formats.
+
+| Parameter                   | Type    | Default        | Description                                                                                                                   |
+| --------------------------- | ------- | -------------- | ----------------------------------------------------------------------------------------------------------------------------- |
+| `file_format`               | String  | Inferred       | Selects the file reader. If omitted, format is inferred from the file extension. See [Supported Formats](#supported-formats). |
+| `file_extension`            | String  | Derived        | Overrides the file extension filter used when listing files. Defaults to the extension matching the resolved format.          |
+| `schema_infer_max_records`  | Integer | `1000`         | Maximum number of records scanned to infer the schema.                                                                        |
+| `file_compression_type`     | String  | `UNCOMPRESSED` | File-level compression for CSV, TSV, and JSON files. Valid values: `GZIP`, `BZIP2`, `XZ`, `ZSTD`, `UNCOMPRESSED`.             |
+| `hive_partitioning_enabled` | Boolean | `false`        | Enables Hive-style partition discovery from directory structure.                                                              |
+
+## Supported Formats {#supported-formats}
+
+The `file_format` parameter accepts these values:
+
+| Value     | Reader              | Default Extension | Notes                                       |
+| --------- | ------------------- | ----------------- | ------------------------------------------- |
+| `parquet` | Apache Parquet      | `.parquet`        |                                             |
+| `csv`     | CSV                 | `.csv`            | Uses `csv_*` parameters.                    |
+| `tsv`     | TSV (tab-delimited) | `.tsv`            | Uses `tsv_*` parameters. Delimiter is tab.  |
+| `json`    | JSON                | `.json`           | Uses `json_format` to control parsing mode. |
+| `jsonl`   | JSON Lines          | `.jsonl`          | Line-delimited JSON.                        |
+
+When `file_format` is omitted, Spice infers the format from the dataset path extension. If the extension does not match one of the values above, a configuration error is returned.
 
 ## Parquet
 
-Spice automatically supports reading any Parquet file, regardless of the compression codec or data encoding used.
+Spice reads any Parquet file regardless of the compression codec or data encoding.
 
-Compression codecs:
+Supported compression codecs:
 
 - [`UNCOMPRESSED`](https://parquet.apache.org/docs/file-format/data-pages/compression/#uncompressed)
 - [`SNAPPY`](https://parquet.apache.org/docs/file-format/data-pages/compression/#snappy)
@@ -25,7 +49,7 @@ Compression codecs:
 - [`LZ4_RAW`](https://parquet.apache.org/docs/file-format/data-pages/compression/#lz4_raw)
 - [`ZSTD`](https://parquet.apache.org/docs/file-format/data-pages/compression/#zstd)
 
-Data encodings:
+Supported data encodings:
 
 - [`PLAIN`](https://parquet.apache.org/docs/file-format/data-pages/encodings/#plain-plain--0)
 - [`PLAIN_DICTIONARY` / `RLE_DICTIONARY`](https://parquet.apache.org/docs/file-format/data-pages/encodings/#dictionary-encoding-plain_dictionary--2-and-rle_dictionary--8)
@@ -38,16 +62,38 @@ Data encodings:
 
 ## CSV
 
+### Parameters {#csv}
+
+| Parameter                      | Type    | Default | Description                                                                                           |
+| ------------------------------ | ------- | ------- | ----------------------------------------------------------------------------------------------------- |
+| `csv_has_header`               | Boolean | `true`  | Whether the first row contains column headers.                                                        |
+| `csv_quote`                    | Char    | `"`     | Character used to quote fields containing special characters.                                         |
+| `csv_escape`                   | Char    | _none_  | Character used to escape special characters within a field.                                           |
+| `csv_delimiter`                | Char    | `,`     | Character used to separate fields.                                                                    |
+| `csv_schema_infer_max_records` | Integer | `1000`  | **Deprecated.** Use `schema_infer_max_records` instead. Maximum records scanned for schema inference. |
+
+## TSV
+
+TSV (tab-separated values) is a first-class format. Set `file_format: tsv` or use a `.tsv` file extension. The delimiter is always tab and cannot be changed.
+
 ### Parameters
 
-- `csv_has_header`: Optional. Indicate if the CSV file has header row. Defaults to `true`
-- `csv_quote`: Optional. A one-character string used to quote fields containing special characters. Defaults to `"`
-- `csv_escape`: Optional. A one-character string used to represent special characters or to include characters that would normally be interpreted as delimiters or new line characters within a field value. Defaults to `null`
-- `csv_schema_infer_max_records`: Optional. A number used to set the limit in terms of records to scan to infer the schema. Defaults to `1000`
-- `csv_delimiter`: Optional. A one-character string used to separate individual fields. Defaults to `,`
+| Parameter                      | Type    | Default | Description                                                                                           |
+| ------------------------------ | ------- | ------- | ----------------------------------------------------------------------------------------------------- |
+| `tsv_has_header`               | Boolean | `true`  | Whether the first row contains column headers.                                                        |
+| `tsv_quote`                    | Char    | `"`     | Character used to quote fields containing special characters.                                         |
+| `tsv_escape`                   | Char    | _none_  | Character used to escape special characters within a field.                                           |
+| `tsv_schema_infer_max_records` | Integer | `1000`  | **Deprecated.** Use `schema_infer_max_records` instead. Maximum records scanned for schema inference. |
 
 ## JSON
 
+Set `file_format: json` for JSON files. Use the `json_format` parameter to select the parsing mode.
+
 ### Parameters
 
-- `json_format`: Optional. Specifies the JSON format to parse. Valid values are `array`, `ndjson`, and `jsonl`. Defaults to `jsonl`
+| Parameter      | Type    | Default | Description                                                                                                       |
+| -------------- | ------- | ------- | ----------------------------------------------------------------------------------------------------------------- |
+| `json_format`  | String  | `jsonl` | Parsing mode. `jsonl`, `ndjson`, and `ldjson` parse line-delimited JSON. `array` parses a top-level JSON array.   |
+| `flatten_json` | Boolean | `false` | When `true`, nested JSON objects are flattened with `.` as a separator (e.g., `address.city`).                    |
+
+Setting `file_format: jsonl` uses the DataFusion JSON Lines reader directly, without `json_format` or `flatten_json` support.
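
The `flatten_json` row in the JSON parameters table describes nested objects being flattened with `.` as the separator (for example, `address.city`). The sketch below illustrates that naming scheme only. It is not Spice's implementation: the helper `flatten_json` and the sample record are hypothetical, and leaving arrays and scalars untouched is an assumption, since the docs do not say how arrays are handled.

```python
from typing import Any


def flatten_json(record: dict[str, Any], separator: str = ".", prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into a single level, joining keys with `separator`."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        # Build the dotted column name, e.g. "address" + "." + "city".
        name = f"{prefix}{separator}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects so every leaf becomes its own column.
            flat.update(flatten_json(value, separator, name))
        else:
            # Assumption: scalars and arrays are kept as-is.
            flat[name] = value
    return flat


# Hypothetical sample record, used only for illustration.
row = {"id": 1, "address": {"city": "Seattle", "geo": {"lat": 47.6}}}
flattened = flatten_json(row)
print(flattened)
```

With this scheme, the nested `address` object becomes the columns `address.city` and `address.geo.lat`, while top-level fields such as `id` keep their names.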