docs: Document compressed file extension auto-detection #1726

Merged
lukekim merged 1 commit into trunk from docs/listing-connectors-compressed-file-detection on May 14, 2026

lukekim commented May 14, 2026

Summary

Listing data connectors now auto-detect both file format and compression codec from compound file extensions such as `.csv.gz`, `.jsonl.zst`, `.tsv.xz`, and `.json.bz2`. No `file_compression_type` parameter is required for the recognized suffixes (`.gz`, `.bz2`, `.xz`, `.zst`).
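
As a rough illustration of the inference described above, the following Python sketch splits a path's compound extension into a format part and a compression part. It is not the runtime's implementation, which lives in Rust. The function name `split_compound_extension` and the returned tuple shape are hypothetical, and only the four compression suffixes are taken from this PR.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Compression suffixes named in this PR; everything else here is illustrative.
COMPRESSION_SUFFIXES = {"gz", "bz2", "xz", "zst"}


def split_compound_extension(path: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (format_extension, compression_suffix) for a file path.

    Either element is None when it cannot be read off the file name.
    """
    # Suffixes of "data.csv.gz" are [".csv", ".gz"]; strip dots and lowercase.
    parts = [s.lstrip(".").lower() for s in PurePosixPath(path).suffixes]
    if not parts:
        return None, None
    if parts[-1] in COMPRESSION_SUFFIXES:
        # Trailing suffix is a codec; the one before it (if any) is the format.
        file_format = parts[-2] if len(parts) >= 2 else None
        return file_format, parts[-1]
    # No recognized codec suffix: treat the last suffix as the format.
    return parts[-1], None
```

Under these assumptions, `split_compound_extension("data.csv.gz")` yields the format `csv` and the codec suffix `gz`, while a plain `data.csv` yields `csv` with no codec.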

Source PRs

Changes

  • `website/docs/reference/file_format.md`
    • Updated the Common Parameters table to note that `file_extension` accepts compound extensions and `file_compression_type` is now inferred when unset.
    • Added a new "Compressed File Extensions" section documenting the inference rules, a path-to-format/compression mapping table, and an example.
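
The "inferred when unset" rule for `file_compression_type` could be modeled roughly as follows. This is a hedged sketch, not the connector's code: `resolve_compression` is a hypothetical helper, and the assumption that an explicitly set parameter takes precedence over the file suffix is read from the wording "inferred when unset", not from the runtime source. It returns the bare suffix and does not attempt to map suffixes to the parameter's accepted values.

```python
from typing import Optional

# Compression suffixes recognized per this PR.
RECOGNIZED = ("gz", "bz2", "xz", "zst")


def resolve_compression(path: str, file_compression_type: Optional[str] = None) -> Optional[str]:
    """Pick the compression codec for a file (hypothetical helper)."""
    # Assumption: an explicit parameter wins over anything inferred.
    if file_compression_type is not None:
        return file_compression_type
    # Otherwise fall back to the trailing suffix, if it is a recognized one.
    lowered = path.lower()
    for suffix in RECOGNIZED:
        if lowered.endswith("." + suffix):
            return suffix
    # Unrecognized or missing suffix: nothing can be inferred.
    return None
```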

Test plan

  • `cd website && npm run build` passes locally
  • Compression suffix list (`gz`, `bz2`, `xz`, `zst`) verified against `parse_extension_components` / `compression_from_extension` in `crates/runtime/src/dataconnector/listing/mod.rs`
  • Format extension list verified against `is_structured_format` and `default_jsonl_extension` in the same PR

Listing data connectors (S3, ABFS, GCS, HTTP/HTTPS, FTP, SFTP, SMB, NFS,
file) now auto-detect both file format and compression codec from
compound extensions like `.csv.gz`, `.jsonl.zst`, `.tsv.xz`, `.json.bz2`.
No `file_compression_type` parameter is required for the recognized
suffixes (`.gz`, `.bz2`, `.xz`, `.zst`).

- Updates the Common Parameters table: `file_extension` and
  `file_compression_type` now mention compound-extension inference.
- Adds a new "Compressed File Extensions" section with the format/
  compression mapping and an example.

Reflects the behavior introduced in spiceai/spiceai#10809.

github-actions Bot commented May 14, 2026

✅ Pull with Spice Passed

Passing checks:

  • ✅ Title meets minimum length requirement (10 characters)
  • ✅ Has at least one of the required labels: area/blog, area/docs, area/cookbook, dependencies
  • ✅ No banned labels detected
  • ✅ Has at least one assignee: claudespice

@github-actions

🔍 Pull with Spice Failed

Passing checks:

  • ✅ Title meets minimum length requirement (10 characters)
  • ✅ Has at least one of the required labels: area/blog, area/docs, area/cookbook, dependencies
  • ✅ No banned labels detected

Failed checks:

  • ❌ At least one assignee is required for this pull request.

Please address these issues and update your pull request.

@github-actions

🚀 deployed to https://11aa3066.spiceai-org-website.pages.dev

lukekim assigned claudespice and unassigned claudespice May 14, 2026
lukekim merged commit 47b93e9 into trunk May 14, 2026
7 of 12 checks passed
lukekim deleted the docs/listing-connectors-compressed-file-detection branch May 14, 2026 03:56