docs: Comprehensive partitioning, partition transforms, and pruning rules by claudespice · Pull Request #1652 · spiceai/docs

claudespice · 2026-05-05T17:48:32Z

Summary

Rewrite the partitioning page with comprehensive coverage of how partition_by actually works in the runtime. Adds documentation for partition transforms (bucket, truncate, date_part, date_trunc, modulo), composite partitioning, the partition pruning matrix per filter shape, engine-specific behavior, and validation rules. Also fixes two inaccuracies in the scalar_functions reference for bucket and truncate.

Changes

`website/docs/features/data-acceleration/partitioning.md` (rewritten)

Sections added:

How partitioning works — refresh-time routing, query-time pruning, when to use it
Configuration — exact YAML placement of partition_by (under acceleration:) and partition_mode (under acceleration.params:); supported engines table (arrow, duckdb files/tables, cayenne)
Partition transforms — bucket(num_buckets, column), truncate(width, value), date_part(...), date_trunc(...), modulo, plain column reference; each with valid types and example YAML
Composite partitioning — multi-expression support on Arrow + Cayenne; DuckDB single-expression-only restriction
Partition pruning — full matrix of which filter shapes prune which transforms (equality, IN/NOT IN, range filters, OR-chains, AND across composite partitions); bucket() inequality pruning bounded-range-only rule; date_part() filter pruning not-yet-implemented note
Engine-specific behavior — Arrow MemTable, DuckDB files-mode (Hive layout), DuckDB tables-mode, Cayenne (SQLite metadata catalog)
Schema evolution — PartitionByExpressionsChanged rejection and the manual re-partition workflow
Validation rules — types accepted, single-column requirement, prohibited expression shapes
Examples — high-cardinality bucketing, date-based partitioning, composite year/month, truncate, plain column
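The refresh-time routing and query-time pruning described in the list above can be sketched roughly as follows. This is an illustration only, not the runtime's implementation: the CRC32 hash, the `account_id` column, the bucket count of 8, and the helper names `route` and `prune_for_in` are all assumptions made for the example. The runtime's `bucket` uses its own fixed-seed hash, so real bucket numbers will differ.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 8  # hypothetical value; the PR documents a valid range of 1..=1_000_000


def bucket(num_buckets: int, value: str) -> int:
    """Stand-in for bucket(num_buckets, column): deterministic hash modulo bucket count.

    CRC32 is an assumption for illustration, not the hash the runtime uses.
    """
    if not 1 <= num_buckets <= 1_000_000:
        raise ValueError("num_buckets must be in 1..=1_000_000")
    return zlib.crc32(value.encode("utf-8")) % num_buckets


def route(rows, num_buckets=NUM_BUCKETS):
    """Refresh-time routing: group each row under its partition value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[bucket(num_buckets, row["account_id"])].append(row)
    return dict(partitions)


def prune_for_in(partitions, values, num_buckets=NUM_BUCKETS):
    """Query-time pruning for `account_id IN (...)`.

    Each literal is hashed with the same transform; only partitions whose
    bucket matches one of the literals need to be scanned.
    """
    wanted = {bucket(num_buckets, v) for v in values}
    return {k: v for k, v in partitions.items() if k in wanted}


rows = [{"account_id": f"acct-{i}", "amount": i} for i in range(100)]
partitions = route(rows)
scanned = prune_for_in(partitions, ["acct-7"])  # equality is IN with one literal
```

Because the transform is deterministic, an equality filter on the source column maps to exactly one bucket, which is why equality and IN filters can prune while an unbounded range filter on a hashed column cannot.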

`website/docs/reference/sql/scalar_functions.md`

Correct bucket return type: it matches the num_buckets literal type (Int8…Int64 / UInt8…UInt64), not always Int64. Document the 1..=1_000_000 valid range and fixed-seed determinism.
Correct truncate accepted types: also accepts Utf8 (strings) and Binary for prefix truncation, in addition to integers and decimals. Add a string example.
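As a rough illustration of the `truncate(width, value)` behaviour described above, the sketch below covers integers and the prefix truncation of strings and binary values. The integer rule (floor to the nearest lower multiple of `width`) is an assumption modelled on the Apache Iceberg truncate transform, not something this PR states; decimals are left out for brevity.

```python
def truncate(width: int, value):
    """Sketch of truncate(width, value) for integers, strings, and bytes."""
    if width <= 0:
        raise ValueError("width must be positive")
    if isinstance(value, int):
        # Assumption: floor to the nearest lower multiple of width.
        # Python's % is non-negative for a positive width, so negative
        # inputs round toward negative infinity.
        return value - (value % width)
    if isinstance(value, (str, bytes)):
        # Prefix truncation: keep the first `width` characters (or bytes).
        return value[:width]
    raise TypeError(f"unsupported type for truncate: {type(value).__name__}")


int_partition = truncate(10, 47)         # 40
str_partition = truncate(3, "spiceai")   # "spi"
```

Values that share a truncated result land in the same partition, so `truncate(3, "spiceai")` and `truncate(3, "spicepod")` would be grouped together under this sketch.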

`website/docs/features/data-acceleration/index.md`

Link to the new partitioning page from the benefits intro.

`website/src/partials/deployment/architectures/_sharded.mdx`

Add a "Sharding vs. partitioning" section distinguishing deployment-level sharding (multiple Spice runtimes) from acceleration-level partitioning (one acceleration split into partitions). Includes a comparison table.

Verification

All claims verified against the spiceai/spiceai source:

bucket return type and MAX_NUM_BUCKETS (crates/runtime-datafusion-udfs/src/bucket.rs:121-133, :35)
truncate accepted types (crates/runtime-datafusion-udfs/src/truncate.rs:135-148)
partition_by YAML deserialization syntax (crates/spicepod/src/partitioning.rs:38-90)
Engine selection per partition_by and partition_mode (crates/runtime/src/component/dataset/acceleration.rs:418-433, crates/runtime/src/dataaccelerator/partitioned_duckdb.rs:69-99)
Partition pruning supported filter shapes (crates/runtime-table-partition/src/provider/pruning.rs:228-600)
DuckDB single-expression restriction (crates/runtime/src/dataaccelerator/partitioned_duckdb.rs:157-160)
Schema-evolution rejection error (crates/runtime-table-partition/src/creator.rs:49-52)
Real spicepod example placement (test/spicepods/tpch/sf1/accelerated/s3[parquet]-duckdb[file]-partitioned.yaml)

Test plan

Verified parameter names, defaults, and accepted types against Rust source
Pruning matrix entries match pruning.rs evaluation rules
Cross-links resolve
Markdown renders correctly

…ules

Rewrite the partitioning page with full coverage of supported partition transforms (bucket, truncate, date_part, date_trunc, modulo), composite partitioning, pruning matrix per filter shape, engine-specific behavior (Arrow / DuckDB files-mode / DuckDB tables-mode / Cayenne), validation rules, and worked examples.

Verified against the Rust source: bucket return type matches the num_buckets literal (Int8..Int64 / UInt8..UInt64), not always Int64 as previously documented in scalar_functions.md. Truncate accepts Utf8 and Binary in addition to numeric types. date_part filter pruning is not yet implemented; equality on the partition expression still prunes correctly.

Also:
- Cross-link from data-acceleration index to the new partitioning page
- Distinguish deployment-level sharding from acceleration partitioning in the sharded architecture page
- Correct bucket return type and truncate input type list in scalar_functions.md

github-actions · 2026-05-05T17:48:41Z

✅ Pull with Spice Passed

Passing checks:

✅ Title meets minimum length requirement (10 characters)
✅ Has at least one of the required labels: area/blog, area/docs, area/cookbook, dependencies
✅ No banned labels detected
✅ Has at least one assignee: claudespice

github-actions · 2026-05-05T17:48:41Z

🔍 Pull with Spice Failed

Passing checks:

✅ Title meets minimum length requirement (10 characters)
✅ Has at least one of the required labels: area/blog, area/docs, area/cookbook, dependencies
✅ No banned labels detected

Failed checks:

❌ At least one assignee is required for this pull request.

Please address these issues and update your pull request.

github-actions · 2026-05-05T17:55:01Z

🚀 deployed to https://b853f82c.spiceai-org-website.pages.dev

claudespice added the area/docs label May 5, 2026

claudespice self-assigned this May 5, 2026

lukekim approved these changes May 5, 2026

lukekim enabled auto-merge (rebase) May 5, 2026 17:51

github-actions Bot deployed to preview May 5, 2026 17:54

lukekim merged commit 18ad528 into trunk May 5, 2026
6 of 8 checks passed

lukekim deleted the docs/comprehensive-partitioning branch May 5, 2026 17:55

lukekim merged 1 commit into trunk from docs/comprehensive-partitioning
