docs: Comprehensive partitioning, partition transforms, and pruning rules #1652

Merged: lukekim merged 1 commit into trunk from docs/comprehensive-partitioning on May 5, 2026
@claudespice
Collaborator

Summary

Rewrite the partitioning page with comprehensive coverage of how partition_by actually works in the runtime. Adds documentation for partition transforms (bucket, truncate, date_part, date_trunc, modulo), composite partitioning, the partition pruning matrix per filter shape, engine-specific behavior, and validation rules. Also fixes two inaccuracies in the scalar_functions reference for bucket and truncate.

Changes

website/docs/features/data-acceleration/partitioning.md (rewritten)

Sections added:

  • How partitioning works — refresh-time routing, query-time pruning, when to use it
  • Configuration — exact YAML placement of partition_by (under acceleration:) and partition_mode (under acceleration.params:); supported engines table (arrow, duckdb files/tables, cayenne)
  • Partition transforms: bucket(num_buckets, column), truncate(width, value), date_part(...), date_trunc(...), modulo, plain column reference; each with valid types and example YAML
  • Composite partitioning — multi-expression support on Arrow + Cayenne; DuckDB single-expression-only restriction
  • Partition pruning — full matrix of which filter shapes prune which transforms (equality, IN/NOT IN, range filters, OR-chains, AND across composite partitions); bucket() inequality pruning bounded-range-only rule; date_part() filter pruning not-yet-implemented note
  • Engine-specific behavior — Arrow MemTable, DuckDB files-mode (Hive layout), DuckDB tables-mode, Cayenne (SQLite metadata catalog)
  • Schema evolution: PartitionByExpressionsChanged rejection and the manual re-partition workflow
  • Validation rules — types accepted, single-column requirement, prohibited expression shapes
  • Examples — high-cardinality bucketing, date-based partitioning, composite year/month, truncate, plain column
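The refresh-time routing and query-time pruning described in the sections above can be sketched roughly as follows. This is a minimal illustration, not the runtime's implementation: the function names (`route`, `prune_equality`, `prune_in`, `prune_range`), the `account_id` column, and the use of CRC32 as the hash are all assumptions made for the example. The enumeration strategy for bounded ranges is one plausible reading of the bounded-range-only rule, not a confirmed detail of pruning.rs.

```python
import zlib


def bucket(num_buckets: int, value: int) -> int:
    """Stand-in for bucket(num_buckets, column).

    Assumption: CRC32 replaces the runtime's fixed-seed hash, which is
    not reproduced here. Only determinism matters for this sketch.
    """
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def route(rows, num_buckets):
    """Refresh-time routing: place each row in the partition for its bucket."""
    partitions = {}
    for row in rows:
        key = bucket(num_buckets, row["account_id"])
        partitions.setdefault(key, []).append(row)
    return partitions


def prune_equality(partitions, num_buckets, literal):
    """account_id = literal: only the literal's bucket can hold matches."""
    return {bucket(num_buckets, literal)}.intersection(partitions)


def prune_in(partitions, num_buckets, literals):
    """account_id IN (...): union of the buckets of each listed literal."""
    return {bucket(num_buckets, v) for v in literals}.intersection(partitions)


def prune_range(partitions, num_buckets, low=None, high=None):
    """low <= account_id <= high on integers.

    Assumption: a bounded range is pruned by hashing every value in it.
    A half-open range such as account_id > low cannot be enumerated, so
    every partition is kept.
    """
    if low is None or high is None:
        return set(partitions)
    wanted = {bucket(num_buckets, v) for v in range(low, high + 1)}
    return wanted.intersection(partitions)


# Hypothetical data: 20 rows routed into at most 4 buckets.
rows = [{"account_id": i} for i in range(20)]
partitions = route(rows, num_buckets=4)
kept = prune_equality(partitions, 4, 7)  # a single partition survives
```

Because the hash is deterministic, the bucket computed for a filter literal at query time is the same bucket the matching row was routed to at refresh time, which is what makes skipping the other partitions safe.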

website/docs/reference/sql/scalar_functions.md

  • Correct bucket return type: it matches the num_buckets literal type (Int8..Int64 / UInt8..UInt64), not always Int64. Document the 1..=1_000_000 valid range and fixed-seed determinism.
  • Correct truncate accepted types: also accepts Utf8 (strings) and Binary for prefix truncation, in addition to integers and decimals. Add a string example.
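The corrected behavior can be modeled with a short sketch. It assumes Iceberg-style semantics (integers rounded down to a multiple of the width, strings and binary cut to a prefix); the PR confirms only that these input types are accepted and that strings and binary are prefix-truncated, so the integer rule, the handling of negative values, and the helper names `truncate` and `validate_num_buckets` are illustrative assumptions. Decimal inputs are omitted.

```python
MAX_NUM_BUCKETS = 1_000_000  # upper bound of the 1..=1_000_000 range cited above


def validate_num_buckets(num_buckets: int) -> int:
    """Reject bucket counts outside the documented 1..=1_000_000 range."""
    if not 1 <= num_buckets <= MAX_NUM_BUCKETS:
        raise ValueError(f"num_buckets must be in 1..={MAX_NUM_BUCKETS}")
    return num_buckets


def truncate(width: int, value):
    """Model of truncate(width, value) for integers, strings, and bytes."""
    if width <= 0:
        raise ValueError("width must be positive")
    if isinstance(value, (str, bytes)):
        # Prefix truncation: keep the first `width` characters or bytes.
        return value[:width]
    if isinstance(value, int):
        # Assumed rule: round down to the nearest multiple of `width`.
        # Python's % is non-negative for positive widths, so negative
        # values also round down (toward negative infinity).
        return value - (value % width)
    raise TypeError(f"unsupported type for this sketch: {type(value).__name__}")


print(truncate(10, 47))            # 40
print(truncate(3, "spiceai"))      # spi
print(truncate(2, b"\x01\x02\x03"))  # b'\x01\x02'
```

Under these assumptions, all values that share a truncated result land in the same partition, so string prefixes and integer ranges both act as coarse grouping keys.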

website/docs/features/data-acceleration/index.md

  • Link to the new partitioning page from the benefits intro.

website/src/partials/deployment/architectures/_sharded.mdx

  • Add a "Sharding vs. partitioning" section distinguishing deployment-level sharding (multiple Spice runtimes) from acceleration-level partitioning (one acceleration split into partitions). Includes a comparison table.

Verification

All claims verified against the spiceai/spiceai source:

  • bucket return type and MAX_NUM_BUCKETS (crates/runtime-datafusion-udfs/src/bucket.rs:121-133, :35)
  • truncate accepted types (crates/runtime-datafusion-udfs/src/truncate.rs:135-148)
  • partition_by YAML deserialization syntax (crates/spicepod/src/partitioning.rs:38-90)
  • Engine selection per partition_by and partition_mode (crates/runtime/src/component/dataset/acceleration.rs:418-433, crates/runtime/src/dataaccelerator/partitioned_duckdb.rs:69-99)
  • Partition pruning supported filter shapes (crates/runtime-table-partition/src/provider/pruning.rs:228-600)
  • DuckDB single-expression restriction (crates/runtime/src/dataaccelerator/partitioned_duckdb.rs:157-160)
  • Schema-evolution rejection error (crates/runtime-table-partition/src/creator.rs:49-52)
  • Real spicepod example placement (test/spicepods/tpch/sf1/accelerated/s3[parquet]-duckdb[file]-partitioned.yaml)

Test plan

  • Verified parameter names, defaults, and accepted types against Rust source
  • Pruning matrix entries match pruning.rs evaluation rules
  • Cross-links resolve
  • Markdown renders correctly

…ules

Rewrite the partitioning page with full coverage of supported
partition transforms (bucket, truncate, date_part, date_trunc,
modulo), composite partitioning, pruning matrix per filter shape,
engine-specific behavior (Arrow / DuckDB files-mode / DuckDB tables-mode
/ Cayenne), validation rules, and worked examples.

Verified against the Rust source: bucket return type matches the
num_buckets literal (Int8..Int64 / UInt8..UInt64) — not always Int64
as previously documented in scalar_functions.md. Truncate accepts
Utf8 and Binary in addition to numeric types. date_part filter
pruning is not yet implemented; equality on the partition expression
still prunes correctly.

Also:
- Cross-link from data-acceleration index to the new partitioning page
- Distinguish deployment-level sharding from acceleration partitioning
  in the sharded architecture page
- Correct bucket return type and truncate input type list in scalar_functions.md
claudespice self-assigned this on May 5, 2026

github-actions Bot commented May 5, 2026

✅ Pull with Spice Passed

Passing checks:

  • ✅ Title meets minimum length requirement (10 characters)
  • ✅ Has at least one of the required labels: area/blog, area/docs, area/cookbook, dependencies
  • ✅ No banned labels detected
  • ✅ Has at least one assignee: claudespice

github-actions Bot commented May 5, 2026

🔍 Pull with Spice Failed

Passing checks:

  • ✅ Title meets minimum length requirement (10 characters)
  • ✅ Has at least one of the required labels: area/blog, area/docs, area/cookbook, dependencies
  • ✅ No banned labels detected

Failed checks:

  • ❌ At least one assignee is required for this pull request.

Please address these issues and update your pull request.

github-actions Bot commented May 5, 2026

🚀 deployed to https://b853f82c.spiceai-org-website.pages.dev

lukekim merged commit 18ad528 into trunk on May 5, 2026
6 of 8 checks passed
lukekim deleted the docs/comprehensive-partitioning branch on May 5, 2026 17:55

2 participants