docs: Comprehensive partitioning, partition transforms, and pruning rules #1652
Merged
Rewrite the partitioning page with full coverage of supported partition transforms (bucket, truncate, date_part, date_trunc, modulo), composite partitioning, pruning matrix per filter shape, engine-specific behavior (Arrow / DuckDB files-mode / DuckDB tables-mode / Cayenne), validation rules, and worked examples.
Verified against the Rust source: bucket return type matches the num_buckets literal (Int8..Int64 / UInt8..UInt64), not always Int64 as previously documented in scalar_functions.md. Truncate accepts Utf8 and Binary in addition to numeric types. date_part filter pruning is not yet implemented; equality on the partition expression still prunes correctly.
Also:
- Cross-link from data-acceleration index to the new partitioning page
- Distinguish deployment-level sharding from acceleration partitioning in the sharded architecture page
- Correct bucket return type and truncate input type list in scalar_functions.md
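The corrected `bucket` contract can be illustrated with a minimal Python sketch. That contract has three parts: the return type follows the `num_buckets` literal, the valid range is `1..=1_000_000`, and results are deterministic under a fixed seed. This is not the Rust implementation in `bucket.rs`. The type-bounds table, the `bucket_return_type` and `bucket` helpers, and the keyed BLAKE2b hash standing in for the real fixed-seed hash are all hypothetical.

```python
import hashlib

# Upper bound of the documented 1..=1_000_000 range (the PR cites MAX_NUM_BUCKETS in bucket.rs).
MAX_NUM_BUCKETS = 1_000_000

# Inclusive bounds of the integer types a num_buckets literal may have.
INTEGER_BOUNDS = {
    "Int8": (-(2**7), 2**7 - 1),
    "Int16": (-(2**15), 2**15 - 1),
    "Int32": (-(2**31), 2**31 - 1),
    "Int64": (-(2**63), 2**63 - 1),
    "UInt8": (0, 2**8 - 1),
    "UInt16": (0, 2**16 - 1),
    "UInt32": (0, 2**32 - 1),
    "UInt64": (0, 2**64 - 1),
}


def bucket_return_type(num_buckets: int, literal_type: str) -> str:
    """Hypothetical check: bucket() returns the same type as its num_buckets literal."""
    if literal_type not in INTEGER_BOUNDS:
        raise TypeError(f"num_buckets must be an integer literal, got {literal_type}")
    low, high = INTEGER_BOUNDS[literal_type]
    if not low <= num_buckets <= high:
        raise ValueError(f"{num_buckets} does not fit in {literal_type}")
    if not 1 <= num_buckets <= MAX_NUM_BUCKETS:
        raise ValueError(f"num_buckets must be in 1..={MAX_NUM_BUCKETS}")
    # Every bucket index lies in [0, num_buckets), so it always fits in literal_type.
    return literal_type


def bucket(num_buckets: int, value: bytes, seed: bytes = b"fixed-seed") -> int:
    """Hypothetical bucketing: a keyed hash with a constant seed, reduced modulo num_buckets.

    The constant key keeps results stable across processes and restarts, unlike
    Python's built-in hash() for str/bytes, which is randomized per process.
    """
    digest = hashlib.blake2b(value, key=seed, digest_size=8).digest()
    return int.from_bytes(digest, "little") % num_buckets
```

Some results of the sketch:

- `bucket_return_type(16, "Int8")` yields `"Int8"`.
- `bucket_return_type(0, "Int64")` raises, because 0 is outside the valid range.
- `bucket_return_type(200, "Int8")` raises, because 200 does not fit in `Int8`.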
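The widened `truncate` input list (numeric types plus `Utf8` and `Binary` prefixes) can also be sketched in Python. Several behaviors here are assumptions that this PR does not confirm:

- Integers round down to a multiple of `width`, in the style of Apache Iceberg's truncate transform.
- Strings are cut by Unicode code point.
- A non-positive `width` is rejected.

Decimals are omitted. The authoritative rules live in `truncate.rs`, and the `truncate` function below is a hypothetical model, not that code.

```python
from typing import Union

Truncatable = Union[int, str, bytes]


def truncate(width: int, value: Truncatable) -> Truncatable:
    """Hypothetical truncate(width, value) for integers, strings, and binary."""
    if width <= 0:
        raise ValueError("width must be positive")  # assumed validation rule
    if isinstance(value, bool):
        # bool is an int subclass in Python; reject it explicitly.
        raise TypeError("booleans are not truncatable")
    if isinstance(value, int):
        # Python's % is floored, so -1 with width 10 becomes -10 (Iceberg-style, assumed).
        return value - value % width
    if isinstance(value, str):
        # Utf8: keep the first `width` characters (code points, assumed).
        return value[:width]
    if isinstance(value, bytes):
        # Binary: keep the first `width` bytes.
        return value[:width]
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

As a string example, `truncate(3, "spiceai")` returns `"spi"`, and strings shorter than `width` pass through unchanged.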
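One way to picture composite partitioning is as each row mapping to a tuple with one component per `partition_by` expression. The Python sketch below assumes that model, with components taken in declaration order. Its helpers are hypothetical stand-ins:

- `column` models a plain column reference.
- `date_trunc_month` models `date_trunc('month', ts)`.
- `modulo` models the modulo transform. It assumes Rust-style `%`, where the remainder takes the dividend's sign.
- `partition_key` and `group_by_partition` build the tuple and group rows by it.

```python
from datetime import datetime
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
PartitionExpr = Callable[[Row], Any]
PartitionKey = Tuple[Any, ...]


def column(name: str) -> PartitionExpr:
    """Plain column reference: the value itself is the partition value."""
    return lambda row: row[name]


def date_trunc_month(name: str) -> PartitionExpr:
    """Like date_trunc('month', name): the first instant of the row's month."""
    def expr(row: Row) -> datetime:
        return row[name].replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return expr


def modulo(name: str, divisor: int) -> PartitionExpr:
    """Like name % divisor, with the remainder taking the dividend's sign (Rust-style)."""
    if divisor <= 0:
        raise ValueError("this sketch only supports positive divisors")
    def expr(row: Row) -> int:
        value = row[name]
        remainder = abs(value) % divisor
        return remainder if value >= 0 else -remainder
    return expr


def partition_key(row: Row, exprs: List[PartitionExpr]) -> PartitionKey:
    """One key component per partition expression, in declaration order."""
    return tuple(expr(row) for expr in exprs)


def group_by_partition(
    rows: List[Row], exprs: List[PartitionExpr]
) -> Dict[PartitionKey, List[Row]]:
    """Route each row to the partition identified by its composite key."""
    partitions: Dict[PartitionKey, List[Row]] = {}
    for row in rows:
        partitions.setdefault(partition_key(row, exprs), []).append(row)
    return partitions
```

Take the expressions `[column("region"), date_trunc_month("ts"), modulo("id", 4)]`. Under them, rows from the same region and month whose ids share a remainder land in the same composite partition.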
✅ Pull with Spice Passed
Passing checks:
🔍 Pull with Spice Failed
Passing checks:
Failed checks:
Please address these issues and update your pull request.
lukekim approved these changes on May 5, 2026
🚀 deployed to https://b853f82c.spiceai-org-website.pages.dev
Summary
Rewrite the partitioning page with comprehensive coverage of how `partition_by` actually works in the runtime. Adds documentation for partition transforms (`bucket`, `truncate`, `date_part`, `date_trunc`, modulo), composite partitioning, the partition pruning matrix per filter shape, engine-specific behavior, and validation rules. Also fixes two inaccuracies in the `scalar_functions` reference for `bucket` and `truncate`.
Changes
`website/docs/features/data-acceleration/partitioning.md` (rewritten)
Sections added:
- `partition_by` (under `acceleration:`) and `partition_mode` (under `acceleration.params:`); supported engines table (`arrow`, `duckdb` files/tables, `cayenne`)
- `bucket(num_buckets, column)`, `truncate(width, value)`, `date_part(...)`, `date_trunc(...)`, modulo, plain column reference; each with valid types and example YAML
- … `IN`/`NOT IN`, range filters, `OR`-chains, `AND` across composite partitions); `bucket()` inequality pruning bounded-range-only rule; `date_part()` filter pruning not-yet-implemented note
- `PartitionByExpressionsChanged` rejection and the manual re-partition workflow
`website/docs/reference/sql/scalar_functions.md`
- `bucket` return type: it matches the `num_buckets` literal type (`Int8`…`Int64` / `UInt8`…`UInt64`), not always `Int64`. Document the `1..=1_000_000` valid range and fixed-seed determinism.
- `truncate` accepted types: also accepts `Utf8` (strings) and `Binary` for prefix truncation, in addition to integers and decimals. Add a string example.
`website/docs/features/data-acceleration/index.md`
`website/src/partials/deployment/architectures/_sharded.mdx`
Verification
All claims verified against the spiceai/spiceai source:
- `bucket` return type and `MAX_NUM_BUCKETS` (`crates/runtime-datafusion-udfs/src/bucket.rs:121-133`, `:35`)
- `truncate` accepted types (`crates/runtime-datafusion-udfs/src/truncate.rs:135-148`)
- `partition_by` YAML deserialization syntax (`crates/spicepod/src/partitioning.rs:38-90`)
- `partition_by` and `partition_mode` (`crates/runtime/src/component/dataset/acceleration.rs:418-433`, `crates/runtime/src/dataaccelerator/partitioned_duckdb.rs:69-99`)
- … `crates/runtime-table-partition/src/provider/pruning.rs:228-600`)
- … `crates/runtime/src/dataaccelerator/partitioned_duckdb.rs:157-160`)
- … `crates/runtime-table-partition/src/creator.rs:49-52`)
- … `test/spicepods/tpch/sf1/accelerated/s3[parquet]-duckdb[file]-partitioned.yaml`)
Test plan
- … `pruning.rs` evaluation rules
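The pruning matrix described in this PR, restricted to a single `bucket()` partition column, can be sketched as a recursive evaluator. The evaluator returns the set of buckets that a filter might touch. This is a hypothetical Python model, not a port of `pruning.rs`. The filter classes, `prune_buckets`, and the `MAX_ENUMERATED_VALUES` cap are assumptions.

The sketch follows a few rules:

- Equality and `IN` map directly to buckets.
- `OR` unions the sets and `AND` intersects them.
- `NOT IN` prunes nothing, because excluded values share buckets with other values.
- An integer range prunes only when it is bounded at both ends and small enough to enumerate, since hashing destroys ordering. This matches the spirit of the bounded-range-only rule.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Tuple, Union

# Hypothetical cap on how many values a bounded range may expand to.
MAX_ENUMERATED_VALUES = 1024


@dataclass(frozen=True)
class Eq:
    value: int


@dataclass(frozen=True)
class In:
    values: Tuple[int, ...]


@dataclass(frozen=True)
class NotIn:
    values: Tuple[int, ...]


@dataclass(frozen=True)
class Range:
    low: Optional[int]   # inclusive; None means unbounded
    high: Optional[int]  # inclusive; None means unbounded


@dataclass(frozen=True)
class Or:
    left: "Filter"
    right: "Filter"


@dataclass(frozen=True)
class And:
    left: "Filter"
    right: "Filter"


Filter = Union[Eq, In, NotIn, Range, Or, And]


def prune_buckets(
    f: Filter, num_buckets: int, bucket_of: Callable[[int], int]
) -> FrozenSet[int]:
    """Return the buckets that may hold rows matching f (returning a superset is always safe)."""
    every_bucket = frozenset(range(num_buckets))
    if isinstance(f, Eq):
        return frozenset({bucket_of(f.value)})
    if isinstance(f, In):
        return frozenset(bucket_of(v) for v in f.values)
    if isinstance(f, NotIn):
        # Other values hash into the same buckets, so no bucket can be excluded.
        return every_bucket
    if isinstance(f, Range):
        if f.low is None or f.high is None:
            return every_bucket  # unbounded: ordering is lost after hashing
        if f.high - f.low + 1 > MAX_ENUMERATED_VALUES:
            return every_bucket  # bounded but too wide to enumerate
        return frozenset(bucket_of(v) for v in range(f.low, f.high + 1))
    if isinstance(f, Or):
        left = prune_buckets(f.left, num_buckets, bucket_of)
        return left | prune_buckets(f.right, num_buckets, bucket_of)
    if isinstance(f, And):
        left = prune_buckets(f.left, num_buckets, bucket_of)
        return left & prune_buckets(f.right, num_buckets, bucket_of)
    raise TypeError(f"unsupported filter: {f!r}")
```

Take `bucket_of = lambda v: v % 8` as a toy stand-in for the real hash. With it, `Range(0, 2)` narrows to buckets 0 through 2, while `Range(0, None)` keeps all eight.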