feat: Implement object store shuffle write support for S3 and Azure #18

Merged
lukekim merged 4 commits into spiceai-51 from lukim/object-store-shuffle
Feb 6, 2026

lukekim commented Feb 5, 2026

This pull request adds support in Ballista for writing shuffle data directly to object storage backends (S3 and Azure), so shuffle output no longer has to live on executor-local disk. It also adds parsing and validation for shuffle storage URLs, with unit tests for each supported backend and for invalid input.

Object store shuffle write support:

  • Added a new execution path in ShuffleWriterExec that writes shuffle data directly to S3 or Azure object stores in Arrow IPC format with LZ4 compression. Data is serialized in memory and uploaded in a single operation.
  • ShuffleWriterExec now detects an object store shuffle configuration in the session config and selects the matching storage backend and URL.
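The "serialize in memory, then upload once" write path can be sketched in plain Rust. This is a minimal illustration under stated assumptions, not the code merged in this PR: the ObjectPut trait, MemoryStore, write_partition, and the object path used below are all hypothetical, and the encode_batch closure stands in for the Arrow IPC writer with LZ4 compression, which is not reproduced here.

```rust
use std::collections::HashMap;
use std::io::{self, Write};

/// Hypothetical minimal object-store abstraction: one `put` per object.
/// This is a stand-in for a real client, not the `object_store` crate's API.
trait ObjectPut {
    fn put(&mut self, path: &str, bytes: Vec<u8>) -> io::Result<()>;
}

/// Hypothetical in-memory store that counts uploads, used only to show
/// that each partition is written with a single request.
#[derive(Default)]
struct MemoryStore {
    objects: HashMap<String, Vec<u8>>,
    put_calls: usize,
}

impl ObjectPut for MemoryStore {
    fn put(&mut self, path: &str, bytes: Vec<u8>) -> io::Result<()> {
        self.put_calls += 1;
        self.objects.insert(path.to_string(), bytes);
        Ok(())
    }
}

/// Serializes every batch of one output partition into a single in-memory
/// buffer, then uploads that buffer with exactly one `put`.
/// `encode_batch` is an assumption standing in for Arrow IPC + LZ4 encoding.
fn write_partition<S: ObjectPut>(
    store: &mut S,
    path: &str,
    batches: &[Vec<u8>],
    encode_batch: impl Fn(&[u8], &mut Vec<u8>) -> io::Result<()>,
) -> io::Result<usize> {
    let mut buf = Vec::new();
    for batch in batches {
        encode_batch(batch.as_slice(), &mut buf)?;
    }
    let len = buf.len();
    store.put(path, buf)?; // single upload for the whole partition
    Ok(len)
}

fn main() -> io::Result<()> {
    let mut store = MemoryStore::default();
    let batches = vec![b"abc".to_vec(), b"de".to_vec()];
    // Hypothetical object path; the PR's real layout is not shown here.
    let written = write_partition(&mut store, "job-1/stage-2/part-0.arrow", &batches, |b, out| {
        // Toy framing: little-endian u32 length prefix, then the raw bytes.
        out.write_all(&(b.len() as u32).to_le_bytes())?;
        out.write_all(b)
    })?;
    assert_eq!(written, 4 + 3 + 4 + 2);
    assert_eq!(store.put_calls, 1);
    println!("uploaded {written} bytes in {} put call(s)", store.put_calls);
    Ok(())
}
```

Buffering a whole partition before one upload keeps object-store traffic to one request per object, at the cost of holding that partition's serialized bytes in memory until the upload finishes.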

Shuffle storage configuration enhancements:

  • Added the ShuffleStorageConfig::from_type_and_url method, which parses storage URLs for the local, S3, and Azure backends, extracts the backend-specific parameters, and returns an error for invalid URLs.
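A parser in the spirit of ShuffleStorageConfig::from_type_and_url might look like the following sketch. Everything in it is an assumption for illustration: the ShuffleStorage enum and its fields, the accepted type strings ("local", "s3", "azure"), and the az:// scheme for Azure. The PR's actual types and accepted URL forms may differ.

```rust
/// Hypothetical stand-in for the PR's ShuffleStorageConfig.
#[derive(Debug, PartialEq)]
enum ShuffleStorage {
    Local { path: String },
    S3 { bucket: String, prefix: Option<String> },
    Azure { container: String, prefix: Option<String> },
}

/// Splits "authority/optional/prefix" into (authority, optional prefix).
/// Rejects an empty bucket or container name.
fn split_authority(rest: &str) -> Result<(String, Option<String>), String> {
    let (authority, path) = match rest.split_once('/') {
        Some((a, p)) => (a, p.trim_matches('/')),
        None => (rest, ""),
    };
    if authority.is_empty() {
        return Err("missing bucket or container in URL".to_string());
    }
    let prefix = if path.is_empty() { None } else { Some(path.to_string()) };
    Ok((authority.to_string(), prefix))
}

/// Hypothetical parser: the storage type selects which URL scheme is expected.
fn from_type_and_url(storage_type: &str, url: &str) -> Result<ShuffleStorage, String> {
    match storage_type {
        "local" => Ok(ShuffleStorage::Local {
            path: url.strip_prefix("file://").unwrap_or(url).to_string(),
        }),
        "s3" => {
            let rest = url
                .strip_prefix("s3://")
                .ok_or_else(|| format!("expected an s3:// URL, got {url}"))?;
            let (bucket, prefix) = split_authority(rest)?;
            Ok(ShuffleStorage::S3 { bucket, prefix })
        }
        "azure" => {
            let rest = url
                .strip_prefix("az://")
                .ok_or_else(|| format!("expected an az:// URL, got {url}"))?;
            let (container, prefix) = split_authority(rest)?;
            Ok(ShuffleStorage::Azure { container, prefix })
        }
        other => Err(format!("unknown shuffle storage type: {other}")),
    }
}

fn main() {
    // Cases mirroring the scenarios listed under "Testing and validation".
    assert_eq!(
        from_type_and_url("s3", "s3://my-bucket/shuffle").unwrap(),
        ShuffleStorage::S3 { bucket: "my-bucket".to_string(), prefix: Some("shuffle".to_string()) }
    );
    assert_eq!(
        from_type_and_url("azure", "az://data").unwrap(),
        ShuffleStorage::Azure { container: "data".to_string(), prefix: None }
    );
    assert!(from_type_and_url("s3", "gs://my-bucket").is_err());
    println!("all parser checks passed");
}
```

Keying the expected scheme off the storage type means a mismatched pair (for example, type "s3" with a gs:// URL) fails at configuration time rather than at the first shuffle write.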

Testing and validation:

  • Added unit tests for from_type_and_url covering local, S3 (with and without a prefix), Azure (with and without a prefix), and invalid URLs, to check both correct parsing and error handling.

lukekim self-assigned this Feb 5, 2026
lukekim added the bug label Feb 5, 2026
Two review comment threads on ballista/core/src/execution_plans/shuffle_writer.rs (marked outdated)
lukekim merged commit 56225b2 into spiceai-51 Feb 6, 2026
29 checks passed
lukekim deleted the lukim/object-store-shuffle branch February 6, 2026 03:22

Labels

bug (Something isn't working), needs-upstream
