feat: Add BallistaBuilder for custom object store configuration #11

Merged: lukekim merged 6 commits into spiceai-51 from lukim/object-store-pass-in on Jan 27, 2026
lukekim commented on Jan 24, 2026

This pull request introduces a new builder API for Ballista client session contexts, allowing users to register custom object stores (such as S3, Azure, or GCS) with custom authentication, and updates the shuffle reader logic to leverage this capability. The changes also refactor how shuffle partition locations are categorized and fetched, ensuring that object store locations are handled through the DataFusion runtime environment's registered stores. Additional minor test and dependency updates are included.

Builder API and Client Usability Improvements:

  • Added a new BallistaBuilder in ballista/client/src/builder.rs, providing a fluent API for configuring Ballista sessions, including registering pre-created object stores with custom authentication. This makes it easier for users to integrate custom object store credentials into their Ballista workflows. ([ballista/client/src/builder.rsR1-R295](https://github.com/spiceai/datafusion-ballista/pull/11/files#diff-d733b3d603f4b7fca02caa66d086f807a0c470ac3b9aacd2a623634e805e3b44R1-R295))
  • Exposed the new builder in the client prelude and module exports for convenient access (ballista/client/src/lib.rs, ballista/client/src/prelude.rs). ([[1]](https://github.com/spiceai/datafusion-ballista/pull/11/files#diff-ef0880be68f50bc2cfae28682e4f3e058dfd7c8e90daa464d6c456ba4be83c0aR21-R22), [[2]](https://github.com/spiceai/datafusion-ballista/pull/11/files#diff-d4c80d23e96437aea2e38ed510161a2687e541d92ad9587fce45bd7545f4683dR25))
  • Added object_store as a workspace dependency in Cargo.toml to support custom object store integration. ([ballista/client/Cargo.tomlR37](https://github.com/spiceai/datafusion-ballista/pull/11/files#diff-006774bc4e477c188de3fa0e969fc3b33d0d9815cf581ad509524598e213d84eR37))
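
As a rough illustration of the fluent pattern described above, here is a minimal, std-only Rust sketch. It is not the actual `BallistaBuilder` API from `ballista/client/src/builder.rs`: `SketchBuilder`, `SketchSession`, `StoreClient`, `FakeS3`, `with_object_store`, `remote`, and `store_for` are all hypothetical names, and the real builder presumably works with `object_store` clients and DataFusion session types rather than these stand-ins.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for an object store client that was created
/// up front with custom credentials. This is NOT the `object_store` trait.
pub trait StoreClient: Send + Sync {
    fn describe(&self) -> String;
}

/// Hypothetical pre-authenticated store, used only in this sketch.
pub struct FakeS3 {
    pub bucket: String,
    pub access_key_id: String,
}

impl StoreClient for FakeS3 {
    fn describe(&self) -> String {
        format!("s3://{} (key id {})", self.bucket, self.access_key_id)
    }
}

/// Hypothetical session that owns the stores collected by the builder.
pub struct SketchSession {
    pub endpoint: String,
    stores: HashMap<String, Arc<dyn StoreClient>>,
}

impl SketchSession {
    /// Look up the store registered for a base URL such as "s3://my-bucket".
    pub fn store_for(&self, base_url: &str) -> Option<Arc<dyn StoreClient>> {
        self.stores.get(base_url).cloned()
    }
}

/// Fluent builder: each configuration method consumes and returns `self`.
#[derive(Default)]
pub struct SketchBuilder {
    stores: HashMap<String, Arc<dyn StoreClient>>,
}

impl SketchBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Register an already-constructed store under its base URL.
    pub fn with_object_store(mut self, base_url: &str, store: Arc<dyn StoreClient>) -> Self {
        self.stores.insert(base_url.to_string(), store);
        self
    }

    /// Finish the builder for a remote scheduler endpoint (assumed shape).
    pub fn remote(self, endpoint: &str) -> SketchSession {
        SketchSession {
            endpoint: endpoint.to_string(),
            stores: self.stores,
        }
    }
}
```

A caller would chain the calls, for example `SketchBuilder::new().with_object_store("s3://my-bucket", store).remote("df://localhost:50050")`, where the bucket and endpoint are placeholder values. The point of the design is that the store is built once by the caller, with whatever authentication it needs, and the session only holds a shared handle to it.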

Shuffle Reader Enhancements:

  • Refactored the shuffle reader to split partition locations into four categories: memory, local disk, object store, and remote, enabling more precise handling of object store partitions. ([ballista/core/src/execution_plans/shuffle_reader.rsL402-R441](https://github.com/spiceai/datafusion-ballista/pull/11/files#diff-821d458bfa4b39a875e8889c8e93594de9ade9fe75ff4336198f51751527f73fL402-R441))
  • Implemented logic to fetch shuffle partitions directly from object stores using the runtime environment's registered object stores, ensuring that custom authentication and configuration are respected. ([[1]](https://github.com/spiceai/datafusion-ballista/pull/11/files#diff-821d458bfa4b39a875e8889c8e93594de9ade9fe75ff4336198f51751527f73fR525-R541), [[2]](https://github.com/spiceai/datafusion-ballista/pull/11/files#diff-821d458bfa4b39a875e8889c8e93594de9ade9fe75ff4336198f51751527f73fR866-R972))
  • Updated the shuffle reader to pass the runtime environment to the fetch logic, so that object store access uses the correct credentials and registry. ([[1]](https://github.com/spiceai/datafusion-ballista/pull/11/files#diff-821d458bfa4b39a875e8889c8e93594de9ade9fe75ff4336198f51751527f73fR215), [[2]](https://github.com/spiceai/datafusion-ballista/pull/11/files#diff-821d458bfa4b39a875e8889c8e93594de9ade9fe75ff4336198f51751527f73fR454-R467))
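
The four-way split described above can be sketched as follows. This is a simplified, std-only illustration and not the code in `shuffle_reader.rs`: the `PartitionLocation` fields, the `memory://` marker, the list of URL schemes, and the host comparison are all assumptions made for the example, and the real categorization rules may differ.

```rust
/// The four categories named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LocationKind {
    Memory,
    LocalDisk,
    ObjectStore,
    Remote,
}

/// Hypothetical, simplified view of a shuffle partition location.
pub struct PartitionLocation {
    pub executor_host: String,
    pub path: String,
}

/// Assumed classification rules, for illustration only:
/// a "memory://" marker, then known object store URL schemes,
/// then "same host means local disk", otherwise remote.
pub fn classify(loc: &PartitionLocation, local_host: &str) -> LocationKind {
    const OBJECT_STORE_SCHEMES: [&str; 4] = ["s3://", "gs://", "az://", "abfss://"];
    if loc.path.starts_with("memory://") {
        LocationKind::Memory
    } else if OBJECT_STORE_SCHEMES.iter().any(|s| loc.path.starts_with(s)) {
        LocationKind::ObjectStore
    } else if loc.executor_host == local_host {
        LocationKind::LocalDisk
    } else {
        LocationKind::Remote
    }
}

/// Locations grouped by category so each group can use its own fetch path.
#[derive(Default)]
pub struct SplitLocations {
    pub memory: Vec<PartitionLocation>,
    pub local_disk: Vec<PartitionLocation>,
    pub object_store: Vec<PartitionLocation>,
    pub remote: Vec<PartitionLocation>,
}

/// Partition the input into the four groups in a single pass.
pub fn split_locations(locs: Vec<PartitionLocation>, local_host: &str) -> SplitLocations {
    let mut out = SplitLocations::default();
    for loc in locs {
        match classify(&loc, local_host) {
            LocationKind::Memory => out.memory.push(loc),
            LocationKind::LocalDisk => out.local_disk.push(loc),
            LocationKind::ObjectStore => out.object_store.push(loc),
            LocationKind::Remote => out.remote.push(loc),
        }
    }
    out
}
```

Keeping object store locations in their own group is what lets the reader resolve them through the stores registered on the runtime environment, instead of treating them as remote executor fetches.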

Dependency and Import Updates:

  • Added necessary imports from DataFusion and object_store to support the new builder and shuffle reader logic. ([ballista/core/src/execution_plans/shuffle_reader.rsR34-R36](https://github.com/spiceai/datafusion-ballista/pull/11/files#diff-821d458bfa4b39a875e8889c8e93594de9ade9fe75ff4336198f51751527f73fR34-R36))

Test Adjustments:

  • Updated tests to account for the new runtime environment parameter and conditional compilation flags. ([[1]](https://github.com/spiceai/datafusion-ballista/pull/11/files#diff-821d458bfa4b39a875e8889c8e93594de9ade9fe75ff4336198f51751527f73fR1494), [[2]](https://github.com/spiceai/datafusion-ballista/pull/11/files#diff-2b29b595f3d5d17e4f1352885f4b625620bb954d55420171fe14e19a0a5df7e3R867), [[3]](https://github.com/spiceai/datafusion-ballista/pull/11/files#diff-2b29b595f3d5d17e4f1352885f4b625620bb954d55420171fe14e19a0a5df7e3L878-L879), [[4]](https://github.com/spiceai/datafusion-ballista/pull/11/files#diff-2b29b595f3d5d17e4f1352885f4b625620bb954d55420171fe14e19a0a5df7e3L935-L936))

- Introduced a new `BallistaBuilder` struct to facilitate the creation of Ballista session contexts with pre-configured object stores.
- Updated dependencies in `Cargo.toml` to use specific versions for `datafusion` and `pyo3`.
- Refactored `build.rs` to remove unused modules and streamline the build process.
- Added an example demonstrating the usage of `BallistaBuilder` with a pre-created S3 object store.
- Implemented methods in `BallistaBuilder` for setting session configurations, registering object stores, and creating contexts for remote and standalone execution.

lukekim self-assigned this on Jan 26, 2026
lukekim added the enhancement label and removed the python label on Jan 26, 2026
Review comment threads (contents not captured):
- ballista/client/src/builder.rs: 2 threads, outdated
- ballista/core/src/execution_plans/shuffle_reader.rs: 9 threads, outdated
- python/build.rs: 1 thread
lukekim merged commit 2abc199 into spiceai-51 on Jan 27, 2026 (30 checks passed)
lukekim deleted the lukim/object-store-pass-in branch on January 27, 2026 02:15

3 participants