
[pull] trunk from spiceai:trunk#857

Merged: pull[bot] merged 3 commits into TheRakeshPurohit:trunk from spiceai:trunk on May 22, 2026

pull[bot] commented on May 22, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

phillipleblanc and others added 3 commits May 22, 2026 05:20
Disentangles two unrelated semantics that were both named `WriteThrough`:

- The user-facing spicepod `write_mode: write_through` contract:
  "source-sync, accelerator catches up through the refresh mechanism."
- The internal "dual atomic write to source AND accelerator via Cayenne
  staged-append" added in PR #10115 for the Iceberg federated catalog
  cache use case (no CDC available to fill the cache).

PR #10446 reused the internal staged-append path for regular accelerated
datasets when `write_mode: write_through` was selected. That broke
Cayenne+CDC datasets with a primary key, because Cayenne's
`begin_staged_append` rejects non-`PositionBased` deletion strategies.
Writes succeeded on both sides (the federated insert and the CDC replay),
but the staged-append task aborted, and the channel-close sentinel masked
its real error, so the client saw only `Internal Error: Unexpected internal error`.
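The guard that broke Cayenne+CDC datasets can be sketched roughly as follows. This is a minimal illustration, not Cayenne's code: only `begin_staged_append` and `PositionBased` come from the text above, while `DeletionStrategy::PrimaryKey` and `StagedAppendError` are hypothetical names standing in for whatever strategy a primary-keyed CDC dataset actually uses.

```rust
/// Hypothetical stand-in for Cayenne's deletion-strategy setting.
/// Only `PositionBased` is named in the commit message; `PrimaryKey` is illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeletionStrategy {
    PositionBased,
    PrimaryKey,
}

/// Hypothetical error returned when staged append cannot start.
#[derive(Debug, PartialEq, Eq)]
pub enum StagedAppendError {
    UnsupportedDeletionStrategy(DeletionStrategy),
}

/// Sketch of the guard: staged append accepts position-based deletes only
/// and rejects every other strategy before any data is written.
pub fn begin_staged_append(strategy: DeletionStrategy) -> Result<(), StagedAppendError> {
    match strategy {
        DeletionStrategy::PositionBased => Ok(()),
        other => Err(StagedAppendError::UnsupportedDeletionStrategy(other)),
    }
}
```

Because the rejection happens inside the staged task, the caller only learns about it if that task's error is propagated, which is the masking problem described above.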

Internal renames (spicepod / user-facing names unchanged):

- `WriteMode::FederatedOnly` → `WriteMode::WriteThrough`
  (now matches the user-facing contract).
- `WriteMode::WriteThrough { .. }` (dual atomic write) → `WriteMode::DualWrite { .. }`.
- `Builder::write_through()` → `Builder::dual_write()`. Only the Iceberg
  DDL handler at `iceberg_ddl/physical_plans.rs` calls it; the regular
  accelerated-dataset path no longer does.
- `AcceleratedTable::is_write_through()` → `is_dual_write()` and the
  distributed-insert planner check renamed to match.
- File: `accelerated_table/write/write_through.rs` → `dual_write.rs`;
  helper functions and sinks renamed accordingly.

Behavior change: spicepod `write_mode: write_through` now always routes
through `WriteMode::WriteThrough` (source insert) regardless of engine or
CDC setting. The accelerator catches up through the refresh mechanism
(WAL for `refresh_mode: changes`, periodic refresh otherwise), which is
what the user-facing docs always promised.
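A sketch of the routing after this change, under simplified assumptions: `WriteMode::WriteThrough`, `WriteMode::DualWrite`, and `Builder::dual_write()` are the names given above, but their fields and bodies here are hypothetical, as are `Engine`, `CatchUp`, `mode_for_write_through`, and `catch_up_for`. The point it illustrates is that the engine and CDC flag no longer influence which internal mode a spicepod `write_through` dataset gets, and that `DualWrite` is only reachable through the explicit builder opt-in.

```rust
/// Internal write modes after the rename (fields are illustrative only).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum WriteMode {
    /// Source insert only; the accelerator catches up through refresh.
    WriteThrough,
    /// Atomic write to source and accelerator via staged append.
    DualWrite { staged_append: bool },
}

/// Hypothetical accelerator engines, used only to show they are ignored.
#[derive(Debug, Clone, Copy)]
pub enum Engine {
    Cayenne,
    Other,
}

/// How the accelerator catches up after a source-only write.
#[derive(Debug, PartialEq, Eq)]
pub enum CatchUp {
    Wal,
    PeriodicRefresh,
}

#[derive(Default)]
pub struct Builder {
    mode: Option<WriteMode>,
}

impl Builder {
    /// Explicit opt-in to dual write (in the described design, only the Iceberg DDL handler calls this).
    pub fn dual_write(mut self) -> Self {
        self.mode = Some(WriteMode::DualWrite { staged_append: true });
        self
    }

    /// Without an explicit opt-in, writes go to the source only.
    pub fn build(self) -> WriteMode {
        self.mode.unwrap_or(WriteMode::WriteThrough)
    }
}

/// Spicepod `write_mode: write_through` on a regular accelerated dataset:
/// engine and CDC settings are deliberately ignored.
pub fn mode_for_write_through(_engine: Engine, _cdc_enabled: bool) -> WriteMode {
    Builder::default().build()
}

/// `refresh_mode: changes` catches up via the WAL; anything else via periodic refresh.
pub fn catch_up_for(refresh_mode: &str) -> CatchUp {
    if refresh_mode == "changes" {
        CatchUp::Wal
    } else {
        CatchUp::PeriodicRefresh
    }
}
```

Keeping `DualWrite` behind an explicit builder call means a regular dataset cannot reach the staged-append path by accident, which is the regression the rename guards against.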

Also separates real upstream errors from channel-close symptoms in the
dual-write loop, so that when a downstream task aborts (e.g. staged append
rejecting a config), its actual error propagates instead of being masked.
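A minimal sketch of the error-propagation fix, assuming a producer/consumer loop built on `std::sync::mpsc` and a plain thread (the real loop is presumably async; `run_dual_write` and `DualWriteError` are hypothetical names). When `send` fails because the downstream task dropped its receiver, the loop treats that as a symptom and joins the task to surface the error it actually returned.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum DualWriteError {
    /// The real failure reported by the downstream task.
    Downstream(String),
    /// The channel closed but the downstream task reported success.
    ChannelClosed,
}

/// Streams `batches` to a downstream task. A failed `send` only records that the
/// receiver is gone; the task is then joined so its own error can be returned.
pub fn run_dual_write<F>(batches: Vec<u32>, downstream: F) -> Result<usize, DualWriteError>
where
    F: FnOnce(mpsc::Receiver<u32>) -> Result<usize, String> + Send + 'static,
{
    let (tx, rx) = mpsc::sync_channel::<u32>(1);
    let handle = thread::spawn(move || downstream(rx));

    let mut channel_closed = false;
    for batch in batches {
        if tx.send(batch).is_err() {
            // Symptom, not cause: the downstream task dropped its receiver.
            channel_closed = true;
            break;
        }
    }
    drop(tx); // Signal end of input so a healthy consumer can finish.

    match handle.join() {
        // Prefer the downstream task's real error over the channel-close symptom.
        Ok(Err(e)) => Err(DualWriteError::Downstream(e)),
        Ok(Ok(_)) if channel_closed => Err(DualWriteError::ChannelClosed),
        Ok(Ok(written)) => Ok(written),
        Err(_) => Err(DualWriteError::Downstream("downstream task panicked".to_string())),
    }
}
```

The ordering of the `match` arms is the design choice: a downstream `Err` wins regardless of whether the producer also saw the channel close.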

Drive-by: adds missing backticks around `SQLite` in
`crates/cayenne/src/metastore/sqlite.rs` to unblock `make lint-rust`
(pre-existing trunk failure introduced by #10943).

Tests:
- 22 cayenne staged_append_test cases pass.
- 20 runtime on_conflict integration tests pass.
- 6 renamed dual_write unit tests pass.
- `cargo fmt --all --check` clean.

Closes #10960
Co-authored-by: Jeadie <jeadie@users.noreply.github.com>

…tafusion-ballista PRs #42 + #43) (#10919)

Bumps ballista-core / ballista-executor / ballista-scheduler to spiceai-52.5
tip (07be66a8), which carries:

  1. (#42) Wrap ObjectStoreShuffleStorage in object_store::prefix::PrefixStore
     so the URL path is reattached to every key — without this, writers
     uploaded to s3://bucket/<job>/... while readers looked under
     s3://bucket/<prefix>/<job>/... and got NotFound on every reduce stage.

  2. (#43) Dispatch s3:// partition paths inside BallistaClient::fetch_partition
     to the existing object-store reader. Before this, the gRPC FetchPartition
     handler called tokio::fs::File::open("s3://..."), which failed for every
     single-batch query (q1) and every reduce-stage fetch (q2+).

  3. (#43) Replace per-batch serialize_batch_to_ipc_bytes with a long-lived
     StreamingMultipartIpcUploader: one StreamWriter per output partition
     means the IPC stream has one header and one EOS marker, instead of one
     stream per batch concatenated together. Fixes the
     ArrowError(IpcError("Unexpected EOS")) we saw on multi-batch
     hash-repartition queries.
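Fix 1 in the list above can be illustrated with an in-memory stand-in. `MemStore` and `PrefixedStore` are hypothetical types that mimic the idea behind `object_store::prefix::PrefixStore` (every key gets the configured URL path prepended) without using that crate's API.

```rust
use std::collections::HashMap;

/// In-memory stand-in for a bucket: flat keys to bytes (hypothetical).
#[derive(Default)]
pub struct MemStore {
    objects: HashMap<String, Vec<u8>>,
}

impl MemStore {
    pub fn put(&mut self, key: &str, data: &[u8]) {
        self.objects.insert(key.to_string(), data.to_vec());
    }

    pub fn get(&self, key: &str) -> Option<&[u8]> {
        self.objects.get(key).map(|v| v.as_slice())
    }
}

/// Reattaches a fixed prefix (the URL path) to every key, so writers and
/// readers that both use relative keys agree on the full object location.
pub struct PrefixedStore {
    inner: MemStore,
    prefix: String,
}

impl PrefixedStore {
    pub fn new(inner: MemStore, prefix: &str) -> Self {
        Self { inner, prefix: prefix.trim_matches('/').to_string() }
    }

    fn full_key(&self, key: &str) -> String {
        format!("{}/{}", self.prefix, key.trim_start_matches('/'))
    }

    pub fn put(&mut self, key: &str, data: &[u8]) {
        let full = self.full_key(key);
        self.inner.put(&full, data);
    }

    pub fn get(&self, key: &str) -> Option<&[u8]> {
        self.inner.get(&self.full_key(key))
    }

    pub fn into_inner(self) -> MemStore {
        self.inner
    }
}
```

Routing both the writer and the reader through the same wrapper is what removes the mismatch: neither side can forget the prefix.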
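Fix 2 amounts to choosing a reader by URL scheme before touching the local filesystem. Below is a sketch, assuming only `s3://` needs special handling; `PartitionLocation` and `classify_partition_path` are hypothetical and do not mirror the actual signature of `BallistaClient::fetch_partition`.

```rust
use std::path::PathBuf;

/// Where a shuffle partition lives (hypothetical classification).
#[derive(Debug, PartialEq, Eq)]
pub enum PartitionLocation {
    Local(PathBuf),
    ObjectStore { bucket: String, key: String },
}

/// Sends `s3://` paths to an object-store reader instead of handing them to a
/// local file open, which can never succeed on a URL.
pub fn classify_partition_path(path: &str) -> PartitionLocation {
    match path.strip_prefix("s3://") {
        Some(rest) => {
            // Split "bucket/key..." at the first slash; a bare bucket gets an empty key.
            let (bucket, key) = rest.split_once('/').unwrap_or((rest, ""));
            PartitionLocation::ObjectStore {
                bucket: bucket.to_string(),
                key: key.to_string(),
            }
        }
        None => PartitionLocation::Local(PathBuf::from(path)),
    }
}
```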
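Fix 3 concerns stream framing. The toy model below uses hypothetical `Frame` and `StreamingUploader` types rather than Arrow IPC bytes. It keeps one writer per output partition so the header and EOS marker are each emitted exactly once, and cuts the buffered frames into upload parts as it goes. The part threshold counts frames for simplicity; a real S3 multipart upload sizes parts in bytes and has its own minimum part size.

```rust
/// Toy stand-in for IPC stream framing (not the Arrow IPC byte format).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Frame {
    Header,
    Batch(u32),
    Eos,
}

/// Hypothetical per-partition uploader: one long-lived stream, with frames
/// flushed as an upload part whenever the buffer reaches `part_size`.
pub struct StreamingUploader {
    part_size: usize,
    buffer: Vec<Frame>,
    parts: Vec<Vec<Frame>>,
    started: bool,
}

impl StreamingUploader {
    pub fn new(part_size: usize) -> Self {
        Self { part_size: part_size.max(1), buffer: Vec::new(), parts: Vec::new(), started: false }
    }

    pub fn write_batch(&mut self, batch: u32) {
        if !self.started {
            // The header is written once per partition, not once per batch.
            self.buffer.push(Frame::Header);
            self.started = true;
        }
        self.buffer.push(Frame::Batch(batch));
        if self.buffer.len() >= self.part_size {
            self.parts.push(std::mem::take(&mut self.buffer));
        }
    }

    /// Writes the single EOS marker and flushes the final part.
    pub fn finish(mut self) -> Vec<Vec<Frame>> {
        if !self.started {
            self.buffer.push(Frame::Header);
        }
        self.buffer.push(Frame::Eos);
        self.parts.push(self.buffer);
        self.parts
    }
}

/// The replaced approach: every batch becomes its own complete stream.
pub fn per_batch_streams(batches: &[u32]) -> Vec<Frame> {
    batches
        .iter()
        .flat_map(|&b| [Frame::Header, Frame::Batch(b), Frame::Eos])
        .collect()
}
```

Concatenating the uploaded parts yields one contiguous stream per partition, whereas `per_batch_streams` yields one header/EOS pair per batch, which is the shape a single stream reader is not built to consume.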
pull[bot] locked and limited the conversation to collaborators on May 22, 2026
pull[bot] added the ⤵️ pull label on May 22, 2026
pull[bot] merged commit cf8f12a into TheRakeshPurohit:trunk on May 22, 2026