fix: lazily initialize BatchCoalescer in CoalescedShuffleReaderStream to avoid schema type mismatch by phillipleblanc · Pull Request #24 · spiceai/datafusion-ballista

phillipleblanc · 2026-03-11T13:36:33Z

Summary

The BatchCoalescer inside CoalescedShuffleReaderStream was eagerly initialized with the declared schema from the execution plan. However, the actual IPC shuffle data may have different Arrow types — for example, string columns declared as LargeUtf8 in the plan schema but written as Utf8 by the CSV reader. When InProgressPrimitiveArray<T>::copy_rows() tries to downcast the source array, the type mismatch causes a panic:

Internal("\"primitive array\"")

Root Cause

In distributed (Ballista) query execution, shuffle data is written as Arrow IPC by Stage N and read by Stage N+1 via ShuffleReaderExec. The CoalescedShuffleReaderStream wraps the IPC reader output and coalesces small batches. It eagerly creates a LimitedBatchCoalescer using input.schema() — the declared schema from the plan. But the actual Arrow arrays in the IPC files may use different concrete types than what the plan declares, causing the BatchCoalescer to create InProgressPrimitiveArray<T> with the wrong T.
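
The failure mode described above can be illustrated with a minimal, standard-library-only sketch. It does not use arrow-rs: `Utf8Array`, `LargeUtf8Array`, `DeclaredType`, and this `copy_rows` are hypothetical stand-ins, assumed only to show how a builder chosen from a declared type can fail to downcast an array of a different concrete type.

```rust
use std::any::Any;

/// Hypothetical stand-ins for two concrete array types (NOT the arrow-rs types).
pub struct Utf8Array(pub Vec<String>);
pub struct LargeUtf8Array(pub Vec<String>);

/// Hypothetical declared column type, as the plan schema would report it.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DeclaredType {
    Utf8,
    LargeUtf8,
}

/// Mimics a builder selected from the *declared* type: it downcasts the
/// source to the concrete type it expects and returns the number of rows
/// copied, or an error when the actual array is of a different type.
pub fn copy_rows(declared: DeclaredType, source: &dyn Any) -> Result<usize, String> {
    match declared {
        DeclaredType::Utf8 => source
            .downcast_ref::<Utf8Array>()
            .map(|a| a.0.len())
            .ok_or_else(|| "expected Utf8Array".to_string()),
        DeclaredType::LargeUtf8 => source
            .downcast_ref::<LargeUtf8Array>()
            .map(|a| a.0.len())
            .ok_or_else(|| "expected LargeUtf8Array".to_string()),
    }
}
```

In this sketch, passing a `Utf8Array` with `DeclaredType::LargeUtf8` yields an `Err`, while passing it with `DeclaredType::Utf8` succeeds. The real code path panics rather than returning an error, but the cause is the same kind of mismatch between the declared and the actual concrete type.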

Fix

Lazily initialize the BatchCoalescer from the first actual batch's schema instead of the declared schema. This is the same pattern already applied to RepartitionExec in spiceai/datafusion#135.

Copilot

Pull request overview

This PR fixes a panic in Ballista’s shuffle reader path by avoiding eager LimitedBatchCoalescer initialization with the execution plan’s declared schema when the actual Arrow IPC shuffle batches contain differing concrete Arrow types.

Changes:

Make CoalescedShuffleReaderStream lazily initialize its LimitedBatchCoalescer from the first non-empty incoming RecordBatch schema.
Store batch_size / limit on the stream to support lazy coalescer creation and adjust polling logic to handle Option<LimitedBatchCoalescer>.
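
The two changes listed above might look roughly like the following sketch. It is not the code from this PR and does not use the DataFusion or Arrow APIs: `Schema`, `Batch`, `Coalescer`, and `LazyCoalescingReader` are hypothetical toy types, assumed only to show the shape of the pattern (store `batch_size` and `limit`, hold an `Option` for the coalescer, and create it from the first non-empty batch).

```rust
/// Hypothetical schema: a list of concrete column type names.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema(pub Vec<&'static str>);

/// Hypothetical batch: a schema plus a row count (no real column data).
#[derive(Debug, Clone)]
pub struct Batch {
    pub schema: Schema,
    pub rows: usize,
}

/// Toy coalescer bound to exactly one schema; it only counts buffered rows.
pub struct Coalescer {
    schema: Schema,
    batch_size: usize,
    remaining: Option<usize>, // rows still allowed by the limit, if any
    buffered: usize,
}

impl Coalescer {
    fn new(schema: Schema, batch_size: usize, limit: Option<usize>) -> Self {
        Self { schema, batch_size, remaining: limit, buffered: 0 }
    }

    /// Buffers rows; rejects a batch whose schema differs from the bound one.
    fn push(&mut self, batch: &Batch) -> Result<(), String> {
        if batch.schema != self.schema {
            return Err(format!("schema mismatch: {:?} vs {:?}", self.schema, batch.schema));
        }
        // Respect the row limit, if one was configured.
        let take = match self.remaining {
            Some(r) => batch.rows.min(r),
            None => batch.rows,
        };
        if let Some(r) = self.remaining.as_mut() {
            *r -= take;
        }
        self.buffered += take;
        Ok(())
    }

    /// Emits one full-sized batch once enough rows are buffered.
    fn next_completed(&mut self) -> Option<Batch> {
        if self.buffered >= self.batch_size {
            self.buffered -= self.batch_size;
            Some(Batch { schema: self.schema.clone(), rows: self.batch_size })
        } else {
            None
        }
    }
}

/// Stream-like wrapper that stores the settings and defers coalescer creation.
pub struct LazyCoalescingReader {
    batch_size: usize,
    limit: Option<usize>,
    coalescer: Option<Coalescer>, // None until the first non-empty batch
}

impl LazyCoalescingReader {
    pub fn new(batch_size: usize, limit: Option<usize>) -> Self {
        Self { batch_size, limit, coalescer: None }
    }

    pub fn push(&mut self, batch: &Batch) -> Result<(), String> {
        if batch.rows == 0 {
            return Ok(()); // empty batches do not decide the schema
        }
        let (batch_size, limit) = (self.batch_size, self.limit);
        // Bind the coalescer to the schema of the data that actually arrived.
        let coalescer = self
            .coalescer
            .get_or_insert_with(|| Coalescer::new(batch.schema.clone(), batch_size, limit));
        coalescer.push(batch)
    }

    pub fn next_completed(&mut self) -> Option<Batch> {
        self.coalescer.as_mut().and_then(|c| c.next_completed())
    }

    pub fn bound_schema(&self) -> Option<&Schema> {
        self.coalescer.as_ref().map(|c| &c.schema)
    }

    pub fn buffered_rows(&self) -> usize {
        self.coalescer.as_ref().map_or(0, |c| c.buffered)
    }
}
```

The design point in this sketch is that the schema is never taken from the plan: until a non-empty batch arrives, `bound_schema()` returns `None`, and afterwards it reflects the first batch's actual column types.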

… to avoid schema type mismatch

The BatchCoalescer inside CoalescedShuffleReaderStream was eagerly initialized with the declared schema from the execution plan. However, the actual IPC shuffle data may have different Arrow types (e.g., string columns declared as LargeUtf8 in the plan but written as Utf8 by the CSV reader). When InProgressPrimitiveArray<T>::copy_rows() tries to downcast the source array, the type mismatch causes a panic: Internal("primitive array").

This applies the same lazy initialization pattern used for RepartitionExec (spiceai/datafusion#135): defer BatchCoalescer creation until the first batch arrives and use the batch's actual schema.

…anic in distributed queries

Updates datafusion-ballista from e1153d7b to ad88031f, which includes spiceai/datafusion-ballista#24: lazily initialize BatchCoalescer in CoalescedShuffleReaderStream using the first batch's actual schema instead of the declared plan schema. Fixes distributed queries with GROUP BY on string columns panicking with Internal("primitive array") when the plan declares LargeUtf8 but the IPC shuffle data contains Utf8.

Copilot AI review requested due to automatic review settings March 11, 2026 13:36

Copilot started reviewing on behalf of phillipleblanc March 11, 2026 13:37

phillipleblanc force-pushed the phillip/fix-shuffle-reader-coalescer-schema branch from cc432f2 to 542a755 March 11, 2026 13:38

phillipleblanc self-assigned this Mar 11, 2026

Copilot AI reviewed Mar 11, 2026

Comment thread ballista/core/src/execution_plans/shuffle_reader.rs

Comment thread ballista/core/src/execution_plans/shuffle_reader.rs

phillipleblanc force-pushed the phillip/fix-shuffle-reader-coalescer-schema branch from 542a755 to fc5d46f March 11, 2026 13:46

sgrebnov approved these changes Mar 11, 2026

phillipleblanc merged commit ad88031 into spiceai-52 Mar 11, 2026
29 checks passed

phillipleblanc deleted the phillip/fix-shuffle-reader-coalescer-schema branch March 11, 2026 14:43

phillipleblanc mentioned this pull request Mar 11, 2026

fix: bump datafusion-ballista to fix BatchCoalescer schema mismatch panic in distributed queries spiceai/spiceai#9716

Merged

milenkovicm mentioned this pull request Mar 16, 2026

Arrow schema issues for partition columns milenkovicm/ballista_delta#74

Open
