@binary-signal
Purpose

Linked issue: close #2164

Fix IndexOutOfBoundsException when writing rows with array columns where the total number of array elements exceeds INITIAL_CAPACITY (1024) while the row count stays below it.

Brief change log

In ArrowWriter.writeRow(), the handleSafe flag is determined by comparing row count against INITIAL_CAPACITY:

boolean handleSafe = recordsCount >= INITIAL_CAPACITY;

When handleSafe = false, Arrow writers use vector.set(), which doesn't auto-grow the buffer. The bug is in ArrowArrayWriter.doWrite(), which passes the parent's handleSafe flag to the element writer. However, array element indices grow with the cumulative element count across all rows, not with the row count.

Example: 250 rows with 10-element arrays → row count (250) < 1024 so handleSafe = false, but total elements (2500) exceeds the vector's initial capacity, causing IndexOutOfBoundsException.
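
The arithmetic in this example can be checked with a small standalone sketch. It does not use Arrow or the real writer classes; the class name ElementIndexOverflowSketch and the method firstOverflowRow are hypothetical, and it assumes the element vector's usable capacity equals INITIAL_CAPACITY (1024), which may not match Arrow's actual allocation sizes exactly.

```java
public class ElementIndexOverflowSketch {
    // Assumption: the element vector starts with room for exactly this many values.
    static final int INITIAL_CAPACITY = 1024;

    /**
     * Returns the 0-based row at which an unsafe write would first target an
     * element index at or beyond the capacity, or -1 if that never happens.
     */
    static int firstOverflowRow(int rows, int elementsPerRow, int capacity) {
        int offset = 0; // cumulative number of elements written by earlier rows
        for (int row = 0; row < rows; row++) {
            // Mirrors the row-based decision described in the PR.
            boolean handleSafe = row >= capacity;
            for (int arrIndex = 0; arrIndex < elementsPerRow; arrIndex++) {
                int fieldIndex = offset + arrIndex;
                if (!handleSafe && fieldIndex >= capacity) {
                    return row;
                }
            }
            offset += elementsPerRow;
        }
        return -1;
    }

    public static void main(String[] args) {
        // 250 rows of 10 elements: rows 0..101 fill indices 0..1019,
        // so row 102 is the first to touch index 1024.
        System.out.println(firstOverflowRow(250, 10, INITIAL_CAPACITY));
    }
}
```

Under these assumptions, the unsafe path fails long before the row count gets near 1024, which is why a row-based flag cannot protect the element vector.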

Fix:
Always use safe writes (handleSafe = true) for array element writers in ArrowArrayWriter.doWrite(), since element indices can exceed INITIAL_CAPACITY independently of row count.

// Before
elementWriter.write(fieldIndex, array, arrIndex, handleSafe);

// After
elementWriter.write(fieldIndex, array, arrIndex, true);
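
To illustrate why forcing the safe path is enough, here is a minimal toy model of a fixed-capacity buffer with an unchecked-growth set and a growing setSafe. ToyVector, SafeWriteSketch, and writeAll are hypothetical names for illustration only; this is not Arrow's implementation, and the doubling policy is an assumption.

```java
import java.util.Arrays;

public class SafeWriteSketch {
    /** Toy stand-in for a value vector; not Arrow's real class. */
    static final class ToyVector {
        private int[] buffer;

        ToyVector(int capacity) {
            buffer = new int[capacity];
        }

        /** Writes without growing; fails when index is past the current capacity. */
        void set(int index, int value) {
            if (index >= buffer.length) {
                throw new IndexOutOfBoundsException(
                        "index " + index + " >= capacity " + buffer.length);
            }
            buffer[index] = value;
        }

        /** Grows the buffer (at least doubling it) before writing if needed. */
        void setSafe(int index, int value) {
            if (index >= buffer.length) {
                buffer = Arrays.copyOf(buffer, Math.max(buffer.length * 2, index + 1));
            }
            buffer[index] = value;
        }

        int capacity() {
            return buffer.length;
        }
    }

    /** Writes rows * elementsPerRow values and returns how many were written. */
    static int writeAll(ToyVector vector, int rows, int elementsPerRow, boolean handleSafe) {
        int offset = 0; // cumulative element count, as in the array writer
        for (int row = 0; row < rows; row++) {
            for (int arrIndex = 0; arrIndex < elementsPerRow; arrIndex++) {
                int fieldIndex = offset + arrIndex;
                if (handleSafe) {
                    vector.setSafe(fieldIndex, arrIndex);
                } else {
                    vector.set(fieldIndex, arrIndex);
                }
            }
            offset += elementsPerRow;
        }
        return offset;
    }

    public static void main(String[] args) {
        ToyVector vector = new ToyVector(1024);
        System.out.println(writeAll(vector, 250, 10, true)); // all 2500 elements fit after growth
    }
}
```

In this toy model, calling writeAll with handleSafe = false on a 1024-slot vector throws IndexOutOfBoundsException partway through, while handleSafe = true completes. The trade-off is a capacity check on every element write, which the PR accepts in exchange for correctness.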

Tests

  • Added ArrowReaderWriterTest#testArrayWriterWithManyElements: writes 200 rows with 10-element arrays (2000 total elements), verifying serialization succeeds and data can be read back correctly.

API and Format

No API or storage format changes.

Documentation

No documentation changes needed. This is a bug fix.

Signed-off-by: binary-signal <[email protected]>

@rionmonster rionmonster left a comment

LGTM — looks pretty straightforward. Approved! 👍


@XuQianJin-Stars XuQianJin-Stars left a comment

+1


@vamossagar12 vamossagar12 left a comment

LGTM

for (int arrIndex = 0; arrIndex < array.size(); arrIndex++) {
int fieldIndex = offset + arrIndex;
elementWriter.write(fieldIndex, array, arrIndex, handleSafe);
// Always use safe writes for array elements because the element index (offset +

We should mention in the comments on the class that the handleSafe field is ignored when writing.

Development

Successfully merging this pull request may close these issues.

[Bug] IndexOutOfBoundsException when writing rows with array columns to KV table
