
[Bug] IndexOutOfBoundsException when writing rows with array columns to KV table #2164

@binary-signal

Search before asking

  • I searched in the issues and found nothing similar.

Fluss version

main (development)

Please describe the bug 🐞

Minimal reproduce step

Write data containing array columns to a Fluss KV table where:

  1. The number of rows is less than 1024 (INITIAL_CAPACITY)
  2. The total number of array elements across all rows exceeds 1024

For example: 250 rows with arrays containing 10 integers each (total 2500 elements).
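The two conditions above can be checked for a candidate dataset with a few lines of arithmetic. This is only an illustrative sketch: the class and method names are hypothetical and not part of Fluss, and the value 1024 is the INITIAL_CAPACITY quoted in this issue.

```java
/** Checks the two reproduction conditions listed above; all names are illustrative. */
final class ReproConditions {
    // Value quoted in this issue for INITIAL_CAPACITY (assumed, not read from Fluss).
    static final int INITIAL_CAPACITY = 1024;

    /** True when the row count is below the capacity but the total element count exceeds it. */
    static boolean matches(int rows, int elementsPerRow) {
        long totalElements = (long) rows * elementsPerRow;
        return rows < INITIAL_CAPACITY && totalElements > INITIAL_CAPACITY;
    }

    public static void main(String[] args) {
        System.out.println(matches(250, 10));  // 2500 elements in 250 rows
        System.out.println(matches(250, 4));   // 1000 elements: second condition not met
        System.out.println(matches(2000, 10)); // 2000 rows: first condition not met
    }
}
```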

What do you expect to see?

Data should be written successfully without errors.

What did you see instead?

java.lang.IndexOutOfBoundsException: index: 248, length: 1 (expected: range(0, 248))
    at org.apache.fluss.shaded.arrow.org.apache.arrow.memory.ArrowBuf.checkIndexD(ArrowBuf.java:319)
    at org.apache.fluss.shaded.arrow.org.apache.arrow.memory.ArrowBuf.chk(ArrowBuf.java:306)
    at org.apache.fluss.shaded.arrow.org.apache.arrow.memory.ArrowBuf.getByte(ArrowBuf.java:508)
    at org.apache.fluss.shaded.arrow.org.apache.arrow.vector.BitVectorHelper.setBit(BitVectorHelper.java:82)
    at org.apache.fluss.shaded.arrow.org.apache.arrow.vector.IntVector.set(IntVector.java:160)
    at org.apache.fluss.row.arrow.writers.ArrowIntWriter.doWrite(ArrowIntWriter.java:38)
    at org.apache.fluss.row.arrow.writers.ArrowFieldWriter.write(ArrowFieldWriter.java:59)
    at org.apache.fluss.row.arrow.writers.ArrowArrayWriter.doWrite(ArrowArrayWriter.java:44)
    ...

Root Cause Analysis

In ArrowWriter.writeRow(), the handleSafe flag is determined by comparing recordsCount (number of rows) against INITIAL_CAPACITY (1024):

boolean handleSafe = recordsCount >= INITIAL_CAPACITY;

When handleSafe = false, the Arrow writers call vector.set(), which does not grow the underlying buffer. When handleSafe = true, they call vector.setSafe(), which grows the buffer as needed.
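To make the difference concrete, here is a simplified model of the two write paths. ToyIntVector is a hypothetical stand-in written for this issue, not Arrow's IntVector, and its doubling growth policy is an assumption chosen for brevity.

```java
import java.util.Arrays;

/** Toy stand-in for a fixed-width vector; NOT Arrow's IntVector. */
final class ToyIntVector {
    private int[] data;

    ToyIntVector(int initialCapacity) {
        data = new int[initialCapacity];
    }

    int capacity() {
        return data.length;
    }

    int get(int index) {
        return data[index];
    }

    /** Non-growing write: fails when index is at or beyond the current capacity. */
    void set(int index, int value) {
        if (index >= data.length) {
            throw new IndexOutOfBoundsException(
                    "index: " + index + ", capacity: " + data.length);
        }
        data[index] = value;
    }

    /** Growing write: doubles the capacity until index fits, then writes. */
    void setSafe(int index, int value) {
        while (index >= data.length) {
            data = Arrays.copyOf(data, Math.max(1, data.length * 2));
        }
        data[index] = value;
    }

    /** Mirrors the handleSafe switch described above. */
    static void write(ToyIntVector v, int index, int value, boolean handleSafe) {
        if (handleSafe) {
            v.setSafe(index, value);
        } else {
            v.set(index, value);
        }
    }

    public static void main(String[] args) {
        ToyIntVector v = new ToyIntVector(4);
        write(v, 10, 7, true);  // grows the backing array, then writes
        System.out.println("capacity after safe write: " + v.capacity());
        try {
            write(new ToyIntVector(4), 4, 1, false);  // index 4 is out of range
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unsafe write failed: " + e.getMessage());
        }
    }
}
```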

The bug is in ArrowArrayWriter: it passes the parent's handleSafe flag (based on row count) to the element writer. However, array element indices are based on the cumulative element count across all arrays, not the row count. This means:

  • 250 rows → handleSafe = false (since 250 < 1024).
  • With 10 elements per array, element indices reach 2499 (2500 elements in total).
  • The element vector was only initialized for 1024 elements.
  • Arrow throws IndexOutOfBoundsException when an element is written past the allocated capacity.
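The mismatch can be reproduced in a small model that needs neither Fluss nor Arrow. Everything below is a hypothetical sketch: ElementVector is a toy with an exact capacity of 1024, whereas a real Arrow buffer may be allocated larger than requested (in the trace above the failing validity-buffer byte index is 248, which corresponds to element index 1984, so the real failure point need not be exactly 1024). The decideByElementIndex flag shows one possible remedy, deciding per element from the cumulative element index. It is not a confirmed fix for ArrowArrayWriter.

```java
import java.util.Arrays;

/** Toy model of the row-count vs. element-count mismatch; not Fluss code. */
final class ArrayWriterModel {
    // Value quoted in this issue for INITIAL_CAPACITY.
    static final int INITIAL_CAPACITY = 1024;

    /** Toy element vector with the same set/setSafe split as described above. */
    static final class ElementVector {
        private int[] data = new int[INITIAL_CAPACITY];

        void set(int index, int value) {
            if (index >= data.length) {
                throw new IndexOutOfBoundsException(
                        "index: " + index + ", capacity: " + data.length);
            }
            data[index] = value;
        }

        void setSafe(int index, int value) {
            while (index >= data.length) {
                data = Arrays.copyOf(data, data.length * 2);
            }
            data[index] = value;
        }
    }

    /**
     * Writes `rows` arrays of `perRow` ints and returns the number of elements written.
     * With decideByElementIndex = false, the row-based flag is reused for elements
     * (the behavior described in this issue). With true, the decision is made from
     * the cumulative element index instead (a hypothetical remedy).
     */
    static int writeRows(int rows, int perRow, boolean decideByElementIndex) {
        ElementVector elements = new ElementVector();
        int offset = 0; // cumulative element index across all rows
        for (int recordsCount = 0; recordsCount < rows; recordsCount++) {
            boolean rowHandleSafe = recordsCount >= INITIAL_CAPACITY;
            for (int j = 0; j < perRow; j++) {
                boolean handleSafe =
                        decideByElementIndex ? offset >= INITIAL_CAPACITY : rowHandleSafe;
                if (handleSafe) {
                    elements.setSafe(offset, j);
                } else {
                    elements.set(offset, j);
                }
                offset++;
            }
        }
        return offset;
    }

    public static void main(String[] args) {
        try {
            writeRows(250, 10, false);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("row-based flag failed: " + e.getMessage());
        }
        System.out.println("element-based flag wrote " + writeRows(250, 10, true) + " elements");
    }
}
```

In this model, simply always using the safe path for element writes would also avoid the exception. Which option suits the real writer depends on how its element vector is allocated and reset.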

Solution

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
