Description
Search before asking
- I searched in the issues and found nothing similar.
Fluss version
main (development)
Please describe the bug 🐞
Minimal reproduction steps
Write data containing array columns to a Fluss KV table where:
- The number of rows is less than 1024 (INITIAL_CAPACITY)
- The total number of array elements across all rows exceeds 1024
For example: 250 rows with arrays containing 10 integers each (total 2500 elements).
What do you expect to see?
Data should be written successfully without errors.
What did you see instead?
java.lang.IndexOutOfBoundsException: index: 248, length: 1 (expected: range(0, 248))
at org.apache.fluss.shaded.arrow.org.apache.arrow.memory.ArrowBuf.checkIndexD(ArrowBuf.java:319)
at org.apache.fluss.shaded.arrow.org.apache.arrow.memory.ArrowBuf.chk(ArrowBuf.java:306)
at org.apache.fluss.shaded.arrow.org.apache.arrow.memory.ArrowBuf.getByte(ArrowBuf.java:508)
at org.apache.fluss.shaded.arrow.org.apache.arrow.vector.BitVectorHelper.setBit(BitVectorHelper.java:82)
at org.apache.fluss.shaded.arrow.org.apache.arrow.vector.IntVector.set(IntVector.java:160)
at org.apache.fluss.row.arrow.writers.ArrowIntWriter.doWrite(ArrowIntWriter.java:38)
at org.apache.fluss.row.arrow.writers.ArrowFieldWriter.write(ArrowFieldWriter.java:59)
at org.apache.fluss.row.arrow.writers.ArrowArrayWriter.doWrite(ArrowArrayWriter.java:44)
...
Root Cause Analysis
In ArrowWriter.writeRow(), the handleSafe flag is determined by comparing recordsCount (number of rows) against INITIAL_CAPACITY (1024):
boolean handleSafe = recordsCount >= INITIAL_CAPACITY;
When handleSafe = false, Arrow writers use vector.set(), which doesn't auto-grow the underlying buffer. When handleSafe = true, they use vector.setSafe(), which grows it as needed.
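To make the set()/setSafe() distinction concrete, here is a minimal, self-contained Java model that needs neither Arrow nor Fluss. SimpleIntVector and FieldWriterModel are hypothetical names, not Arrow or Fluss classes, and growth by doubling is an assumption of the model, not necessarily Arrow's reallocation policy.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for an Arrow fixed-width vector. It is NOT
// the Arrow API; it only models the set()/setSafe() contract described above.
class SimpleIntVector {
    private int[] values;

    SimpleIntVector(int initialCapacity) {
        values = new int[initialCapacity];
    }

    int capacity() {
        return values.length;
    }

    int get(int index) {
        return values[index];
    }

    // Like set(): never grows, so writing past the allocated capacity fails.
    void set(int index, int value) {
        if (index >= values.length) {
            throw new IndexOutOfBoundsException(
                    "index: " + index + " (capacity: " + values.length + ")");
        }
        values[index] = value;
    }

    // Like setSafe(): grows the buffer (here by doubling) before writing.
    void setSafe(int index, int value) {
        if (index >= values.length) {
            int newCapacity = Math.max(1, values.length);
            while (newCapacity <= index) {
                newCapacity *= 2;
            }
            values = Arrays.copyOf(values, newCapacity);
        }
        values[index] = value;
    }
}

// Hypothetical model of how a field writer picks between the two calls.
class FieldWriterModel {
    static void write(SimpleIntVector vector, int index, int value, boolean handleSafe) {
        if (handleSafe) {
            vector.setSafe(index, value);
        } else {
            vector.set(index, value);
        }
    }

    public static void main(String[] args) {
        SimpleIntVector vector = new SimpleIntVector(1024);
        write(vector, 2000, 42, true); // grows from 1024 to 2048 slots
        System.out.println(vector.capacity());
        try {
            write(new SimpleIntVector(1024), 2000, 42, false);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("set() failed: " + e.getMessage());
        }
    }
}
```

The point of the model is that correctness of the handleSafe = false path depends entirely on the caller knowing the index is within the allocated capacity.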
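The bug is in ArrowArrayWriter: it passes the parent's handleSafe flag (derived from the row count) down to the element writer. Array elements, however, are written at indices based on the cumulative element count across all arrays, not the row count. This means:
- 250 rows → handleSafe = false (since 250 < 1024)
- But with 10 elements per array, element indices run from 0 to 2499 (2500 elements in total)
- The element vector was initially allocated for only 1024 elements, and set() never grows it
- Arrow throws IndexOutOfBoundsException once an element index passes the element vector's actual capacity. Arrow may round the allocation up: in the trace above the validity buffer is 248 bytes long (range(0, 248)), i.e. 1984 bits, so the failing write is at element index 1984 rather than exactly 1024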
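The root cause can be reproduced with a small stdlib-only model. This is a hypothetical sketch: ElementVector, ArrayWriterModel and writeBatch are illustrative names, not Fluss or Arrow code, and the "fixed" branch (also switching to setSafe() once the element index reaches INITIAL_CAPACITY) is only one possible approach, since no solution has been proposed yet. Always using setSafe() for element writes would be another option. Unlike real Arrow, the model allocates exactly INITIAL_CAPACITY slots, so it fails at element index 1024 rather than at whatever larger capacity Arrow actually allocated.

```java
import java.util.Arrays;

// Hypothetical model of the element vector; NOT the Arrow API.
class ElementVector {
    private int[] values;

    ElementVector(int initialCapacity) {
        values = new int[initialCapacity];
    }

    // Like Arrow's set(): fails past the allocated capacity.
    void set(int index, int value) {
        if (index >= values.length) {
            throw new IndexOutOfBoundsException("element index: " + index);
        }
        values[index] = value;
    }

    // Like Arrow's setSafe(): grows the buffer before writing.
    void setSafe(int index, int value) {
        if (index >= values.length) {
            values = Arrays.copyOf(values, Math.max(index + 1, values.length * 2));
        }
        values[index] = value;
    }
}

// Hypothetical model of ArrowWriter + ArrowArrayWriter for one int-array column.
class ArrayWriterModel {
    static final int INITIAL_CAPACITY = 1024;

    // Writes `rows` arrays of `elementsPerRow` ints into one element vector and
    // returns the number of elements written. `fixElementFlag` toggles between
    // the reported behavior and one possible fix.
    static int writeBatch(int rows, int elementsPerRow, boolean fixElementFlag) {
        ElementVector elements = new ElementVector(INITIAL_CAPACITY);
        int elementIndex = 0; // cumulative across all rows, never reset per row
        for (int recordsCount = 0; recordsCount < rows; recordsCount++) {
            // Parent flag derived from the row count, as in ArrowWriter.writeRow().
            boolean parentHandleSafe = recordsCount >= INITIAL_CAPACITY;
            for (int k = 0; k < elementsPerRow; k++) {
                boolean handleSafe = fixElementFlag
                        // Possible fix (assumption): also consider the element index.
                        ? parentHandleSafe || elementIndex >= INITIAL_CAPACITY
                        // Reported behavior: reuse the row-based flag as-is.
                        : parentHandleSafe;
                if (handleSafe) {
                    elements.setSafe(elementIndex, k);
                } else {
                    elements.set(elementIndex, k);
                }
                elementIndex++;
            }
        }
        return elementIndex;
    }

    public static void main(String[] args) {
        System.out.println(writeBatch(250, 10, true));
        try {
            writeBatch(250, 10, false);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("reported behavior: " + e.getMessage());
        }
    }
}
```

With 100 rows of 10 elements (1000 elements in total) both branches succeed, which matches the report that the failure needs fewer than 1024 rows but more than 1024 elements.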
Solution
No response
Are you willing to submit a PR?
- I'm willing to submit a PR!