Description
Bug description
I am debugging an exception raised here.
My preliminary finding is that it is caused by the code here. In my case, each iteration advances offsets, a pointer to the slot for the current row, so after numDefLevels iterations it points to the slot immediately past the buffer's boundary. The next boundary-guard check then raises the out-of-bounds buffer exception.
Assuming we have 360 rows of the same content:
{
  "msg": {
    "a": {
      "b": []
    }
  }
}
It turns out that the root cause is that, in Arrow, the offsets and validity are stored like this:
┌──────────────────────────────┐
│        ListArray "b"         │
│                              │
│  Validity       Offsets      │
│ ┌─────────┐ ┌─────────────┐  │
│ │    1    │ │      0      │  │
│ │    1    │ │      0      │  │
│ │    1    │ │      0      │  │
│ │   ...   │ │     ...     │  │
│ │    1    │ │      0      │  │
│ └─────────┘ │     ...     │  │
│             │      0      │  │
│             └─────────────┘  │
│ (360 entries) (361 entries)  │
└──────────────────────────────┘
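The extra offsets entry follows from the Arrow list layout, where row i spans [offsets[i], offsets[i + 1]). Below is a minimal sketch of that relationship. buildListOffsets is a hypothetical helper written for this issue, not an Arrow or Velox function, and it assumes 32-bit offsets.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of Arrow or Velox: builds the offsets of a
// list column from per-row list lengths. Row i spans
// [offsets[i], offsets[i + 1]), so N rows always need N + 1 offsets.
std::vector<int32_t> buildListOffsets(const std::vector<int32_t>& lengths) {
  std::vector<int32_t> offsets;
  offsets.reserve(lengths.size() + 1);
  offsets.push_back(0); // Leading offset, present even when there are no rows.
  for (int32_t length : lengths) {
    offsets.push_back(offsets.back() + length);
  }
  return offsets;
}
```

Calling buildListOffsets with 360 lengths of 0 (360 empty lists, as in the rows above) should return 361 offsets, all equal to 0.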
The definition and repetition levels for "b" are:
┌───────────┬────────────┐
│ repLevel  │ defLevel   │
├───────────┼────────────┤
│     0     │     3      │
│     0     │     3      │
│     0     │     3      │
│    ...    │    ...     │
│     0     │     3      │
│     0     │     3      │
│     0     │     3      │
└───────────┴────────────┘
(360 entries total)
So the offsets (361 entries) need one more slot than numRepDefs (360), which causes a write past the memory guard. A quick fix is to increase the capacity by 1 so that the buffer is large enough. #12845
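The size mismatch can be worked through as a short sketch. Both helper names are hypothetical and are not Velox functions. The sketch assumes 4-byte (int32) offsets, which I infer from the reported capacity of 1440 bytes for 360 entries and have not confirmed against the allocation code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for illustration only; these are not Velox functions.
// Assumption: offsets are 4-byte int32 values.

// Bytes needed to hold the offsets of numRows lists: numRows + 1 entries.
constexpr size_t requiredOffsetBytes(size_t numRows) {
  return (numRows + 1) * sizeof(int32_t);
}

// Bytes available when exactly one slot is reserved per rep/def entry.
constexpr size_t bytesForOneSlotPerLevel(size_t numRepDefs) {
  return numRepDefs * sizeof(int32_t);
}

static_assert(
    bytesForOneSlotPerLevel(360) == 1440,
    "360 slots of 4 bytes match the capacity in the log");
static_assert(
    requiredOffsetBytes(360) == 1444,
    "361 offsets need 4 more bytes than that capacity");
```

Under these assumptions, reserving numRepDefs + 1 slots makes the two sizes equal, which is the effect of the quick fix described above.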
System information
Velox System Info v0.0.2
Commit: ca59469a3289960dd77f2d4b159eaa3e4f323098
CMake Version: 3.28.3
System: Linux-6.1.112+
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 11.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 11.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr/local/lib/python3.9/dist-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt
Relevant logs
Write past Buffer capacity() 1440 Split