Buffer out of bound when packing def/rep levels info to list #12813

Open
@anlowee

Bug description

I am debugging an exception raised here.

My preliminary finding is that it is caused by the code here. In my case, every iteration advances offsets, a pointer to the slot for the current row, so it is advanced numDefLevels times and ends up on the slot immediately after the buffer's boundary. The next boundary-guard check then raises the out-of-bounds buffer exception.
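To make the failure mode concrete, here is a minimal sketch of the pattern described above. It is not the Velox code: `GuardedBuffer` and `pack_offsets` are hypothetical names, and the capacity check is only a stand-in for the real boundary guard. The assumption is that the buffer is sized for one slot per def level while the loop also needs a slot for the closing offset.

```python
from __future__ import annotations


class GuardedBuffer:
    """Hypothetical stand-in for a capacity-checked buffer (not the Velox class)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.slots = [0] * capacity

    def write(self, index: int, value: int) -> None:
        # Mimics a boundary guard: refuse any write at or beyond capacity.
        if index >= self.capacity:
            raise IndexError(f"write past buffer capacity {self.capacity}")
        self.slots[index] = value


def pack_offsets(def_levels: list[int], buffer: GuardedBuffer) -> int:
    """Write one offset per level plus the closing offset; return slots used."""
    cursor = 0       # plays the role of the `offsets` pointer
    child_count = 0  # every list in the example is empty, so this stays 0
    for _ in def_levels:
        buffer.write(cursor, child_count)
        cursor += 1  # advanced once per def level
    # The closing offset lands at index len(def_levels), one past a buffer
    # that was sized for len(def_levels) slots.
    buffer.write(cursor, child_count)
    return cursor + 1
```

With 360 def levels, a `GuardedBuffer(360)` raises on the final write, while a `GuardedBuffer(361)` completes and reports 361 slots used.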

Assuming we have 360 rows of the same content:

{
  "msg": {
    "a": {
      "b": []
    }
  }
}

The root cause turns out to be how Arrow stores the offsets and validity of a list array:

┌──────────────────────────────┐
│         ListArray "b"        │
│                              │
│  Validity     Offsets        │
│ ┌─────────┐  ┌─────────────┐ │
│ │ 1       │  │ 0           │ │
│ │ 1       │  │ 0           │ │
│ │ 1       │  │ 0           │ │
│ │  ...    │  │ ...         │ │
│ │ 1       │  │ 0           │ │
│ └─────────┘  │ ...         │ │
│              │ 0           │ │
│              └─────────────┘ │
│ (360 entries)  (361 entries) │
└──────────────────────────────┘
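The layout in the diagram follows from how list offsets are defined: entry i and entry i + 1 bracket the children of row i, so N rows always need N + 1 offsets. A small sketch in plain Python (no Arrow dependency; `list_offsets` is a hypothetical helper name) reproduces the counts for the 360 empty lists:

```python
from itertools import accumulate


def list_offsets(lengths):
    """Return Arrow-style list offsets: a leading 0 followed by running totals."""
    return [0, *accumulate(lengths)]


# 360 rows, each holding an empty list for "b".
rows = [[] for _ in range(360)]
offsets = list_offsets(len(row) for row in rows)
validity = [1] * len(rows)  # every list is present (non-null), just empty

print(len(validity), len(offsets))
```

The offsets are all 0 here because no row has any children, but the extra trailing entry is still required.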

And "b"'s definition and repetition levels table:

┌───────────┬────────────┐
│ repLevel  │ defLevel   │
├───────────┼────────────┤
│     0     │     3      │
│     0     │     3      │
│     0     │     3      │
│    ...    │    ...     │
│     0     │     3      │
│     0     │     3      │
│     0     │     3      │
└───────────┴────────────┘
       (360 entries total)
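For reference, the levels in the table can be derived with a short sketch. It assumes a schema that is not stated in this issue: msg, a, and b are each optional, b is a list, and list elements are required. Under that assumption an empty list is defined down to b (level 3) and contributes exactly one (repLevel, defLevel) pair. `levels_for_row` is a hypothetical helper, not a Velox or Parquet API.

```python
def levels_for_row(row):
    """Return the (rep_level, def_level) pairs that column msg.a.b gets for one row."""
    node = row
    def_level = 0
    for key in ("msg", "a", "b"):
        node = node.get(key) if isinstance(node, dict) else None
        if node is None:
            # Path is null at this depth: one entry with the level reached so far.
            return [(0, def_level)]
        def_level += 1
    if not node:
        # Empty list: defined down to "b" but with no elements.
        return [(0, def_level)]
    # Non-empty list: the first element starts the record (rep 0),
    # later elements repeat at the list level (rep 1).
    return [(0 if i == 0 else 1, def_level + 1) for i in range(len(node))]


row = {"msg": {"a": {"b": []}}}
levels = [pair for _ in range(360) for pair in levels_for_row(row)]
```

For the 360 identical rows this yields 360 pairs of (0, 3), matching the table.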

So the offsets (361 entries) need one more slot than numRepDefs (360), and the final write goes past the memory guard. A quick fix is to increase the capacity by 1 so the buffer has room for the last offset. #12845
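The sizing rule can be sketched as follows. This is an illustration only, not the change in the linked PR: `offsets_buffer_bytes` and `write_offsets` are hypothetical names, and the 4-byte offset width is an assumption (it is consistent with the capacity of 1440 = 360 × 4 reported in the log, but the issue does not state it).

```python
import struct

OFFSET_WIDTH = 4  # assumption: 32-bit offsets


def offsets_buffer_bytes(num_rep_defs, extra_slots=1):
    """Bytes needed for the offsets: one slot per level plus the closing offset."""
    return (num_rep_defs + extra_slots) * OFFSET_WIDTH


def write_offsets(buffer, offsets):
    """Pack each offset as a little-endian int32; struct.error if the buffer is short."""
    for i, offset in enumerate(offsets):
        struct.pack_into("<i", buffer, i * OFFSET_WIDTH, offset)


offsets = [0] * 361  # 360 empty lists -> 361 offsets

tight = bytearray(offsets_buffer_bytes(360, extra_slots=0))  # sized without the extra slot
fixed = bytearray(offsets_buffer_bytes(360))                 # sized with the extra slot

write_offsets(fixed, offsets)  # fits
try:
    write_offsets(tight, offsets)
except struct.error as err:
    print("tight buffer rejected the last offset:", err)
```

Sizing by numRepDefs alone leaves no room for the 361st offset, which starts at byte 1440; adding one slot gives 1444 bytes and the write succeeds.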

System information

Velox System Info v0.0.2
Commit: ca59469a3289960dd77f2d4b159eaa3e4f323098
CMake Version: 3.28.3
System: Linux-6.1.112+
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 11.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 11.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr/local/lib/python3.9/dist-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

Write past Buffer capacity() 1440 Split
