Commit 67a5975

Fix overflow case and clean up some logic (#18734)
The size in bytes reported in the CUDA Array Interface (CAI) for pylibcudf Column objects produced from a column_view and an arbitrary owner could previously overflow. The arithmetic was performed on int32 types, but the int32 limit is the maximum size in number of elements, not in bytes. Since the CAI is a Python object, we can do the arithmetic with pure Python (arbitrary-precision) integers to avoid this problem. In the process of fixing this bug, this PR also does some minor cleanup of the various cases handled in the size calculation.

Resolves #18598

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Matthew Roeschke (https://github.com/mroeschke)
- Matthew Murray (https://github.com/Matt711)

URL: #18734
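The overflow described above can be illustrated with a minimal sketch in plain Python. It is not code from the PR: the helper names are hypothetical, and `ctypes.c_int32` is used only to imitate the wraparound that typically occurs when a byte count is stored in a 32-bit signed integer.

```python
import ctypes

INT32_MAX = 2**31 - 1


def size_in_bytes_int32(num_elements: int, itemsize: int) -> int:
    """Hypothetical helper: imitate storing the product in a 32-bit signed int."""
    # ctypes integer types do no overflow checking, so the value wraps.
    return ctypes.c_int32(num_elements * itemsize).value


def size_in_bytes_python(num_elements: int, itemsize: int) -> int:
    """Hypothetical helper: the same product using Python integers."""
    return int(num_elements) * int(itemsize)


# 300 million elements is a valid element count (it fits in int32),
# but at 8 bytes per element the byte count does not fit in int32.
num_elements = 300_000_000
itemsize = 8

wrapped = size_in_bytes_int32(num_elements, itemsize)
exact = size_in_bytes_python(num_elements, itemsize)

print(wrapped)  # -1894967296
print(exact)    # 2400000000
```

The element count is within the int32 range, yet the byte count is not, which is why the multiplication has to happen in a wider (here, unbounded) integer type.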
1 parent c1ca525 commit 67a5975

1 file changed: +9 -10 lines changed

python/pylibcudf/pylibcudf/column.pyx

Lines changed: 9 additions & 10 deletions
@@ -86,25 +86,24 @@ cdef class OwnerWithCAI:
     cdef create(column_view cv, object owner):
         obj = OwnerWithCAI()
         obj.owner = owner
-        cdef int size
+        # The default size of 0 will be applied for any type that stores data in the
+        # children (such that the parent size is 0).
+        size = 0
         cdef column_view offsets_column
         cdef unique_ptr[scalar] last_offset
         if cv.type().id() == type_id.EMPTY:
             size = cv.size()
         elif is_fixed_width(cv.type()):
-            size = cv.size() * cpp_size_of(cv.type())
+            # Cast to Python integers before multiplying to avoid overflow.
+            size = int(cv.size()) * int(cpp_size_of(cv.type()))
         elif cv.type().id() == type_id.STRING:
-            # The size of the character array in the parent is the offsets size
-            num_children = cv.num_children()
-            size = 0
-            # A strings column with no children is created for empty/all null
-            if num_children:
+            # A strings column with no children is created for empty/all null, in which
+            # case the size remains 0. Otherwise, the size of the character array stored
+            # in the parent is the last offset in the offsets child.
+            if cv.num_children():
                 offsets_column = cv.child(0)
                 last_offset = get_element(offsets_column, offsets_column.size() - 1)
                 size = (<numeric_scalar[size_type] *> last_offset.get()).value()
-        else:
-            # All other types store data in the children, so the parent size is 0
-            size = 0
 
         obj.cai = {
             "shape": (size,),
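
The case analysis in the diff above can be sketched in pure Python to make the control flow easier to follow. This is an illustration under stated assumptions, not the pylibcudf implementation: `TypeId`, `FIXED_WIDTH_ITEMSIZE`, `FakeColumnView`, and `parent_size_in_bytes` are hypothetical stand-ins for the libcudf types and helpers used in the Cython code, and the offsets child is modeled as a plain list.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class TypeId(Enum):
    """Hypothetical stand-in for a few libcudf type ids."""
    EMPTY = auto()
    INT64 = auto()
    STRING = auto()
    LIST = auto()


# Hypothetical lookup standing in for is_fixed_width / size_of.
FIXED_WIDTH_ITEMSIZE = {TypeId.INT64: 8}


@dataclass
class FakeColumnView:
    """Hypothetical stand-in for a column_view."""
    type_id: TypeId
    size: int  # number of elements
    offsets: list = field(default_factory=list)  # models the offsets child


def parent_size_in_bytes(cv: FakeColumnView) -> int:
    # Default of 0 covers types that store their data in children.
    size = 0
    if cv.type_id is TypeId.EMPTY:
        size = cv.size
    elif cv.type_id in FIXED_WIDTH_ITEMSIZE:
        # Python integers do not overflow, unlike a 32-bit product.
        size = int(cv.size) * int(FIXED_WIDTH_ITEMSIZE[cv.type_id])
    elif cv.type_id is TypeId.STRING:
        # No offsets child (empty/all-null column) leaves the size at 0;
        # otherwise the last offset is the length of the character data.
        if cv.offsets:
            size = cv.offsets[-1]
    return size


print(parent_size_in_bytes(FakeColumnView(TypeId.INT64, 300_000_000)))
print(parent_size_in_bytes(FakeColumnView(TypeId.STRING, 2, [0, 3, 5])))
print(parent_size_in_bytes(FakeColumnView(TypeId.LIST, 4)))
```

Initializing `size` to 0 up front is what lets the explicit `else` branch and the redundant assignment in the STRING branch be dropped, as the diff does.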
