Fixed integer overflow in make_cute_packed_stride batch stride computation #3307

Open
a123pal wants to merge 1 commit into NVIDIA:main from a123pal:main


a123pal commented Jun 7, 2026

Problem: make_cute_packed_stride computes the batch stride as get<0>(shape_MKL) * get<1>(shape_MKL), where both operands are int32. For large matrix dimensions (e.g. 49152 × 65536 = 3,221,225,472, which exceeds the int32 maximum of 2,147,483,647), the product overflows int32 before the cast to IntT occurs, which in practice produces a negative batch stride.

Fix: Cast each operand to IntT before multiplying, so that the multiplication is performed in IntT (64-bit when IntT is a 64-bit type) rather than in int32. Applied to both affected overloads.
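
For illustration, here is a minimal standalone sketch of the widening pattern described above. It does not use the CUTLASS/CuTe headers. The helper names batch_stride_widened and product_overflows_int32 are hypothetical and are not taken from the patch. The sketch deliberately never evaluates the int32 × int32 product, since signed overflow is undefined behavior in C++.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the fixed computation: each int32 operand is
// converted to IntT first, so the multiplication itself happens in IntT.
template <class IntT>
constexpr IntT batch_stride_widened(std::int32_t m, std::int32_t k) {
  return static_cast<IntT>(m) * static_cast<IntT>(k);
}

// Hypothetical helper: reports whether m * k would not fit in int32.
// The check is done in 64-bit arithmetic to avoid signed overflow.
constexpr bool product_overflows_int32(std::int32_t m, std::int32_t k) {
  const std::int64_t wide =
      static_cast<std::int64_t>(m) * static_cast<std::int64_t>(k);
  return wide > std::numeric_limits<std::int32_t>::max() ||
         wide < std::numeric_limits<std::int32_t>::min();
}

// The dimensions from the problem statement need more than 32 bits,
// while the widened product is exact.
static_assert(product_overflows_int32(49152, 65536),
              "49152 * 65536 does not fit in int32");
static_assert(batch_stride_widened<std::int64_t>(49152, 65536) == 3221225472LL,
              "widened product is exact");
```

Converting only one operand would also promote the multiplication to the wider type; converting both, as the fix does, makes the intent explicit.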

Fixes #3269


Development

Successfully merging this pull request may close these issues.

[BUG] Potential overflow issue in packed_stride.hpp