Open
Labels
bug (Something isn't working right.)
Description
Is this a duplicate?
- I confirmed there appear to be no duplicate issues for this bug and that I agree to the Code of Conduct
Type of Bug
Performance
Component
CUB
Describe the bug
Looking at cccl/cub/cub/block/block_adjacent_difference.cuh (lines 133 to 137 in 9b7333b):

    struct _TempStorage
    {
      T first_items[BLOCK_THREADS];
      T last_items[BLOCK_THREADS];
    };

As far as I can tell, no single code path needs both first_items and last_items at the same time, i.e. a single array (halo_items or similar) would suffice. Requesting more shared memory than necessary can in practice reduce occupancy and therefore hurt performance.
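To make the size difference concrete, here is a minimal host-side sketch comparing the quoted layout with a single-array alternative. The names TwoArrayTempStorage, SingleArrayTempStorage, two_array_bytes and single_array_bytes are hypothetical and exist only for this illustration; halo_items is the name suggested above, not an existing CUB member. It assumes that no member function needs both arrays at once, which would have to be verified against the actual implementation.

```cpp
#include <cstddef>

// Layout as quoted in the issue: two arrays of BLOCK_THREADS items each.
template <typename T, int BLOCK_THREADS>
struct TwoArrayTempStorage
{
  T first_items[BLOCK_THREADS];
  T last_items[BLOCK_THREADS];
};

// Hypothetical alternative: one array reused for whichever role
// (first or last items) a given member function needs.
template <typename T, int BLOCK_THREADS>
struct SingleArrayTempStorage
{
  T halo_items[BLOCK_THREADS];
};

// Bytes requested per thread block by each layout.
template <typename T, int BLOCK_THREADS>
constexpr std::size_t two_array_bytes()
{
  return sizeof(TwoArrayTempStorage<T, BLOCK_THREADS>);
}

template <typename T, int BLOCK_THREADS>
constexpr std::size_t single_array_bytes()
{
  return sizeof(SingleArrayTempStorage<T, BLOCK_THREADS>);
}

// The two-array layout requests exactly twice the storage.
static_assert(two_array_bytes<double, 256>() == 2 * single_array_bytes<double, 256>(),
              "two-array layout is twice the size of the single-array layout");
```

For T = double and BLOCK_THREADS = 256, the quoted layout requests 2 * 256 * sizeof(double) bytes per block, while the single-array sketch requests half of that.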
How to Reproduce
Not applicable.
Expected behavior
BlockAdjacentDifference should only request as much shared memory as it actually needs.
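As a rough illustration of why the extra request can matter, the sketch below estimates how many thread blocks fit on one SM when shared memory is the only limit. All figures are assumptions chosen for the example (48 KiB of shared memory per SM, 256 threads per block, 8-byte items), and max_blocks_by_smem is a hypothetical helper, not a CUDA or CUB API. Real occupancy also depends on registers, thread counts, allocation granularity and any other shared memory the kernel uses, so this is an upper bound rather than a prediction.

```cpp
#include <cstddef>

// Upper bound on resident blocks per SM imposed by shared memory alone.
// Precondition: smem_per_block_bytes must be nonzero.
// Ignores register, thread-count and hardware block-count limits.
constexpr std::size_t max_blocks_by_smem(std::size_t smem_per_sm_bytes,
                                         std::size_t smem_per_block_bytes)
{
  return smem_per_sm_bytes / smem_per_block_bytes;
}

// Assumed figures, for illustration only.
constexpr std::size_t kSmemPerSmBytes = 48 * 1024; // hypothetical per-SM budget
constexpr std::size_t kBlockThreads   = 256;       // hypothetical block size
constexpr std::size_t kItemBytes      = 8;         // e.g. a 64-bit item type

// Per-block request with two arrays versus one array.
constexpr std::size_t kTwoArrayBytes = 2 * kBlockThreads * kItemBytes; // 4096
constexpr std::size_t kOneArrayBytes = kBlockThreads * kItemBytes;     // 2048

constexpr std::size_t kBlocksTwoArrays = max_blocks_by_smem(kSmemPerSmBytes, kTwoArrayBytes);
constexpr std::size_t kBlocksOneArray  = max_blocks_by_smem(kSmemPerSmBytes, kOneArrayBytes);
```

Under these assumptions the shared-memory bound is 12 blocks per SM with two arrays and 24 with one. Whether that bound is the limiting factor depends on the kernel and the target architecture.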
Reproduction link
No response
Operating System
No response
nvidia-smi output
No response
NVCC version
No response
Status
In Review