Fix OOB memory access in Orc and Parquet stacks from fixed-width unaligned loads #20458
This appears to fix the first error, but it looks like there are still more memcheck errors. Partial output:

Fixed in ea3bf21
    
| } | ||
| template <typename T> | ||
| inline __device__ T WarpReduceOr16(T acc) | ||
| template <cudf::size_type size, typename T> | 
Modernized these into one
| return __shfl_xor_sync(~0, var, delta); | ||
| } | ||
| 
               | 
          ||
| inline __device__ void syncwarp() { __syncwarp(); } | 
Removed because it is not used anywhere.
| return pos; | ||
| } | ||
| 
               | 
          ||
| inline __device__ double Int128ToDouble_rn(uint64_t lo, int64_t hi) | 
Removed this unused utility.
| uint32_t v = p32[0]; | ||
| return (ofs) ? __funnelshift_r(v, p32[1], ofs * 8) : v; | ||
| template <cudf::size_type size, typename T> | ||
| inline __device__ T warp_reduce_pos(T pos, uint32_t t) | 
Similarly, modernized into one. Thanks, Cursor.
Looks like an all-around improvement, provided performance is not negatively impacted.
Description
This PR fixes a possible out-of-bounds (OOB) memory access in the ORC, HYBRID_SCAN, and PARQUET kernels when reading an unaligned 32-bit or 64-bit value from memory.
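As a rough illustration of the approach named in the checklist, the sketch below shows a `memcpy`-based unaligned load. It is a host-side C++ approximation that uses `std::memcpy`; the PR itself refers to `cuda::std::memcpy` in device code. The helper's exact signature here is an assumption and is not taken from the PR.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical sketch: the name mirrors the checklist item, but the signature
// is an assumption. Copies exactly sizeof(T) bytes starting at p, so no byte
// outside [p, p + sizeof(T)) is ever touched, regardless of p's alignment.
template <typename T>
T unaligned_load(const std::uint8_t* p)
{
  T value;
  std::memcpy(&value, p, sizeof(T));
  return value;
}
```

By contrast, the removed lines in the diff appear to read two aligned 32-bit words (`p32[0]` and `p32[1]`) and combine them with `__funnelshift_r`. That pattern can touch bytes past the end of the buffer when the requested value sits near the end of an allocation.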
Checklist
- `cuda::std::memcpy`-based `unaligned_load`