
Conversation

@mhaseeb123
Member

@mhaseeb123 mhaseeb123 commented Oct 31, 2025

Description

This PR fixes a possible out-of-bounds (OOB) memory access in the ORC, HYBRID_SCAN, and PARQUET kernels when reading an unaligned 32- or 64-bit value from memory.
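As a rough sketch of the idea (illustrative only; the helper name and plain memcpy below are assumptions, not necessarily the code added in this PR), the unaligned read can be routed through a byte-wise copy so that no full-width load ever touches bytes outside the value being read:

```cuda
#include <cstring>  // memcpy; nvcc also supports memcpy in device code

// Illustrative sketch only: read a possibly unaligned value of type T by copying
// exactly sizeof(T) bytes, instead of issuing an aligned 32/64-bit load that may
// read past the end of the buffer.
template <typename T>
__device__ inline T unaligned_load(unsigned char const* src)
{
  T value;
  memcpy(&value, src, sizeof(T));  // lowered to loads that are safe for unknown alignment
  return value;
}
```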

Checklist

  • Run benchmarks to check for any performance regression from the new cuda::std::memcpy-based unaligned_load
  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mhaseeb123 mhaseeb123 requested a review from a team as a code owner October 31, 2025 23:45
@mhaseeb123 mhaseeb123 requested review from bdice and lamarrr October 31, 2025 23:45
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Oct 31, 2025
@mhaseeb123 mhaseeb123 changed the title Fix OOB memory access in the new PQ reader's dict decoder Fix OOB memory access in ORC and the new PQ readers due to fixed-width unaligned load Oct 31, 2025
@mhaseeb123 mhaseeb123 added bug Something isn't working 3 - Ready for Review Ready for review by team cuIO cuIO issue non-breaking Non-breaking change labels Oct 31, 2025
@davidwendt
Contributor

davidwendt commented Nov 3, 2025

This appears to fix the first error, but it looks like there are still more memcheck errors:

compute-sanitizer --tool memcheck gtests/HYBRID_SCAN_TEST --gtest_brief=1 --rmm_mode=cuda

Partial output:

======== COMPUTE-SANITIZER
========= Invalid __global__ read of size 4 bytes
=========     at cudf::io::parquet::detail::gpuStoreOutput(unsigned int *, const unsigned char *, unsigned int, unsigned int)+0x106c0 in page_data.cuh:74
=========     by thread (64,0,0) in block (19,0,0)
=========     Access at 0x7b586f5ba3cc is out of bounds
=========     and is inside the nearest allocation at 0x7b586f591e00 of size 165,326 bytes
=========         Device Frame: void cudf::io::parquet::detail::read_fixed_width_value_fast<unsigned int, cudf::io::parquet::detail::page_state_buffers_s<(int)256, (int)1, (int)1>>(cudf::io::parquet::detail::page_state_s *, T2 *, int, T1 *)+0x10570 in page_data.cuh:337
=========         Device Frame: void cudf::io::parquet::detail::<unnamed>::decode_fixed_width_values<(int)128, (bool)1, (cudf::io::parquet::detail::copy_mode)0, cudf::io::parquet::detail::page_state_buffers_s<(int)256, (int)1, (int)1>>(cudf::io::parquet::detail::page_state_s *, T4 *, int, int, int)+0x10560 in decode_fixed.cu:171
=========         Device Frame: auto void cudf::io::parquet::detail::<unnamed>::decode_page_data_generic<unsigned char, (int)128, (cudf::io::parquet::detail::decode_kernel_mask)8192>(cudf::io::parquet::detail::PageInfo *, cudf::device_span<const cudf::io::parquet::detail::ColumnChunkDesc, (unsigned long)18446744073709551615>, unsigned long, unsigned long, cudf::device_span<const bool, (unsigned long)18446744073709551615>, cudf::device_span<unsigned long, (unsigned long)18446744073709551615>, unsigned int *)::[lambda() (instance 1)]::operator ()<(cudf::io::parquet::detail::copy_mode)0>() const+0xfbc0 in decode_fixed.cu:1193
=========         Device Frame: void cudf::io::parquet::detail::<unnamed>::decode_page_data_generic<unsigned char, (int)128, (cudf::io::parquet::detail::decode_kernel_mask)8192>(cudf::io::parquet::detail::PageInfo *, cudf::device_span<const cudf::io::parquet::detail::ColumnChunkDesc, (unsigned long)18446744073709551615>, unsigned long, unsigned long, cudf::device_span<const bool, (unsigned long)18446744073709551615>, cudf::device_span<unsigned long, (unsigned long)18446744073709551615>, unsigned int *)+0xfbc0 in decode_fixed.cu:1199
=========     Saved host backtrace up to driver entry point at kernel launch time
=========         Host Frame: cuLaunchKernel [0x39d6c4] in libcuda.so.1
=========         Host Frame:  [0x141f8] in libcudart.so.12
=========         Host Frame: cudaLaunchKernel [0x7d09d] in libcudart.so.12
=========         Host Frame: cudf::io::parquet::detail::decode_page_data(cudf::detail::hostdevice_span<cudf::io::parquet::detail::PageInfo>, cudf::detail::hostdevice_span<cudf::io::parquet::detail::ColumnChunkDesc const>, unsigned long, unsigned long, int, cudf::io::parquet::detail::decode_kernel_mask, cudf::device_span<bool const, 18446744073709551615ul>, cudf::device_span<unsigned long, 18446744073709551615ul>, unsigned int*, rmm::cuda_stream_view) [0xfabda7] in libcudf.so
=========         Host Frame: cudf::io::parquet::detail::reader_impl::decode_page_data(cudf::io::parquet::detail::reader_impl::read_mode, unsigned long, unsigned long)::{lambda(cudf::io::parquet::detail::decode_kernel_mask)#1}::operator()(cudf::io::parquet::detail::decode_kernel_mask) const [0xf096b3] in libcudf.so
=========         Host Frame: cudf::io::parquet::detail::reader_impl::decode_page_data(cudf::io::parquet::detail::reader_impl::read_mode, unsigned long, unsigned long) [0xf1402f] in libcudf.so
=========         Host Frame: cudf::io::table_with_metadata cudf::io::parquet::experimental::detail::hybrid_scan_reader_impl::read_chunk_internal<cudf::column_view>(cudf::io::parquet::detail::reader_impl::read_mode, cudf::io::parquet::experimental::detail::hybrid_scan_reader_impl::read_columns_mode, cudf::column_view) [0xe85f42] in libcudf.so
=========         Host Frame: cudf::io::parquet::experimental::detail::hybrid_scan_reader_impl::materialize_payload_columns(cudf::host_span<std::vector<int, std::allocator<int> > const, 18446744073709551615ul>, std::vector<rmm::device_buffer, std::allocator<rmm::device_buffer> >&&, cudf::column_view const&, cudf::io::parquet::experimental::use_data_page_mask, cudf::io::parquet_reader_options const&, rmm::cuda_stream_view) [0xe86d16] in libcudf.so
=========         Host Frame: cudf::io::parquet::experimental::hybrid_scan_reader::materialize_payload_columns(cudf::host_span<int const, 18446744073709551615ul>, std::vector<rmm::device_buffer, std::allocator<rmm::device_buffer> >&&, cudf::column_view const&, cudf::io::parquet::experimental::use_data_page_mask, cudf::io::parquet_reader_options const&, rmm::cuda_stream_view) const [0xe67309] in libcudf.so
=========         Host Frame: hybrid_scan(std::vector<char, std::allocator<char> >&, cudf::ast::operation const&, int, std::optional<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >, rmm::mr::aligned_resource_adaptor<rmm::mr::device_memory_resource>&) [0xdd76f] in HYBRID_SCAN_TEST
=========         Host Frame: void (anonymous namespace)::test_hybrid_scan<2, 20000>(std::vector<cudf::column_view, std::allocator<cudf::column_view> > const&) [0x140006] in HYBRID_SCAN_TEST
=========         Host Frame: HybridScanTest_MaterializeLists_Test::TestBody() [0x1419a3] in HYBRID_SCAN_TEST

@mhaseeb123 mhaseeb123 moved this to Burndown in libcudf Nov 3, 2025
@mhaseeb123
Member Author

This appears to fix the first error, but it looks like there are still more memcheck errors

Fixed in ea3bf21

@mhaseeb123 mhaseeb123 added 4 - Needs Review Waiting for reviewer to review or respond and removed 3 - Ready for Review Ready for review by team labels Nov 3, 2025
@mhaseeb123 mhaseeb123 changed the title Fix OOB memory access in ORC and the new PQ readers due to fixed-width unaligned load Fix OOB memory access in ORC, HYBRID_SCAN and PQ tests due to fixed-width unaligned load Nov 3, 2025
@mhaseeb123 mhaseeb123 changed the title Fix OOB memory access in ORC, HYBRID_SCAN and PQ tests due to fixed-width unaligned load Fix OOB memory access in Orc and Parquet stacks due to fixed-width unaligned load Nov 3, 2025
@mhaseeb123 mhaseeb123 added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 4 - Needs Review Waiting for reviewer to review or respond labels Nov 3, 2025
}
template <typename T>
inline __device__ T WarpReduceOr16(T acc)
template <cudf::size_type size, typename T>
Member Author

Modernized these into a single templated function.
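For reference, a single shuffle-based helper of roughly the following shape can cover the fixed-width WarpReduceOr16-style variants (a sketch under assumptions; the name, template parameters, and body of the actual replacement may differ):

```cuda
// Sketch (assumed form, not necessarily the PR's code): butterfly OR-reduction
// across an aligned, power-of-two subgroup of `size` threads via XOR shuffles.
// Assumes all 32 lanes of the warp are active.
template <int size, typename T>
__device__ inline T warp_reduce_or(T acc)
{
  for (int delta = 1; delta < size; delta *= 2) {
    acc |= __shfl_xor_sync(0xffffffffu, acc, delta);
  }
  return acc;
}
```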

return __shfl_xor_sync(~0, var, delta);
}

inline __device__ void syncwarp() { __syncwarp(); }
Member Author

Removed, as it is not used anywhere.

@mhaseeb123 mhaseeb123 changed the title Fix OOB memory access in Orc and Parquet stacks due to fixed-width unaligned load Fix OOB memory access in Orc and Parquet stacks from fixed-width unaligned loads Nov 3, 2025
return pos;
}

inline __device__ double Int128ToDouble_rn(uint64_t lo, int64_t hi)
Member Author

Removing this unused utility.

uint32_t v = p32[0];
return (ofs) ? __funnelshift_r(v, p32[1], ofs * 8) : v;
template <cudf::size_type size, typename T>
inline __device__ T warp_reduce_pos(T pos, uint32_t t)
Member Author

Similarly, modernized into a single templated function. Thanks, Cursor.
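As with the OR-reduction above, the fixed-width position helpers can be folded into one templated inclusive prefix sum of roughly this shape (a sketch under assumptions; the PR's actual body may differ):

```cuda
#include <cstdint>

// Sketch (assumed form): inclusive prefix sum of `pos` over aligned, power-of-two
// subgroups of `size` threads; `t` is the lane index within the warp. Assumes a
// fully active warp. Not necessarily the exact code in this PR.
template <int size, typename T>
__device__ inline T warp_reduce_pos(T pos, uint32_t t)
{
  for (int delta = 1; delta < size; delta *= 2) {
    T const other = __shfl_up_sync(0xffffffffu, pos, delta);
    if ((t & (size - 1)) >= static_cast<uint32_t>(delta)) { pos += other; }
  }
  return pos;
}
```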

@mhaseeb123 mhaseeb123 requested a review from vuule November 4, 2025 01:29
Contributor

@vuule vuule left a comment

Looks like an all-around improvement, provided performance is not negatively impacted.

@mhaseeb123 mhaseeb123 added the DO NOT MERGE Hold off on merging; see PR for details label Nov 4, 2025
