
Conversation

@mhaseeb123
Member

@mhaseeb123 mhaseeb123 commented Oct 31, 2025

Description

This PR fixes a possible out-of-bounds (OOB) memory access in the ORC, HYBRID_SCAN, and PARQUET kernels when reading an unaligned 32- or 64-bit value from memory.
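As a rough sketch of the idea (illustrative only; the helper name and plain memcpy below are assumptions, not necessarily the code added in this PR), the unaligned read can be routed through a byte-wise copy so that no full-width load ever touches bytes outside the value being read:

```cuda
#include <cstring>  // memcpy; nvcc also supports memcpy in device code

// Illustrative sketch only: read a possibly unaligned value of type T by copying
// exactly sizeof(T) bytes, instead of issuing an aligned 32/64-bit load that may
// read past the end of the buffer.
template <typename T>
__device__ inline T unaligned_load(unsigned char const* src)
{
  T value;
  memcpy(&value, src, sizeof(T));  // lowered to loads that are safe for unknown alignment
  return value;
}
```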

Checklist

  • Run benchmarks to check for any performance regression from the new cuda::std::memcpy-based unaligned_load
  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mhaseeb123 mhaseeb123 requested a review from a team as a code owner October 31, 2025 23:45
@mhaseeb123 mhaseeb123 requested review from bdice and lamarrr October 31, 2025 23:45
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Oct 31, 2025
@mhaseeb123 mhaseeb123 changed the title Fix OOB memory access in the new PQ reader's dict decoder Fix OOB memory access in ORC and the new PQ readers due to fixed-width unaligned load Oct 31, 2025
@mhaseeb123 mhaseeb123 added bug Something isn't working 3 - Ready for Review Ready for review by team cuIO cuIO issue non-breaking Non-breaking change labels Oct 31, 2025
@davidwendt
Contributor

davidwendt commented Nov 3, 2025

This appears to fix the first error, but it looks like there are still more memcheck errors:

compute-sanitizer --tool memcheck gtests/HYBRID_SCAN_TEST --gtest_brief=1 --rmm_mode=cuda

Partial output:

======== COMPUTE-SANITIZER
========= Invalid __global__ read of size 4 bytes
=========     at cudf::io::parquet::detail::gpuStoreOutput(unsigned int *, const unsigned char *, unsigned int, unsigned int)+0x106c0 in page_data.cuh:74
=========     by thread (64,0,0) in block (19,0,0)
=========     Access at 0x7b586f5ba3cc is out of bounds
=========     and is inside the nearest allocation at 0x7b586f591e00 of size 165,326 bytes
=========         Device Frame: void cudf::io::parquet::detail::read_fixed_width_value_fast<unsigned int, cudf::io::parquet::detail::page_state_buffers_s<(int)256, (int)1, (int)1>>(cudf::io::parquet::detail::page_state_s *, T2 *, int, T1 *)+0x10570 in page_data.cuh:337
=========         Device Frame: void cudf::io::parquet::detail::<unnamed>::decode_fixed_width_values<(int)128, (bool)1, (cudf::io::parquet::detail::copy_mode)0, cudf::io::parquet::detail::page_state_buffers_s<(int)256, (int)1, (int)1>>(cudf::io::parquet::detail::page_state_s *, T4 *, int, int, int)+0x10560 in decode_fixed.cu:171
=========         Device Frame: auto void cudf::io::parquet::detail::<unnamed>::decode_page_data_generic<unsigned char, (int)128, (cudf::io::parquet::detail::decode_kernel_mask)8192>(cudf::io::parquet::detail::PageInfo *, cudf::device_span<const cudf::io::parquet::detail::ColumnChunkDesc, (unsigned long)18446744073709551615>, unsigned long, unsigned long, cudf::device_span<const bool, (unsigned long)18446744073709551615>, cudf::device_span<unsigned long, (unsigned long)18446744073709551615>, unsigned int *)::[lambda() (instance 1)]::operator ()<(cudf::io::parquet::detail::copy_mode)0>() const+0xfbc0 in decode_fixed.cu:1193
=========         Device Frame: void cudf::io::parquet::detail::<unnamed>::decode_page_data_generic<unsigned char, (int)128, (cudf::io::parquet::detail::decode_kernel_mask)8192>(cudf::io::parquet::detail::PageInfo *, cudf::device_span<const cudf::io::parquet::detail::ColumnChunkDesc, (unsigned long)18446744073709551615>, unsigned long, unsigned long, cudf::device_span<const bool, (unsigned long)18446744073709551615>, cudf::device_span<unsigned long, (unsigned long)18446744073709551615>, unsigned int *)+0xfbc0 in decode_fixed.cu:1199
=========     Saved host backtrace up to driver entry point at kernel launch time
=========         Host Frame: cuLaunchKernel [0x39d6c4] in libcuda.so.1
=========         Host Frame:  [0x141f8] in libcudart.so.12
=========         Host Frame: cudaLaunchKernel [0x7d09d] in libcudart.so.12
=========         Host Frame: cudf::io::parquet::detail::decode_page_data(cudf::detail::hostdevice_span<cudf::io::parquet::detail::PageInfo>, cudf::detail::hostdevice_span<cudf::io::parquet::detail::ColumnChunkDesc const>, unsigned long, unsigned long, int, cudf::io::parquet::detail::decode_kernel_mask, cudf::device_span<bool const, 18446744073709551615ul>, cudf::device_span<unsigned long, 18446744073709551615ul>, unsigned int*, rmm::cuda_stream_view) [0xfabda7] in libcudf.so
=========         Host Frame: cudf::io::parquet::detail::reader_impl::decode_page_data(cudf::io::parquet::detail::reader_impl::read_mode, unsigned long, unsigned long)::{lambda(cudf::io::parquet::detail::decode_kernel_mask)#1}::operator()(cudf::io::parquet::detail::decode_kernel_mask) const [0xf096b3] in libcudf.so
=========         Host Frame: cudf::io::parquet::detail::reader_impl::decode_page_data(cudf::io::parquet::detail::reader_impl::read_mode, unsigned long, unsigned long) [0xf1402f] in libcudf.so
=========         Host Frame: cudf::io::table_with_metadata cudf::io::parquet::experimental::detail::hybrid_scan_reader_impl::read_chunk_internal<cudf::column_view>(cudf::io::parquet::detail::reader_impl::read_mode, cudf::io::parquet::experimental::detail::hybrid_scan_reader_impl::read_columns_mode, cudf::column_view) [0xe85f42] in libcudf.so
=========         Host Frame: cudf::io::parquet::experimental::detail::hybrid_scan_reader_impl::materialize_payload_columns(cudf::host_span<std::vector<int, std::allocator<int> > const, 18446744073709551615ul>, std::vector<rmm::device_buffer, std::allocator<rmm::device_buffer> >&&, cudf::column_view const&, cudf::io::parquet::experimental::use_data_page_mask, cudf::io::parquet_reader_options const&, rmm::cuda_stream_view) [0xe86d16] in libcudf.so
=========         Host Frame: cudf::io::parquet::experimental::hybrid_scan_reader::materialize_payload_columns(cudf::host_span<int const, 18446744073709551615ul>, std::vector<rmm::device_buffer, std::allocator<rmm::device_buffer> >&&, cudf::column_view const&, cudf::io::parquet::experimental::use_data_page_mask, cudf::io::parquet_reader_options const&, rmm::cuda_stream_view) const [0xe67309] in libcudf.so
=========         Host Frame: hybrid_scan(std::vector<char, std::allocator<char> >&, cudf::ast::operation const&, int, std::optional<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, rmm::cuda_stream_view, rmm::detail::cccl_async_resource_ref<cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible> >, rmm::mr::aligned_resource_adaptor<rmm::mr::device_memory_resource>&) [0xdd76f] in HYBRID_SCAN_TEST
=========         Host Frame: void (anonymous namespace)::test_hybrid_scan<2, 20000>(std::vector<cudf::column_view, std::allocator<cudf::column_view> > const&) [0x140006] in HYBRID_SCAN_TEST
=========         Host Frame: HybridScanTest_MaterializeLists_Test::TestBody() [0x1419a3] in HYBRID_SCAN_TEST

@mhaseeb123 mhaseeb123 moved this to Burndown in libcudf Nov 3, 2025
@mhaseeb123
Member Author

This appears to fix the first error, but it looks like there are still more memcheck errors

Fixed in ea3bf21

@mhaseeb123 mhaseeb123 added 4 - Needs Review Waiting for reviewer to review or respond and removed 3 - Ready for Review Ready for review by team labels Nov 3, 2025
@mhaseeb123 mhaseeb123 changed the title Fix OOB memory access in ORC and the new PQ readers due to fixed-width unaligned load Fix OOB memory access in ORC, HYBRID_SCAN and PQ tests due to fixed-width unaligned load Nov 3, 2025
@mhaseeb123 mhaseeb123 changed the title Fix OOB memory access in ORC, HYBRID_SCAN and PQ tests due to fixed-width unaligned load Fix OOB memory access in Orc and Parquet stacks due to fixed-width unaligned load Nov 3, 2025
@mhaseeb123 mhaseeb123 added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 4 - Needs Review Waiting for reviewer to review or respond labels Nov 3, 2025
}
template <typename T>
inline __device__ T WarpReduceOr16(T acc)
template <cudf::size_type size, typename T>
Member Author

Modernized these into a single templated function.
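For reference, a single shuffle-based helper of roughly the following shape can cover the fixed-width WarpReduceOr16-style variants (a sketch under assumptions; the name, template parameters, and body of the actual replacement may differ):

```cuda
// Sketch (assumed form, not necessarily the PR's code): butterfly OR-reduction
// across an aligned, power-of-two subgroup of `size` threads via XOR shuffles.
// Assumes all 32 lanes of the warp are active.
template <int size, typename T>
__device__ inline T warp_reduce_or(T acc)
{
  for (int delta = 1; delta < size; delta *= 2) {
    acc |= __shfl_xor_sync(0xffffffffu, acc, delta);
  }
  return acc;
}
```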

return __shfl_xor_sync(~0, var, delta);
}

inline __device__ void syncwarp() { __syncwarp(); }
Member Author

Removed, as it is not used anywhere.

@mhaseeb123 mhaseeb123 changed the title Fix OOB memory access in Orc and Parquet stacks due to fixed-width unaligned load Fix OOB memory access in Orc and Parquet stacks from fixed-width unaligned loads Nov 3, 2025
return pos;
}

inline __device__ double Int128ToDouble_rn(uint64_t lo, int64_t hi)
Member Author

Removing this unused utility.

uint32_t v = p32[0];
return (ofs) ? __funnelshift_r(v, p32[1], ofs * 8) : v;
template <cudf::size_type size, typename T>
inline __device__ T warp_reduce_pos(T pos, uint32_t t)
Member Author

Similarly, modernized into a single templated function. Thanks, Cursor.
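As with the OR-reduction above, the fixed-width position helpers can be folded into one templated inclusive prefix sum of roughly this shape (a sketch under assumptions; the PR's actual body may differ):

```cuda
#include <cstdint>

// Sketch (assumed form): inclusive prefix sum of `pos` over aligned, power-of-two
// subgroups of `size` threads; `t` is the lane index within the warp. Assumes a
// fully active warp. Not necessarily the exact code in this PR.
template <int size, typename T>
__device__ inline T warp_reduce_pos(T pos, uint32_t t)
{
  for (int delta = 1; delta < size; delta *= 2) {
    T const other = __shfl_up_sync(0xffffffffu, pos, delta);
    if ((t & (size - 1)) >= static_cast<uint32_t>(delta)) { pos += other; }
  }
  return pos;
}
```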

@mhaseeb123 mhaseeb123 requested a review from vuule November 4, 2025 01:29
Contributor

@vuule vuule left a comment

Looks like an all-around improvement, provided performance is not negatively impacted.

@mhaseeb123 mhaseeb123 added the DO NOT MERGE Hold off on merging; see PR for details label Nov 4, 2025
