
GH-45371: [C++] Fix data race in SimpleRecordBatch::columns #45372

Merged

@colin-r-schultz (Contributor) commented on Jan 28, 2025

Rationale for this change

GH-45371

What changes are included in this PR?

Use std::atomic_compare_exchange to initialize each boxed_columns_[i], so that every element is written at most once. As a result, once every element has been initialized, a reference to boxed_columns_ is safe to read.
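A minimal sketch of the compare-and-exchange initialization described above, under simplified assumptions: `ColumnData` and `BoxedColumn` are hypothetical stand-ins for `arrow::ArrayData` and `arrow::Array`, and `LazyBatch` is an illustration, not the code from this PR. It uses the `std::shared_ptr` overloads of `std::atomic_load` and `std::atomic_compare_exchange_strong` (deprecated since C++20 in favor of `std::atomic<std::shared_ptr<T>>`, so newer compilers may warn).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for arrow::ArrayData (raw column data) and
// arrow::Array (the boxed wrapper that is built lazily from it).
struct ColumnData {
  int value;
};

struct BoxedColumn {
  explicit BoxedColumn(std::shared_ptr<ColumnData> d) : data(std::move(d)) {}
  std::shared_ptr<ColumnData> data;
};

class LazyBatch {
 public:
  explicit LazyBatch(std::vector<std::shared_ptr<ColumnData>> data)
      : data_(std::move(data)), boxed_(data_.size()) {}

  // Returns boxed column i, creating it on first use.
  std::shared_ptr<BoxedColumn> column(std::size_t i) const {
    assert(i < boxed_.size());
    std::shared_ptr<BoxedColumn> current = std::atomic_load(&boxed_[i]);
    if (current) {
      return current;  // fast path: already initialized
    }
    auto fresh = std::make_shared<BoxedColumn>(data_[i]);
    std::shared_ptr<BoxedColumn> expected;  // only install over nullptr
    if (std::atomic_compare_exchange_strong(&boxed_[i], &expected, fresh)) {
      return fresh;  // our object was published
    }
    // Another thread won the race: on failure `expected` receives the
    // published value, and the slot is never written again.
    return expected;
  }

  // Each slot is initialized and observed via an atomic operation first;
  // after that, elements are never modified, so returning a reference is safe.
  const std::vector<std::shared_ptr<BoxedColumn>>& columns() const {
    for (std::size_t i = 0; i < boxed_.size(); ++i) {
      column(i);
    }
    return boxed_;
  }

 private:
  std::vector<std::shared_ptr<ColumnData>> data_;
  mutable std::vector<std::shared_ptr<BoxedColumn>> boxed_;
};
```

The key difference from a load-then-store scheme is that a thread that loses the race never overwrites a slot another thread may already be reading; it adopts the published value instead.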

Are these changes tested?

Yes, there is a test case TestRecordBatch.ColumnsThreadSafety which passes under TSAN.

Are there any user-facing changes?

No

This PR contains a "Critical Fix".

Without this fix, concurrent calls to SimpleRecordBatch::columns could lead to an invalid memory access and crash.

⚠️ GitHub issue #45371 has been automatically assigned in GitHub to PR creator.

@mapleFU (Member) left a comment:

LGTM, also cc @bkietz @pitrou

auto schema = ::arrow::schema({field("f1", utf8())});
auto record_batch = RecordBatch::Make(schema, length, {array_data});
std::atomic_bool start_flag{false};
std::thread t([record_batch, &start_flag]() {
Member:

Should more than one thread be tested here?

Member:

Yes, this should use several threads that would do the same thing concurrently.

Contributor (author):

In the current test, the race is between thread t and the main thread. Only two threads are needed to produce the data race.

Review thread on cpp/src/arrow/record_batch_test.cc (outdated, resolved)
Review thread on cpp/src/arrow/record_batch_test.cc (outdated, resolved)
Review thread on cpp/src/arrow/record_batch_test.cc (resolved)
@colin-r-schultz force-pushed the recordbatch-columns-thread-safety branch from 87216a6 to 41e23d7 on January 29, 2025 19:58

@colin-r-schultz (Contributor, Author):

Thanks for taking the time to review my PR! Let me just clarify the intent behind my test case. It's a minimal example that produces a data race detected by TSAN.

The data race only needs two threads; in this case, I use the main thread and thread t. Only one column is needed, too. When the two threads call auto columns = record_batch->columns(), the following happens:

  1. T1 calls atomic_load(&boxed_columns_[0]) and reads nullptr
  2. T2 calls atomic_load(&boxed_columns_[0]) and reads nullptr
  3. T1 calls MakeArray and then atomic_store(&boxed_columns_[0], result)

Now T1 is reading boxed_columns_[0] with a plain (non-atomic) copy in order to construct the columns variable in the test, while T2 simultaneously calls atomic_store(&boxed_columns_[0], result).

No assertion can catch this case: boxed_columns_[0] can never be read as nullptr, because T1 has certainly initialized it by then. Instead, we can rely on TSAN to show that the data race exists. The output of running this test on the main branch with the ninja-debug-tsan preset is below:

Test output
[ RUN      ] TestRecordBatch.ColumnsThreadSafety
==================
WARNING: ThreadSanitizer: data race (pid=21327)
  Write of size 8 at 0x7b0400000db0 by thread T1 (mutexes: write M811):
    #0 std::enable_if<std::__and_<std::__not_<std::__is_tuple_like<arrow::Array*> >, std::is_move_constructible<arrow::Array*>, std::is_move_assignable<arrow::Array*> >::value, void>::type std::swap<arrow::Array*>(arrow::Array*&, arrow::Array*&) /usr/include/c++/12/bits/move.h:205 (arrow-table-test+0x1ec713)
    #1 std::__shared_ptr<arrow::Array, (__gnu_cxx::_Lock_policy)2>::swap(std::__shared_ptr<arrow::Array, (__gnu_cxx::_Lock_policy)2>&) /usr/include/c++/12/bits/shared_ptr_base.h:1686 (arrow-table-test+0x1db302)
    #2 void std::atomic_store_explicit<arrow::Array>(std::shared_ptr<arrow::Array>*, std::shared_ptr<arrow::Array>, std::memory_order) /usr/include/c++/12/bits/shared_ptr_atomic.h:169 (libarrow.so.2000+0x19dddd0)
    #3 void std::atomic_store<arrow::Array>(std::shared_ptr<arrow::Array>*, std::shared_ptr<arrow::Array>) /usr/include/c++/12/bits/shared_ptr_atomic.h:175 (libarrow.so.2000+0x19d7f42)
    #4 arrow::SimpleRecordBatch::column(int) const /home/user/arrow/cpp/src/arrow/record_batch.cc:106 (libarrow.so.2000+0x19d3cbd)
    #5 arrow::SimpleRecordBatch::columns() const /home/user/arrow/cpp/src/arrow/record_batch.cc:97 (libarrow.so.2000+0x19d3b5a)
    #6 operator() /home/user/arrow/cpp/src/arrow/record_batch_test.cc:407 (arrow-table-test+0x18c951)
    #7 __invoke_impl<void, arrow::TestRecordBatch_ColumnsThreadSafety_Test::TestBody()::<lambda()> > /usr/include/c++/12/bits/invoke.h:61 (arrow-table-test+0x1bb876)
    #8 __invoke<arrow::TestRecordBatch_ColumnsThreadSafety_Test::TestBody()::<lambda()> > /usr/include/c++/12/bits/invoke.h:96 (arrow-table-test+0x1bb7ed)
    #9 _M_invoke<0> /usr/include/c++/12/bits/std_thread.h:279 (arrow-table-test+0x1bb74e)
    #10 operator() /usr/include/c++/12/bits/std_thread.h:286 (arrow-table-test+0x1bb6f4)
    #11 _M_run /usr/include/c++/12/bits/std_thread.h:231 (arrow-table-test+0x1bb6aa)
    #12 <null> <null> (libstdc++.so.6+0xdc252)

  Previous read of size 8 at 0x7b0400000db0 by main thread:
    #0 std::__shared_ptr<arrow::Array, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<arrow::Array, (__gnu_cxx::_Lock_policy)2> const&) /usr/include/c++/12/bits/shared_ptr_base.h:1522 (arrow-table-test+0xf32b6)
    #1 std::shared_ptr<arrow::Array>::shared_ptr(std::shared_ptr<arrow::Array> const&) /usr/include/c++/12/bits/shared_ptr.h:204 (arrow-table-test+0xf332a)
    #2 void std::_Construct<std::shared_ptr<arrow::Array>, std::shared_ptr<arrow::Array> const&>(std::shared_ptr<arrow::Array>*, std::shared_ptr<arrow::Array> const&) /usr/include/c++/12/bits/stl_construct.h:119 (arrow-table-test+0x116c63)
    #3 std::shared_ptr<arrow::Array>* std::__do_uninit_copy<__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*>(__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, __gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*) /usr/include/c++/12/bits/stl_uninitialized.h:120 (arrow-table-test+0x1119cb)
    #4 std::shared_ptr<arrow::Array>* std::__uninitialized_copy<false>::__uninit_copy<__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*>(__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, __gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*) /usr/include/c++/12/bits/stl_uninitialized.h:137 (arrow-table-test+0x10896d)
    #5 std::shared_ptr<arrow::Array>* std::uninitialized_copy<__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*>(__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, __gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*) /usr/include/c++/12/bits/stl_uninitialized.h:185 (arrow-table-test+0x103b49)
    #6 std::shared_ptr<arrow::Array>* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*, std::shared_ptr<arrow::Array> >(__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, __gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*, std::allocator<std::shared_ptr<arrow::Array> >&) /usr/include/c++/12/bits/stl_uninitialized.h:372 (arrow-table-test+0xfdd56)
    #7 std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > >::vector(std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > const&) /usr/include/c++/12/bits/stl_vector.h:601 (arrow-table-test+0xf726c)
    #8 arrow::TestRecordBatch_ColumnsThreadSafety_Test::TestBody() /home/user/arrow/cpp/src/arrow/record_batch_test.cc:413 (arrow-table-test+0x18cf6a)
    #9 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/user/arrow/cpp/out/build/ninja-debug-tsan/_deps/googletest-src/googletest/src/gtest.cc:2607 (libarrow_gtestd.so.1.11.0+0xd6672)

  Location is heap block of size 16 at 0x7b0400000db0 allocated by main thread:
    #0 operator new(unsigned long) ../../../../src/libsanitizer/tsan/tsan_new_delete.cpp:64 (libtsan.so.2+0x8d7d9)
    #1 std::__new_allocator<std::shared_ptr<arrow::Array> >::allocate(unsigned long, void const*) /usr/include/c++/12/bits/new_allocator.h:137 (arrow-table-test+0x110a98)
    #2 std::allocator_traits<std::allocator<std::shared_ptr<arrow::Array> > >::allocate(std::allocator<std::shared_ptr<arrow::Array> >&, unsigned long) /usr/include/c++/12/bits/alloc_traits.h:464 (arrow-table-test+0x1075ec)
    #3 std::_Vector_base<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > >::_M_allocate(unsigned long) /usr/include/c++/12/bits/stl_vector.h:378 (arrow-table-test+0x10130c)
    #4 std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > >::_M_default_append(unsigned long) /usr/include/c++/12/bits/vector.tcc:657 (libarrow.so.2000+0x19ddaeb)
    #5 std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > >::resize(unsigned long) /usr/include/c++/12/bits/stl_vector.h:1011 (libarrow.so.2000+0x19d7e0d)
    #6 arrow::SimpleRecordBatch::SimpleRecordBatch(std::shared_ptr<arrow::Schema> const&, long, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType, std::shared_ptr<arrow::Device::SyncEvent>) /home/user/arrow/cpp/src/arrow/record_batch.cc:91 (libarrow.so.2000+0x19d3a8e)
    #7 void std::_Construct<arrow::SimpleRecordBatch, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(arrow::SimpleRecordBatch*, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) /usr/include/c++/12/bits/stl_construct.h:119 (libarrow.so.2000+0x19f2c75)
    #8 void std::allocator_traits<std::allocator<void> >::construct<arrow::SimpleRecordBatch, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(std::allocator<void>&, arrow::SimpleRecordBatch*, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) /usr/include/c++/12/bits/alloc_traits.h:635 (libarrow.so.2000+0x19f0969)
    #9 std::_Sp_counted_ptr_inplace<arrow::SimpleRecordBatch, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(std::allocator<void>, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) /usr/include/c++/12/bits/shared_ptr_base.h:604 (libarrow.so.2000+0x19ed7ca)
    #10 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<arrow::SimpleRecordBatch, std::allocator<void>, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(arrow::SimpleRecordBatch*&, std::_Sp_alloc_shared_tag<std::allocator<void> >, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) <null> (libarrow.so.2000+0x19e9414)
    #11 std::__shared_ptr<arrow::SimpleRecordBatch, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<void>, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(std::_Sp_alloc_shared_tag<std::allocator<void> >, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) <null> (libarrow.so.2000+0x19e504c)
    #12 std::shared_ptr<arrow::SimpleRecordBatch>::shared_ptr<std::allocator<void>, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(std::_Sp_alloc_shared_tag<std::allocator<void> >, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) <null> (libarrow.so.2000+0x19dec89)
    #13 std::shared_ptr<std::enable_if<!std::is_array<arrow::SimpleRecordBatch>::value, arrow::SimpleRecordBatch>::type> std::make_shared<arrow::SimpleRecordBatch, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) /usr/include/c++/12/bits/shared_ptr.h:1010 (libarrow.so.2000+0x19d90e8)
    #14 arrow::RecordBatch::Make(std::shared_ptr<arrow::Schema>, long, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType, std::shared_ptr<arrow::Device::SyncEvent>) /home/user/arrow/cpp/src/arrow/record_batch.cc:230 (libarrow.so.2000+0x19c6223)
    #15 arrow::TestRecordBatch_ColumnsThreadSafety_Test::TestBody() /home/user/arrow/cpp/src/arrow/record_batch_test.cc:403 (arrow-table-test+0x18ce84)
    #16 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/user/arrow/cpp/out/build/ninja-debug-tsan/_deps/googletest-src/googletest/src/gtest.cc:2607 (libarrow_gtestd.so.1.11.0+0xd6672)

  Mutex M811 (0x7fffeebff080) created at:
    #0 pthread_mutex_lock ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4324 (libtsan.so.2+0x59bbf)
    #1 std::_Sp_locker::_Sp_locker(void const*) <null> (libstdc++.so.6+0xdb89c)
    #2 std::shared_ptr<arrow::Array> std::atomic_load<arrow::Array>(std::shared_ptr<arrow::Array> const*) /usr/include/c++/12/bits/shared_ptr_atomic.h:138 (libarrow.so.2000+0x19d7ebe)
    #3 arrow::SimpleRecordBatch::column(int) const /home/user/arrow/cpp/src/arrow/record_batch.cc:103 (libarrow.so.2000+0x19d3c20)
    #4 arrow::RecordBatch::Equals(arrow::RecordBatch const&, bool, arrow::EqualOptions const&) const /home/user/arrow/cpp/src/arrow/record_batch.cc:320 (libarrow.so.2000+0x19c75a7)
    #5 arrow::TestRecordBatch_EqualOptions_Test::TestBody() /home/user/arrow/cpp/src/arrow/record_batch_test.cc:105 (arrow-table-test+0x17d808)
    #6 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/user/arrow/cpp/out/build/ninja-debug-tsan/_deps/googletest-src/googletest/src/gtest.cc:2607 (libarrow_gtestd.so.1.11.0+0xd6672)

  Thread T1 (tid=21339, running) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1001 (libtsan.so.2+0x63a59)
    #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xdc328)
    #2 arrow::TestRecordBatch_ColumnsThreadSafety_Test::TestBody() /home/user/arrow/cpp/src/arrow/record_batch_test.cc:409 (arrow-table-test+0x18cf11)
    #3 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/user/arrow/cpp/out/build/ninja-debug-tsan/_deps/googletest-src/googletest/src/gtest.cc:2607 (libarrow_gtestd.so.1.11.0+0xd6672)

SUMMARY: ThreadSanitizer: data race /usr/include/c++/12/bits/move.h:205 in std::enable_if<std::__and_<std::__not_<std::__is_tuple_like<arrow::Array*> >, std::is_move_constructible<arrow::Array*>, std::is_move_assignable<arrow::Array*> >::value, void>::type std::swap<arrow::Array*>(arrow::Array*&, arrow::Array*&)
==================
==================
WARNING: ThreadSanitizer: data race (pid=21327)
  Write of size 8 at 0x7b0400000db8 by thread T1 (mutexes: write M811):
    #0 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::_M_swap(std::__shared_count<(__gnu_cxx::_Lock_policy)2>&) /usr/include/c++/12/bits/shared_ptr_base.h:1101 (arrow-table-test+0x10114d)
    #1 std::__shared_ptr<arrow::Array, (__gnu_cxx::_Lock_policy)2>::swap(std::__shared_ptr<arrow::Array, (__gnu_cxx::_Lock_policy)2>&) /usr/include/c++/12/bits/shared_ptr_base.h:1687 (arrow-table-test+0x1db31d)
    #2 void std::atomic_store_explicit<arrow::Array>(std::shared_ptr<arrow::Array>*, std::shared_ptr<arrow::Array>, std::memory_order) /usr/include/c++/12/bits/shared_ptr_atomic.h:169 (libarrow.so.2000+0x19dddd0)
    #3 void std::atomic_store<arrow::Array>(std::shared_ptr<arrow::Array>*, std::shared_ptr<arrow::Array>) /usr/include/c++/12/bits/shared_ptr_atomic.h:175 (libarrow.so.2000+0x19d7f42)
    #4 arrow::SimpleRecordBatch::column(int) const /home/user/arrow/cpp/src/arrow/record_batch.cc:106 (libarrow.so.2000+0x19d3cbd)
    #5 arrow::SimpleRecordBatch::columns() const /home/user/arrow/cpp/src/arrow/record_batch.cc:97 (libarrow.so.2000+0x19d3b5a)
    #6 operator() /home/user/arrow/cpp/src/arrow/record_batch_test.cc:407 (arrow-table-test+0x18c951)
    #7 __invoke_impl<void, arrow::TestRecordBatch_ColumnsThreadSafety_Test::TestBody()::<lambda()> > /usr/include/c++/12/bits/invoke.h:61 (arrow-table-test+0x1bb876)
    #8 __invoke<arrow::TestRecordBatch_ColumnsThreadSafety_Test::TestBody()::<lambda()> > /usr/include/c++/12/bits/invoke.h:96 (arrow-table-test+0x1bb7ed)
    #9 _M_invoke<0> /usr/include/c++/12/bits/std_thread.h:279 (arrow-table-test+0x1bb74e)
    #10 operator() /usr/include/c++/12/bits/std_thread.h:286 (arrow-table-test+0x1bb6f4)
    #11 _M_run /usr/include/c++/12/bits/std_thread.h:231 (arrow-table-test+0x1bb6aa)
    #12 <null> <null> (libstdc++.so.6+0xdc252)

  Previous read of size 8 at 0x7b0400000db8 by main thread:
    #0 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&) /usr/include/c++/12/bits/shared_ptr_base.h:1075 (arrow-table-test+0xf4f9a)
    #1 std::__shared_ptr<arrow::Array, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<arrow::Array, (__gnu_cxx::_Lock_policy)2> const&) /usr/include/c++/12/bits/shared_ptr_base.h:1522 (arrow-table-test+0xf32eb)
    #2 std::shared_ptr<arrow::Array>::shared_ptr(std::shared_ptr<arrow::Array> const&) /usr/include/c++/12/bits/shared_ptr.h:204 (arrow-table-test+0xf332a)
    #3 void std::_Construct<std::shared_ptr<arrow::Array>, std::shared_ptr<arrow::Array> const&>(std::shared_ptr<arrow::Array>*, std::shared_ptr<arrow::Array> const&) /usr/include/c++/12/bits/stl_construct.h:119 (arrow-table-test+0x116c63)
    #4 std::shared_ptr<arrow::Array>* std::__do_uninit_copy<__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*>(__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, __gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*) /usr/include/c++/12/bits/stl_uninitialized.h:120 (arrow-table-test+0x1119cb)
    #5 std::shared_ptr<arrow::Array>* std::__uninitialized_copy<false>::__uninit_copy<__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*>(__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, __gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*) /usr/include/c++/12/bits/stl_uninitialized.h:137 (arrow-table-test+0x10896d)
    #6 std::shared_ptr<arrow::Array>* std::uninitialized_copy<__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*>(__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, __gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*) /usr/include/c++/12/bits/stl_uninitialized.h:185 (arrow-table-test+0x103b49)
    #7 std::shared_ptr<arrow::Array>* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*, std::shared_ptr<arrow::Array> >(__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, __gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*, std::allocator<std::shared_ptr<arrow::Array> >&) /usr/include/c++/12/bits/stl_uninitialized.h:372 (arrow-table-test+0xfdd56)
    #8 std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > >::vector(std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > const&) /usr/include/c++/12/bits/stl_vector.h:601 (arrow-table-test+0xf726c)
    #9 arrow::TestRecordBatch_ColumnsThreadSafety_Test::TestBody() /home/user/arrow/cpp/src/arrow/record_batch_test.cc:413 (arrow-table-test+0x18cf6a)
    #10 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/user/arrow/cpp/out/build/ninja-debug-tsan/_deps/googletest-src/googletest/src/gtest.cc:2607 (libarrow_gtestd.so.1.11.0+0xd6672)

  Location is heap block of size 16 at 0x7b0400000db0 allocated by main thread:
    #0 operator new(unsigned long) ../../../../src/libsanitizer/tsan/tsan_new_delete.cpp:64 (libtsan.so.2+0x8d7d9)
    #1 std::__new_allocator<std::shared_ptr<arrow::Array> >::allocate(unsigned long, void const*) /usr/include/c++/12/bits/new_allocator.h:137 (arrow-table-test+0x110a98)
    #2 std::allocator_traits<std::allocator<std::shared_ptr<arrow::Array> > >::allocate(std::allocator<std::shared_ptr<arrow::Array> >&, unsigned long) /usr/include/c++/12/bits/alloc_traits.h:464 (arrow-table-test+0x1075ec)
    #3 std::_Vector_base<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > >::_M_allocate(unsigned long) /usr/include/c++/12/bits/stl_vector.h:378 (arrow-table-test+0x10130c)
    #4 std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > >::_M_default_append(unsigned long) /usr/include/c++/12/bits/vector.tcc:657 (libarrow.so.2000+0x19ddaeb)
    #5 std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > >::resize(unsigned long) /usr/include/c++/12/bits/stl_vector.h:1011 (libarrow.so.2000+0x19d7e0d)
    #6 arrow::SimpleRecordBatch::SimpleRecordBatch(std::shared_ptr<arrow::Schema> const&, long, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType, std::shared_ptr<arrow::Device::SyncEvent>) /home/user/arrow/cpp/src/arrow/record_batch.cc:91 (libarrow.so.2000+0x19d3a8e)
    #7 void std::_Construct<arrow::SimpleRecordBatch, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(arrow::SimpleRecordBatch*, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) /usr/include/c++/12/bits/stl_construct.h:119 (libarrow.so.2000+0x19f2c75)
    #8 void std::allocator_traits<std::allocator<void> >::construct<arrow::SimpleRecordBatch, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(std::allocator<void>&, arrow::SimpleRecordBatch*, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) /usr/include/c++/12/bits/alloc_traits.h:635 (libarrow.so.2000+0x19f0969)
    #9 std::_Sp_counted_ptr_inplace<arrow::SimpleRecordBatch, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(std::allocator<void>, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) /usr/include/c++/12/bits/shared_ptr_base.h:604 (libarrow.so.2000+0x19ed7ca)
    #10 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<arrow::SimpleRecordBatch, std::allocator<void>, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(arrow::SimpleRecordBatch*&, std::_Sp_alloc_shared_tag<std::allocator<void> >, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) <null> (libarrow.so.2000+0x19e9414)
    #11 std::__shared_ptr<arrow::SimpleRecordBatch, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<void>, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(std::_Sp_alloc_shared_tag<std::allocator<void> >, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) <null> (libarrow.so.2000+0x19e504c)
    #12 std::shared_ptr<arrow::SimpleRecordBatch>::shared_ptr<std::allocator<void>, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(std::_Sp_alloc_shared_tag<std::allocator<void> >, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) <null> (libarrow.so.2000+0x19dec89)
    #13 std::shared_ptr<std::enable_if<!std::is_array<arrow::SimpleRecordBatch>::value, arrow::SimpleRecordBatch>::type> std::make_shared<arrow::SimpleRecordBatch, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) /usr/include/c++/12/bits/shared_ptr.h:1010 (libarrow.so.2000+0x19d90e8)
    #14 arrow::RecordBatch::Make(std::shared_ptr<arrow::Schema>, long, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType, std::shared_ptr<arrow::Device::SyncEvent>) /home/user/arrow/cpp/src/arrow/record_batch.cc:230 (libarrow.so.2000+0x19c6223)
    #15 arrow::TestRecordBatch_ColumnsThreadSafety_Test::TestBody() /home/user/arrow/cpp/src/arrow/record_batch_test.cc:403 (arrow-table-test+0x18ce84)
    #16 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/user/arrow/cpp/out/build/ninja-debug-tsan/_deps/googletest-src/googletest/src/gtest.cc:2607 (libarrow_gtestd.so.1.11.0+0xd6672)

  Mutex M811 (0x7fffeebff080) created at:
    #0 pthread_mutex_lock ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4324 (libtsan.so.2+0x59bbf)
    #1 std::_Sp_locker::_Sp_locker(void const*) <null> (libstdc++.so.6+0xdb89c)
    #2 std::shared_ptr<arrow::Array> std::atomic_load<arrow::Array>(std::shared_ptr<arrow::Array> const*) /usr/include/c++/12/bits/shared_ptr_atomic.h:138 (libarrow.so.2000+0x19d7ebe)
    #3 arrow::SimpleRecordBatch::column(int) const /home/user/arrow/cpp/src/arrow/record_batch.cc:103 (libarrow.so.2000+0x19d3c20)
    #4 arrow::RecordBatch::Equals(arrow::RecordBatch const&, bool, arrow::EqualOptions const&) const /home/user/arrow/cpp/src/arrow/record_batch.cc:320 (libarrow.so.2000+0x19c75a7)
    #5 arrow::TestRecordBatch_EqualOptions_Test::TestBody() /home/user/arrow/cpp/src/arrow/record_batch_test.cc:105 (arrow-table-test+0x17d808)
    #6 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/user/arrow/cpp/out/build/ninja-debug-tsan/_deps/googletest-src/googletest/src/gtest.cc:2607 (libarrow_gtestd.so.1.11.0+0xd6672)

  Thread T1 (tid=21339, running) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1001 (libtsan.so.2+0x63a59)
    #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xdc328)
    #2 arrow::TestRecordBatch_ColumnsThreadSafety_Test::TestBody() /home/user/arrow/cpp/src/arrow/record_batch_test.cc:409 (arrow-table-test+0x18cf11)
    #3 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/user/arrow/cpp/out/build/ninja-debug-tsan/_deps/googletest-src/googletest/src/gtest.cc:2607 (libarrow_gtestd.so.1.11.0+0xd6672)

SUMMARY: ThreadSanitizer: data race /usr/include/c++/12/bits/shared_ptr_base.h:1101 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::_M_swap(std::__shared_count<(__gnu_cxx::_Lock_policy)2>&)
==================
[       OK ] TestRecordBatch.ColumnsThreadSafety (295 ms)

Running the test again with the proposed fix shows no data race. The ASSERT_EQ(columns.size(), 1) is just there to make sure the columns variable isn't optimized out.

Not sure if it makes sense to have a test that only works under TSAN, but I don't think there is any way to surface the bug consistently without tooling.
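This PR's fix initializes each boxed_columns_[i] with std::atomic_compare_exchange, so each slot is written only once. The sketch below shows that write-once pattern outside of Arrow. `Column`, `LazyColumns` and `ColumnsAreConsistent` are hypothetical names, not Arrow's API, and this is an illustration of the approach, not the merged code. The driver takes the reviewers' suggestion and calls `columns()` from several threads at once. It uses the C++11 free functions `std::atomic_load` / `std::atomic_compare_exchange_strong` on `std::shared_ptr`, which C++20 deprecates in favor of `std::atomic<std::shared_ptr<T>>`.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for a boxed column (arrow::Array in Arrow itself).
struct Column {
  explicit Column(int index) : index(index) {}
  int index;
};

// Hypothetical stand-in for SimpleRecordBatch's lazily boxed columns.
// Each slot starts empty and is published at most once via compare-exchange.
class LazyColumns {
 public:
  explicit LazyColumns(int num_columns) : boxed_(num_columns) {}

  std::shared_ptr<Column> column(int i) const {
    assert(i >= 0 && i < static_cast<int>(boxed_.size()));
    std::shared_ptr<Column> result = std::atomic_load(&boxed_[i]);
    if (result) return result;  // already boxed: read-only fast path
    auto fresh = std::make_shared<Column>(i);
    std::shared_ptr<Column> expected;  // empty means "not yet initialized"
    // Exactly one thread wins; every loser receives the winner's pointer
    // in `expected` and discards its own `fresh` object.
    if (std::atomic_compare_exchange_strong(&boxed_[i], &expected, fresh)) {
      return fresh;
    }
    return expected;
  }

  // Once column(i) has returned for every i, each slot is non-empty and is
  // never written again, so handing out a plain reference is race-free.
  const std::vector<std::shared_ptr<Column>>& columns() const {
    for (int i = 0; i < static_cast<int>(boxed_.size()); ++i) column(i);
    return boxed_;
  }

 private:
  mutable std::vector<std::shared_ptr<Column>> boxed_;
};

// Calls columns() from several threads at once and reports whether every
// thread observed the same (non-null) object in every slot.
bool ColumnsAreConsistent(int num_threads, int num_columns) {
  LazyColumns batch(num_columns);
  std::atomic<bool> start{false};
  std::vector<const Column*> seen(num_threads * num_columns, nullptr);
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&, t] {
      while (!start.load()) {
        // Spin so that all threads enter columns() at roughly the same time.
      }
      const auto& cols = batch.columns();
      for (int i = 0; i < num_columns; ++i) {
        seen[t * num_columns + i] = cols[i].get();  // disjoint slots per thread
      }
    });
  }
  start.store(true);
  for (auto& thread : threads) thread.join();
  for (int i = 0; i < num_columns; ++i) {
    if (seen[i] == nullptr) return false;
  }
  for (int t = 1; t < num_threads; ++t) {
    for (int i = 0; i < num_columns; ++i) {
      if (seen[t * num_columns + i] != seen[i]) return false;
    }
  }
  return true;
}
```

For example, `ColumnsAreConsistent(8, 4)` should return true. As the comment above notes, the boolean check alone would rarely expose the original bug. The useful signal comes from building with `-fsanitize=thread` and letting TSAN watch the concurrent accesses.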

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Feb 3, 2025
Member

@pitrou pitrou left a comment

Thanks for the explanation @colin-r-schultz. I understand the issue better now, and I agree the fix looks fine. I have just posted a couple of suggestions.

colin-r-schultz and others added 2 commits February 4, 2025 17:18
Co-authored-by: Antoine Pitrou <[email protected]>
@pitrou
Member

pitrou commented Feb 5, 2025

@github-actions crossbow submit -g cpp

github-actions bot commented Feb 5, 2025

Revision: 1f8e12e

Submitted crossbow builds: ursacomputing/crossbow @ actions-719bfbc8cf

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp-ubuntu-20.04-cuda-11.2.2 GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

Member

@mapleFU mapleFU left a comment

Rest LGTM!

cpp/src/arrow/record_batch_test.cc (outdated, resolved)
@pitrou
Member

pitrou commented Feb 5, 2025

@github-actions crossbow submit -g cpp

github-actions bot commented Feb 5, 2025

Revision: ac82bd0

Submitted crossbow builds: ursacomputing/crossbow @ actions-75e9839ac4

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp-ubuntu-20.04-cuda-11.2.2 GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

@pitrou pitrou merged commit 27900a6 into apache:main Feb 5, 2025
36 checks passed
@pitrou pitrou removed the awaiting committer review Awaiting committer review label Feb 5, 2025
@pitrou
Member

pitrou commented Feb 5, 2025

Merged. Thanks a lot for this @colin-r-schultz !

@colin-r-schultz colin-r-schultz deleted the recordbatch-columns-thread-safety branch February 5, 2025 18:09

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 27900a6.

There were 8 benchmark results with an error:

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.

3 participants