GH-45371: [C++] Fix data race in SimpleRecordBatch::columns
#45372
cpp/src/arrow/record_batch_test.cc
auto schema = ::arrow::schema({field("f1", utf8())});
auto record_batch = RecordBatch::Make(schema, length, {array_data});
std::atomic_bool start_flag{false};
std::thread t([record_batch, &start_flag]() {
Should more than one thread be tested here?
Yes, this should use several threads that would do the same thing concurrently.
In the current test the race is between t and the main thread. Only 2 threads are necessary to produce the data race.
87216a6 to 41e23d7
Thanks for taking the time to review my PR! Let me just clarify the intent behind my test case. It's a minimal example that produces a data race detected by TSAN. The data race only needs to occur between 2 threads; in this case I use the main thread and
Now, we simultaneously have T1 reading There isn't an assertion that will catch this case because it is impossible for Test output
Running the test again with the proposed fix shows no data race. The Not sure if it makes sense to have a test that only works under TSAN, but I don't think there is any way to surface the bug consistently without tooling.
Thanks for the explanation @colin-r-schultz. I understand it better now and I agree the fix looks fine. I have just posted a couple of suggestions.
Co-authored-by: Antoine Pitrou
@github-actions crossbow submit -g cpp
Revision: 1f8e12e Submitted crossbow builds: ursacomputing/crossbow @ actions-719bfbc8cf
Rest LGTM!
@github-actions crossbow submit -g cpp
Revision: ac82bd0 Submitted crossbow builds: ursacomputing/crossbow @ actions-75e9839ac4
Merged. Thanks a lot for this @colin-r-schultz!
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 27900a6. There were 8 benchmark results with an error:
There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.
Rationale for this change
GH-45371
What changes are included in this PR?
Use std::atomic_compare_exchange to initialize boxed_columns_[i] so they are correctly written only once. This means that a reference to boxed_columns_ is safe to read after each element has been initialized.
Are these changes tested?
Yes, there is a test case TestRecordBatch.ColumnsThreadSafety which passes under TSAN.
Are there any user-facing changes?
No
This PR contains a "Critical Fix".
Without this fix, concurrent calls to SimpleRecordBatch::columns could lead to an invalid memory access and crash.
SimpleRecordBatch::columns #45371
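For readers who want to see the write-once idea from the description in isolation, here is a minimal sketch. It is not the Arrow code: LazyColumns, raw_, boxed_ and AllThreadsSeeSameObject are hypothetical names made up for illustration, and the sketch assumes the fix relies on the std::shared_ptr overloads of std::atomic_load and std::atomic_compare_exchange_strong (these free functions are deprecated since C++20 in favour of std::atomic<std::shared_ptr<T>>). Each slot starts out null, and a thread publishes its freshly built object only if the slot is still null, so a slot is never overwritten once it holds a value.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a record batch: raw_ plays the role of the
// unboxed column data, boxed_ the role of boxed_columns_.
class LazyColumns {
 public:
  using Boxed = std::shared_ptr<const std::string>;

  explicit LazyColumns(std::vector<std::string> raw)
      : raw_(std::move(raw)), boxed_(raw_.size()) {}

  // Returns the boxed value for slot i, creating it on first use.
  Boxed column(std::size_t i) const {
    Boxed result = std::atomic_load(&boxed_[i]);
    if (!result) {
      Boxed fresh = std::make_shared<std::string>(raw_[i]);
      // Publish `fresh` only if the slot is still null. If another thread
      // won the race, `result` is updated to the value that thread stored.
      if (std::atomic_compare_exchange_strong(&boxed_[i], &result, fresh)) {
        return fresh;
      }
    }
    return result;
  }

 private:
  std::vector<std::string> raw_;
  mutable std::vector<Boxed> boxed_;  // each slot is written at most once
};

// Calls column(0) from n_threads threads released by a shared start flag,
// then reports whether every thread observed the same boxed object.
bool AllThreadsSeeSameObject(int n_threads) {
  std::vector<std::string> raw = {"a", "b"};
  LazyColumns batch(raw);
  std::atomic<bool> start{false};
  std::vector<LazyColumns::Boxed> seen(n_threads);
  std::vector<std::thread> threads;
  for (int t = 0; t < n_threads; ++t) {
    threads.emplace_back([&batch, &start, &seen, t] {
      while (!start.load()) {
        std::this_thread::yield();  // wait until all threads are launched
      }
      seen[t] = batch.column(0);  // each thread writes its own element
    });
  }
  start.store(true);
  for (auto& th : threads) th.join();
  for (const auto& p : seen) {
    if (p != seen[0]) return false;
  }
  return true;
}
```

Because a failed compare-exchange hands back the winner's pointer, all callers end up sharing one object per slot. As noted in the discussion above, a check like AllThreadsSeeSameObject does not reliably expose a racy implementation on its own; running it under TSAN is what would flag unsynchronized access.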