-
Notifications
You must be signed in to change notification settings - Fork 3.8k
GH-46710: [C++] Fix ownership and lifetime issues in Dataset Writer #46711
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@github-actions crossbow submit -g cpp |
This comment was marked as outdated.
This comment was marked as outdated.
@zanmato1984 Would you have time to look at this? I fear that not many devs are able to review this code... |
Also cc @bkietz |
Sorry for being late. I'm actually reviewing this today. Didn't touch much of this part of the code so it may take some time. |
Thank you! I'll rebase to get a newer baseline for CI. |
44c6f48
to
0117ee4
Compare
@github-actions crossbow submit -g cpp |
Revision: 0117ee4 Submitted crossbow builds: ursacomputing/crossbow @ actions-3309186726 |
auto st = PopAndDeliverStagedBatch().status(); | ||
if (!st.ok()) { | ||
file_tasks_.reset(); | ||
return st; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto st = PopAndDeliverStagedBatch().status(); | |
if (!st.ok()) { | |
file_tasks_.reset(); | |
return st; | |
} | |
RETURN_NOT_OK_ELSE(PopAndDeliverStagedBatch(), file_tasks_.reset()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had forgotten about this little-used macro. I honestly find it a bit confusing, I had to look up the definition to ensure I understood the semantics correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the name is confusing. I do believe when given a less confusing name, this can be an actually useful macro. How about we rename it to something like RETURN_NOT_OK_AFTER
(in follow-up PR)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I'm not sure RETURN_NOT_OK_AFTER
would be much less confusing, honestly.
I wonder if we can add some kind of functional API to Status
, for example:
RETURN_NOT_OK(PopAndDeliverStagedBatch().OrElse([&] { file_tasks_.reset() }));
@@ -377,8 +375,8 @@ class ThrottledAsyncTaskSchedulerImpl | |||
if (maybe_backoff) { | |||
lk.unlock(); | |||
if (!maybe_backoff->TryAddCallback([&] { | |||
return [self = shared_from_this()](const Status& st) { | |||
if (st.ok()) { | |||
return [weak_self = weak_from_this()](const Status& st) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
pause_callback_(std::move(pause_callback)), | ||
resume_callback_(std::move(resume_callback)) {} | ||
|
||
~DatasetWriterImpl() { | ||
// In case something went wrong (e.g. an IO error occurred), some tasks | ||
// may be left dangling in a ThrottledAsyncTaskScheduler and that may |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are such tasks left dangling exactly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I haven't tried to analyze this very deeply. It might be the file_finish_task
actually? In any case, this is necessary to suppress the memory leaks in error cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though we are not sure about how exactly this fix addresses the issue, I think it's generally safe and improves the life-cycle safety. +1.
Thanks for the review @zanmato1984 ! |
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 57439e0. There were 68 benchmark results with an error:
There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 18 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…iter (apache#46711) ### Rationale for this change The dataset writer currently uses raw pointers of the pieces of state it needs to track, even though some of this state is accessed in async callbacks that might run at arbitrary points in time (especially in the case an error forces an early return of the synchronously running logic). ### What changes are included in this PR? This PR strives to strengthen the dataset writer code by using safe pointers everywhere possible, and also trying to guard against potential leaks due to cyclic shared_ptr references. Besides making it potentially more robust, it should also make future maintenance and evolutions easier. This PR seems to fix apache#45235, though it will have to be confirmed after multiple CI runs. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * GitHub Issue: apache#46710 Authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
…46792) ### Rationale for this change In #46711 (comment) it was mentioned that the macro `RETURN_NOT_OK_ELSE` is confusing and can easily be misunderstood. We would like a better way to conditionally chain error-handling code if a Status does not indicate success. ### What changes are included in this PR? 1. Add a type trait `IntoStatus<T>` that can be implemented to provide conversions from other error-like types 2. Add a global `ToStatus` function that calls the aforementioned type trait 3. Add a `Status::OrElse` method that calls a functor on error 4. Remove the `RETURN_NOT_OK_ELSE` macro ### Are these changes tested? Yes. ### Are there any user-facing changes? No, the `RETURN_NOT_OK_ELSE` was not supposed to be called by third-party code as it's not prefixed with `ARROW_`. * GitHub Issue: #46791 Authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <pitrou@free.fr>
Rationale for this change
The dataset writer currently uses raw pointers of the pieces of state it needs to track, even though some of this state is accessed in async callbacks that might run at arbitrary points in time (especially in the case an error forces an early return of the synchronously running logic).
What changes are included in this PR?
This PR strives to strengthen the dataset writer code by using safe pointers everywhere possible, and also trying to guard against potential leaks due to cyclic shared_ptr references.
Besides making it potentially more robust, it should also make future maintenance and evolutions easier.
This PR seems to fix #45235, though it will have to be confirmed after multiple CI runs.
Are these changes tested?
Yes.
Are there any user-facing changes?
No.