stall_detector: no backtrace if exception #2714
Open
If an exception is in progress when a reactor stall is detected, skip taking the backtrace, since doing so can crash, as detailed in #2697.
Add a test which reproduces the issue: it crashes (sometimes) before this fix and runs cleanly afterwards.
Fixes #2697.
This is the same fix we have applied in our seastar branch for Redpanda, in order to fix a similar crash in our not-upstreamed CPU profiler, which relies on the same mechanism as the stall notifier. We have not seen any issues with that approach, and the CPU profiler, when enabled, takes backtraces at a much higher frequency than the stall notifier generally does (every 100ms, by default).
Of course, the question is whether std::uncaught_exceptions is itself "signal safe", and you won't find any guarantee about it in the C++ standard (which largely ignores the existence of signals). I did check the implementation of this function in libc++ and libstdc++ and it looks OK to me: both simply read a single field from an exception globals area.
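
For readers skimming the description, the guard can be illustrated with a minimal, self-contained sketch. This is not the seastar code: `stall_report`, `on_stall_detected`, `stall_during_unwind`, and `report_during_unwind` are hypothetical names invented for the example, and the "stall" is simulated from a destructor that runs during stack unwinding rather than from a real signal handler, so that the in-flight-exception case is deterministic.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical result type: records whether a backtrace would have been taken.
struct stall_report {
    bool backtrace_taken;
};

// Hypothetical stand-in for the stall detector's report path.
// If an exception is currently in flight on this thread, skip the
// backtrace, since the unwinder may be in the middle of its own work.
inline stall_report on_stall_detected() {
    if (std::uncaught_exceptions() > 0) {
        return {false};  // exception in progress: omit the backtrace
    }
    // A real implementation would walk the stack here.
    return {true};
}

// Simulates a stall being detected while the stack is unwinding:
// the destructor runs after the throw and before the handler is entered.
struct stall_during_unwind {
    stall_report* out;
    ~stall_during_unwind() { *out = on_stall_detected(); }
};

inline stall_report report_during_unwind() {
    stall_report r{true};
    try {
        stall_during_unwind guard{&r};
        throw std::runtime_error("simulated failure");
    } catch (const std::exception&) {
        // Exception handled; nothing is in flight after this block.
    }
    assert(std::uncaught_exceptions() == 0);
    return r;
}
```

Outside of unwinding, `on_stall_detected()` reports that a backtrace would be taken; called via `report_during_unwind()`, it reports that the backtrace was skipped. Whether the same check is safe inside an actual signal handler is exactly the open question discussed above.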