
stall_detector: no backtrace if exception #2714


Closed

Conversation

@travisdowns (Contributor) commented Apr 2, 2025

If an exception is in progress when a reactor stall is detected, omit taking the backtrace, as this can crash as detailed in #2697.

Add a test which reproduces the issue: it crashes (sometimes) before this fix and runs cleanly afterwards.

Fixes #2697.

This is the same fix we applied in our seastar branch for Redpanda, in order to fix a similar crash in our not-yet-upstreamed CPU profiler, which relies on the same mechanism as the stall detector. We have not seen any issues with that approach, and the CPU profiler, when enabled, takes backtraces at a much higher frequency than the stall detector in general (every 100ms, by default).

Of course, the question is whether std::uncaught_exceptions is itself "signal safe"; you won't find any guarantee about it in the C++ standard (which largely ignores the existence of signals). I did check the implementation of this method in libc++ and libstdc++: they both read a single field from an exception-globals area, which looks OK to me.
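A minimal sketch of the idea, with illustrative names rather than the actual seastar identifiers:

```cpp
#include <exception>

// Hypothetical helper standing in for the real backtrace collection.
void collect_and_log_backtrace() { /* walk and log the stack */ }

void on_stall_detected() {
    // std::uncaught_exceptions() reduces to a single field read from the
    // C++ runtime's per-thread exception globals in both libstdc++ and
    // libc++, which is why it is plausibly safe to call here even though
    // the standard gives no signal-safety guarantee.
    if (std::uncaught_exceptions() > 0) {
        return;  // unwinder state may be inconsistent; skip the backtrace
    }
    collect_and_log_backtrace();
}
```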

@xemul (Contributor) commented Apr 8, 2025

For the record: there was an attempt to still collect the backtrace and not crash if it fails (#2420)

@travisdowns (Contributor, Author)

For the record: there was an attempt to still collect the backtrace and not crash if it fails (#2420)

Thanks for the pointer @xemul, I hadn't seen that. I think I had done an issue search, but did not search PRs. So this PR at least provides a reproducer which may help in evaluating the other PR as well.

@travisdowns (Contributor, Author)

I also considered the longjmp approach, but went with this as I think it is simpler and overall safer. The downside is missing stalls in exceptions: however, IME that is very uncommon. I can't recall any stall which was "always" in an exception (this would probably crash pretty frequently if so), and the ones that are occasionally in an exception are usually there very rarely and the non-exception reports point just fine to the underlying location. Exceptions are slow so we also tend to remove them from hot paths.

@avikivity (Member)

Maybe test @xemul's fix with @travisdowns' reproducer?

Stalls in exception storms will land in exceptions. Good code should de-exceptionalize such areas, but that's usually only done once the problem is experienced.

@michoecho (Contributor) left a comment

This is definitely preferable to a crash, so I approve. But...

The downside is missing stalls in exceptions: however, IME that is very uncommon. I can't recall any stall which was "always" in an exception (this would probably crash pretty frequently if so), and the ones that are occasionally in an exception are usually there very rarely and the non-exception reports point just fine to the underlying location. Exceptions are slow so we also tend to remove them from hot paths.

Correct me if I'm wrong, but there's a second downside: you can switch out from a seastar::thread while you are in the middle of unwinding (the ability to have async destructors is probably the main reason Scylla sometimes uses seastar::thread instead of coroutines), and then std::uncaught_exceptions() > 0 will remain true for an indefinite time, kept incremented by the switched-out thread.
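A small self-contained illustration of the counter's behavior (plain C++, no seastar; the destructor stands in for a fiber that could switch out mid-unwind):

```cpp
#include <exception>
#include <iostream>
#include <stdexcept>

struct dtor_during_unwind {
    ~dtor_during_unwind() {
        // In a seastar::thread this destructor could suspend (e.g. wait on
        // a future), letting other fibers run on the OS thread while the
        // counter is still elevated.
        std::cout << "during unwind: " << std::uncaught_exceptions() << "\n";  // prints 1
    }
};

int main() {
    try {
        dtor_during_unwind guard;
        throw std::runtime_error("boom");
    } catch (...) {
        std::cout << "after catch: " << std::uncaught_exceptions() << "\n";  // prints 0
    }
}
```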

@michoecho (Contributor) commented Apr 24, 2025

For posterity, I'll copy-paste my comment about the problem from a different thread a long time ago (it applies only to libstdc++; I didn't read libc++ at all):


I kept reading the sources of gcc until I understood the entire mechanism.

In short, _Unwind_RaiseException (and _Unwind_Resume) are mostly implemented as regular C functions, but with a bit of magic that causes all registers to be saved in the prologue (and restored in the epilogue) regardless of liveness, and a __builtin call which modifies the stack address (and the return address on it) before the return.

At runtime, the unwinder finds the frame with the handler (computing the set of callee-saved registers on the way), and overwrites the callee-saved registers of _Unwind_RaiseException with the callee-saved registers of the handler frame. This doesn't happen atomically — there is a window where some registers belong to the handler frame, and some still belong to the caller of _Unwind_RaiseException. In particular, there's a window where the return address is updated to the handler, but the register used to compute (by DWARF) the frame address in the handler still isn't updated. This is the window when calling a nested unwinder will break.

It is theoretically possible to write _Unwind_RaiseException in such a way that it stays unwindable after every instruction. It's not possible to update the set of registers atomically, but it is possible to add extra "manual" DWARF that moves the description of the location of all callee-saved registers atomically. But _Unwind_RaiseException would have to become an even less "regular" function for that. (Especially since the caller's stack pointer is generally assumed to be not saved, but relative to the callee's CFA, so modifying its description might require modifying the callee's CFA, which would make for a lot of mess.)

I wouldn't count on it getting "fixed".

In this case it seems to be called by a stall detector - let's disable it every time we get into the unwinder (before that, that is).
Should be easy enough.

_Unwind_RaiseException is a seastar function; it can raise a flag to suspend the stall detector and segv printer

No, this actually won't work, because those functions don't return. You can set a flag when you start the unwinding, but you have no way to unset it when you end the unwinding.

cxxabi maintains a counter of in-flight exceptions. So you could use that to detect if you are in the middle of C++ exception handling (doesn't work for exceptions from other languages, but AFAIK that's not a concern for us). But:

  1. We can switch to a different seastar::thread in the middle of unwinding, so you would have to implement a bit of extra bookkeeping to know that the current seastar thread is in the middle of unwinding, not just some seastar thread, otherwise a single seastar thread waiting for something in a destructor could block the backtracing for all seastar threads indefinitely.

  2. If you disable the stall detector for the entire duration of unwinding, you lose the ability to detect stalls in the destructors called during unwinding, and that would be regrettable.

I think a better solution for this particular problem would be to just stop the backtrace() immediately if it enters _Unwind_RaiseException, _Unwind_Resume, _Unwind_Resume_or_Rethrow, or _Unwind_ForcedUnwind. This works under the assumption that _Unwind_Backtrace doesn't try to compute anything on the parent frame until we actually move to it, which AFAIU is true. (It does look up DWARF info about the parent frame, but that should be OK — the immediate caller of _Unwind_Backtrace is always legit in this scenario, it's its consistency with other registers that is a problem, but they shouldn't be used as long as we aren't substituting them into the DWARF expressions).
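A rough sketch of that alternative (an assumed approach, not actual seastar code; dladdr is used for brevity, although a production version would precompute the functions' address ranges up front, since dladdr is not guaranteed to be async-signal-safe):

```cpp
#include <unwind.h>
#include <dlfcn.h>
#include <cstring>

// True if the instruction pointer resolves to one of the unwinder entry
// points whose caller frames may be mid-update.
static bool in_unwinder(void* ip) {
    static const char* const entries[] = {
        "_Unwind_RaiseException", "_Unwind_Resume",
        "_Unwind_Resume_or_Rethrow", "_Unwind_ForcedUnwind",
    };
    Dl_info info{};
    if (dladdr(ip, &info) == 0 || info.dli_sname == nullptr) {
        return false;
    }
    for (const char* name : entries) {
        if (std::strcmp(info.dli_sname, name) == 0) {
            return true;
        }
    }
    return false;
}

static _Unwind_Reason_Code frame_cb(struct _Unwind_Context* ctx, void*) {
    void* ip = reinterpret_cast<void*>(_Unwind_GetIP(ctx));
    if (in_unwinder(ip)) {
        // The parent frame's registers may be half-restored; stop before
        // _Unwind_Backtrace tries to step into it.
        return _URC_END_OF_STACK;
    }
    // ... record ip into the backtrace buffer ...
    return _URC_NO_REASON;
}

// Usage: _Unwind_Backtrace(frame_cb, &buf);
```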

@xemul (Contributor) commented Apr 25, 2025

For the record: I checked the #2420 "fix" against this PR's reproducer. It works:

Reactor stalled for 3 ms on shard 0, in scheduling group main. Backtrace: 0x1100be 0xd0351 0xd066b 0xd0737 0xd0772 0xd07a2 0x1a04f 0x1b1bf 0x1fad8 0x201b0 0x55034 0x206a8 0x1bf01 0x401a60 0x4017aa 0x403b8f 0x405b39 0x152590
Reactor stalled for 6 ms on shard 0, in scheduling group main. Backtrace: 0x1100be 0xd0351 0xd066b 0xd0737 0xd0772 0xd07a2 0x1a04f 0x1ceff 0x1e9d3 0x20069 0x55034 0x206a8 0x1bf01 0x401a01 0x4017aa 0x403b8f 0x405b39 0x152590
Reactor stalled for 11 ms on shard 0, in scheduling group main. Backtrace: 0x1100be 0xd0351 0xd066b 0xd0737 0xd0772 0xd07a2 0x1a04f 0x1d126 0x1ebc8 0x20069 0x55034 0x206a8 0x1bf01 0x401a60 0x4017aa 0x403b8f 0x405b39 0x152590
Reactor stalled for 20 ms on shard 0, in scheduling group main. Backtrace: 0x1100be 0xd0351 0xd066b 0xd0737 0xd0772 0xd07a2 0x1a04f 0x2317e 0x1e5c9 0x1fa89 0x201b0 0x55034 0x206a8 0x1bf01 0x401a01 0x403b8f 0x405b39 0x152590
Reactor stalled for 37 ms on shard 0, in scheduling group main. Backtrace: 0x1100be 0xd0351 0xd066b 0xd0737 0xd0772 0xd07a2 0x1a04f 0x1e9d8 0x1fa89 0x201b0 0x55034 0x206a8 0x1bf01 0x403c6a 0x405b39 0x152590
Reactor stalled for 1 ms on shard 0, in scheduling group main. Backtrace: 0x1100be 0xd0351 0xd066b 0xd0737 0xd0772 0xd07a2 0x1a04f 0x1f66c 0x1fafe 0x201b0 0x55034 0x206a8 0x1bf01 0x401a40 0x4017aa 0x403b8f 0x405b39 0x152590
Reactor stalled for 4 ms on shard 0, in scheduling group main. Backtrace: ¯\_(ツ)_/¯
Reactor stalled for 9 ms on shard 0, in scheduling group main. Backtrace: 0x1100be 0xd0351 0xd066b 0xd0737 0xd0772 0xd07a2 0x1a04f 0x1d1cf 0x1ebc8 0x1fa89 0x201b0 0x55034 0x206a8 0x1bf01 0x401a80 0x403b8f 0x405b39 0x152590
Reactor stalled for 18 ms on shard 0, in scheduling group main. Backtrace: 0x1100be 0xd0351 0xd066b 0xd0737 0xd0772 0xd07a2 0x1a04f 0x147f 0x1e5fc 0x20069 0x55034 0x206a8 0x1bf01 0x4019a4 0x4017aa 0x403b8f 0x405b39 0x152590

@avikivity, I'm leaning towards this PR rather than the longjmp one from #2420.
We're trying to reduce the number of exceptions flying by, so the chances of catching a stall in an exception are low, and this approach looks safer. WDYT?

@travisdowns (Contributor, Author)

@michoecho wrote:

For posterity, I'll copy-paste my comment about the problem from a different thread a long time ago (it applies only to libstdc++; I didn't read libc++ at all):

Excellent analysis! Is the long-ago issue public? If so, I would like to read it.

Correct me if I'm wrong, but there's a second downside: you can switch out from a seastar::thread while you are in the middle of unwinding (the ability to have async destructors is probably the main reason Scylla sometimes uses seastar::thread instead of coroutines), and then std::uncaught_exceptions() > 0 will remain true for an indefinite time, kept incremented by the switched-out thread.

I hadn't considered this, but based on the way you've described the mechanism, I guess that's right.

This points to a more general problem with std::uncaught_exceptions: it's going to be wrong if we switch out in destructors during unwind, so it's probably not a good idea to use it for anything which requires an accurate answer (I am pretty sure everyone expects/wants it to return the number of exceptions in progress on the current fiber, not on some other fiber/thread that happened to switch out).

That said, I don't think we use any async destructors in the Redpanda codebase (there is almost no seastar::thread use at all, actually).

@michoecho (Contributor) commented May 3, 2025

@michoecho wrote:

For posterity, I'll copy-paste my comment about the problem from a different thread a long time ago (it applies only to libstdc++; I didn't read libc++ at all):

Excellent analysis! Is the long-ago issue public? If so, I would like to read it.

No, unfortunately it is private.

That said, I don't think we use any async destructors in the Redpanda codebase (there is almost no seastar::thread use at all, actually).

Scylla uses seastar::thread quite a lot. Especially in code that was written before C++20 coroutines were a thing.

Anyway, to be clear, I'm not saying that any of that should hold up the PR. Fixing the crash should take priority over everything else.

@michoecho (Contributor)

@avikivity @xemul Ping. If there are no strong arguments against this change, consider merging.

@xemul closed this in 2e5443d on May 26, 2025
xemul pushed a commit to scylladb/scylla-seastar that referenced this pull request Jun 17, 2025
If an exception is in progress when a reactor stall is detected,
omit taking the backtrace, as this can crash as detailed in #2697.

Add a test which reproduces the issue, and which crashes (sometimes)
before this fix and which runs cleanly afterwards.

Fixes #2697.

Closes scylladb/seastar#2714

(cherry picked from commit 2e5443dfaf253428a2c82475a60f7bac2ac831e0)