-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Print incrementally in sigsegv handler #2712
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Print incrementally in sigsegv handler #2712
Conversation
src/core/reactor.cc
Outdated
@@ -4042,18 +4061,20 @@ static void sigsegv_action(siginfo_t *info, ucontext_t* uc) noexcept { | |||
// with addr2line and other tools which expect an | |||
// address without any added offset (from ASLR or | |||
// because it's a relocated shared object). | |||
print_safe("Segmentation fault: resolved ip: "); | |||
print_safe("Segmentation fault: resolved ip: 0x"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make it clear these are hex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, but it doesn't fit the "Print incrementally in sigsegv handler" patch title
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Busted. Would you like me to split it out into a new commit or extend the title/commit description?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separate commit (PR?) would be better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f93e247
to
f367b4d
Compare
memcpy(_buf + _pos, str, len); | ||
_pos += len; | ||
if (_immediate) { | ||
print_safe(str, len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the key change for immediate output in the buffer class: all appends come though here and in immediate mode we just print the buffer immediately, then flush and reserve are just no-ops in immediate node.
func(decorate(ip - 1)); | ||
|
||
if (incremental) { | ||
for (size_t last_frame = 1; last_frame <= max_backtrace; ++last_frame) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice, in the cases we've seen this the crash happens fairly far down the stack, and the non-crashing frames are enough to diagnose or go a long way to understanding the problem (e.g., in the stall detector case the full stall detector backtrace appears).
} | ||
|
||
void append_backtrace_oneline() noexcept { | ||
backtrace([this] (frame f) noexcept { | ||
reserve(3 + sizeof(f.addr) * 2); | ||
append(" 0x"); | ||
append_hex(f.addr); | ||
}); | ||
}, _immediate); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking out loud. This transparently converts "immediate" to "incremental". Oh, well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead, introduce an "immediate" mode to backtrace buffer which prints all appended strings immediately
Hopefully it won't intermix these lines with more logs (from other shards?) thus making the whole thing unreadable. On the other hand, SIGSEGV shouldn't be easy to debug, and having intermixed lines is still better than having nothing at all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking out loud. This transparently converts "immediate" to "incremental". Oh, well
Yeah. I felt that these were the right names for the two concepts, as they are not the same and could in principle be set independently, but here in the only existing usage "immediate" directly implies "incremetal". I could imagine a future where incremental would be a separate option inside the backtrace buffer in case you wanted say "non-immediate, incremental" or whatever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully it won't intermix these lines with more logs (from other shards?) thus making the whole thing unreadable.
Yeah. On the one hand it's possible, on the other hand it will hopefully be rare as we do make all these calls back-to-back with no delay in between so the window to race with concurrent output on another shard is small so I expect tearing to be rare in practice, and in on the third hand:
On the other hand, SIGSEGV shouldn't be easy to debug, and having intermixed lines is still better than having nothing at all
Yup, exactly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, this is one reason to do this only on the segfault path, not on the other paths (abort handler, reactor stall notifier, etc).
If we crash due to segfault, we want to print the backtrace incrementally, frame-by-frame, rather than all at once. This is because the backtrace call may crash at some problematic frame, in which case we get no output at all: these are very hard to diagnose. Instead, introduce an "immediate" mode to backtrace buffer which prints all appended strings immediately, and add an incremental mode to the backtrace function which calls ::backtrace repeatedly, increasing the number of requested frames by 1 each time. If we crash at some frame N we will have printed all frames 1..N-1 already. Before this change the backtrace looks like this using the reproducer in scylladb#2697: ``` Backtrace: ``` After it looks like: ``` Backtrace: /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x36ce6a /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x32fbba /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x342d0b /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x35cf6c /lib/x86_64-linux-gnu/libc.so.6+0x4251f /lib/x86_64-linux-gnu/libgcc_s.so.1+0x17585 /lib/x86_64-linux-gnu/libc.so.6+0x133bb2 /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x36d0d7 /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x32fb84 /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x32f0e7 /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x330281 /lib/x86_64-linux-gnu/libc.so.6+0x4251f /lib/x86_64-linux-gnu/libgcc_s.so.1+0x14789 /lib/x86_64-linux-gnu/libgcc_s.so.1+0x16f54 /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x2b98a8 /lib/x86_64-linux-gnu/libgcc_s.so.1+0x173b8 ``` Fixes scylladb#2710.
f367b4d
to
05d9f80
Compare
If we crash due to segfault, we want to print the backtrace incrementally, frame-by-frame, rather than all at once. This is because the backtrace call may crash at some problematic frame, in which case we get no output at all: these are very hard to diagnose.
Instead, introduce an "immediate" mode to backtrace buffer which prints all appended strings immediately, and add an incremental mode to the backtrace function which calls ::backtrace repeatedly, increasing the number of requested frames by 1 each time. If we crash at some frame N we will have printed all frames 1..N-1 already.
Before this change the backtrace looks like this using the reproducer in #2697:
After it looks like:
Fixes #2710.