Skip to content

Print incrementally in sigsegv handler #2712

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

travisdowns
Copy link
Contributor

If we crash due to segfault, we want to print the backtrace incrementally, frame-by-frame, rather than all at once. This is because the backtrace call may crash at some problematic frame, in which case we get no output at all: these are very hard to diagnose.

Instead, introduce an "immediate" mode to backtrace buffer which prints all appended strings immediately, and add an incremental mode to the backtrace function which calls ::backtrace repeatedly, increasing the number of requested frames by 1 each time. If we crash at some frame N we will have printed all frames 1..N-1 already.

Before this change the backtrace looks like this using the reproducer in #2697:

Backtrace:

After it looks like:

Backtrace:
  /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x36ce6a
  /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x32fbba
  /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x342d0b
  /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x35cf6c
  /lib/x86_64-linux-gnu/libc.so.6+0x4251f
  /lib/x86_64-linux-gnu/libgcc_s.so.1+0x17585
  /lib/x86_64-linux-gnu/libc.so.6+0x133bb2
  /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x36d0d7
  /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x32fb84
  /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x32f0e7
  /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x330281
  /lib/x86_64-linux-gnu/libc.so.6+0x4251f
  /lib/x86_64-linux-gnu/libgcc_s.so.1+0x14789
  /lib/x86_64-linux-gnu/libgcc_s.so.1+0x16f54
  /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x2b98a8
  /lib/x86_64-linux-gnu/libgcc_s.so.1+0x173b8

Fixes #2710.

@@ -4042,18 +4061,20 @@ static void sigsegv_action(siginfo_t *info, ucontext_t* uc) noexcept {
// with addr2line and other tools which expect an
// address without any added offset (from ASLR or
// because it's a relocated shared object).
print_safe("Segmentation fault: resolved ip: ");
print_safe("Segmentation fault: resolved ip: 0x");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make it clear these are hex.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, but it doesn't fit the "Print incrementally in sigsegv handler" patch title

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Busted. Would you like me to split it out into a new commit or extend the title/commit description?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate commit (PR?) would be better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xemul I have split this out into a separate PR as requested: #2727

@travisdowns travisdowns force-pushed the td-2710-segfault-incremental branch 3 times, most recently from f93e247 to f367b4d Compare April 1, 2025 16:40
memcpy(_buf + _pos, str, len);
_pos += len;
if (_immediate) {
print_safe(str, len);
Copy link
Contributor Author

@travisdowns travisdowns Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the key change for immediate output in the buffer class: all appends come though here and in immediate mode we just print the buffer immediately, then flush and reserve are just no-ops in immediate node.

func(decorate(ip - 1));

if (incremental) {
for (size_t last_frame = 1; last_frame <= max_backtrace; ++last_frame) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice, in the cases we've seen this the crash happens fairly far down the stack, and the non-crashing frames are enough to diagnose or go a long way to understanding the problem (e.g., in the stall detector case the full stall detector backtrace appears).

}

void append_backtrace_oneline() noexcept {
backtrace([this] (frame f) noexcept {
reserve(3 + sizeof(f.addr) * 2);
append(" 0x");
append_hex(f.addr);
});
}, _immediate);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking out loud. This transparently converts "immediate" to "incremental". Oh, well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead, introduce an "immediate" mode to backtrace buffer which prints all appended strings immediately

Hopefully it won't intermix these lines with more logs (from other shards?) thus making the whole thing unreadable. On the other hand, SIGSEGV shouldn't be easy to debug, and having intermixed lines is still better than having nothing at all

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking out loud. This transparently converts "immediate" to "incremental". Oh, well

Yeah. I felt that these were the right names for the two concepts, as they are not the same and could in principle be set independently, but here in the only existing usage "immediate" directly implies "incremetal". I could imagine a future where incremental would be a separate option inside the backtrace buffer in case you wanted say "non-immediate, incremental" or whatever.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully it won't intermix these lines with more logs (from other shards?) thus making the whole thing unreadable.

Yeah. On the one hand it's possible, on the other hand it will hopefully be rare as we do make all these calls back-to-back with no delay in between so the window to race with concurrent output on another shard is small so I expect tearing to be rare in practice, and in on the third hand:

On the other hand, SIGSEGV shouldn't be easy to debug, and having intermixed lines is still better than having nothing at all

Yup, exactly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, this is one reason to do this only on the segfault path, not on the other paths (abort handler, reactor stall notifier, etc).

If we crash due to segfault, we want to print the backtrace
incrementally, frame-by-frame, rather than all at once. This is because
the backtrace call may crash at some problematic frame, in which case we
get no output at all: these are very hard to diagnose.

Instead, introduce an "immediate" mode to backtrace buffer which prints
all appended strings immediately, and add an incremental mode to the
backtrace function which calls ::backtrace repeatedly, increasing the
number of requested frames by 1 each time. If we crash at some frame N
we will have printed all frames 1..N-1 already.

Before this change the backtrace looks like this using the reproducer
in scylladb#2697:

```
Backtrace:
```

After it looks like:

```
Backtrace:
  /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x36ce6a
  /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x32fbba
  /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x342d0b
  /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x35cf6c
  /lib/x86_64-linux-gnu/libc.so.6+0x4251f
  /lib/x86_64-linux-gnu/libgcc_s.so.1+0x17585
  /lib/x86_64-linux-gnu/libc.so.6+0x133bb2
  /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x36d0d7
  /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x32fb84
  /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x32f0e7
  /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x330281
  /lib/x86_64-linux-gnu/libc.so.6+0x4251f
  /lib/x86_64-linux-gnu/libgcc_s.so.1+0x14789
  /lib/x86_64-linux-gnu/libgcc_s.so.1+0x16f54
  /home/tdowns/dev/seastar/build-clang-18/dev/libseastar.so+0x2b98a8
  /lib/x86_64-linux-gnu/libgcc_s.so.1+0x173b8
```

Fixes scylladb#2710.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

segault logging could be more robust
2 participants