Skip to content

Print incrementally in sigsegv handler #2712

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 25 additions & 5 deletions include/seastar/util/backtrace.hh
Original file line number Diff line number Diff line change
Expand Up @@ -61,16 +61,36 @@ bool operator==(const frame& a, const frame& b) noexcept;
frame decorate(uintptr_t addr) noexcept;

// Invokes func for each frame passing it as argument.
// incremental=false is the default mode and simply calls ::backtrace once
// and then calls func for each frame. If the ::backtrace call crashes,
// func will not be called.
// incremental=true will call ::backtrace in a loop, increasing the number of
// frames to capture by one each time. This is useful for cases where
// ::backtrace itself may crash, e.g., in a crash handler, in this case the
// func will be called for each frame which didn't crash. See #2710. This
// is much slower, however.
SEASTAR_MODULE_EXPORT
template<typename Func>
void backtrace(Func&& func) noexcept(noexcept(func(frame()))) {
void backtrace(Func&& func, bool incremental = false) noexcept(noexcept(func(frame()))) {
#ifdef HAVE_EXECINFO
constexpr size_t max_backtrace = 100;
void* buffer[max_backtrace];
int n = ::backtrace(buffer, max_backtrace);
for (int i = 0; i < n; ++i) {
auto ip = reinterpret_cast<uintptr_t>(buffer[i]);
func(decorate(ip - 1));

if (incremental) {
for (size_t last_frame = 1; last_frame <= max_backtrace; ++last_frame) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice, in the cases we've seen this the crash happens fairly far down the stack, and the non-crashing frames are enough to diagnose or go a long way to understanding the problem (e.g., in the stall detector case the full stall detector backtrace appears).

int n = ::backtrace(buffer, last_frame);
if (n < static_cast<int>(last_frame)) {
return;
}
auto ip = reinterpret_cast<uintptr_t>(buffer[last_frame-1]);
func(decorate(ip - 1));
}
} else {
int n = ::backtrace(buffer, max_backtrace);
for (int i = 0; i < n; ++i) {
auto ip = reinterpret_cast<uintptr_t>(buffer[i]);
func(decorate(ip - 1));
}
}
#else
// Not implemented yet
Expand Down
41 changes: 31 additions & 10 deletions src/core/reactor.cc
Original file line number Diff line number Diff line change
Expand Up @@ -766,8 +766,11 @@ class backtrace_buffer {
static constexpr unsigned _max_size = 8 << 10;
unsigned _pos = 0;
char _buf[_max_size];
bool _immediate;
public:
backtrace_buffer() = default;
backtrace_buffer(bool immediate = false)
: _immediate{immediate} {}

~backtrace_buffer() {
flush();
}
Expand All @@ -779,23 +782,30 @@ class backtrace_buffer {
backtrace_buffer &operator = (backtrace_buffer &&) = delete;

void flush() noexcept {
if (_pos > 0) {
if (!_immediate && _pos > 0) {
print_safe(_buf, _pos);
_pos = 0;
}
}

void reserve(size_t len) noexcept {
if (_immediate) {
return;
}
SEASTAR_ASSERT(len < _max_size);
if (_pos + len >= _max_size) {
flush();
}
}

void append(const char* str, size_t len) noexcept {
reserve(len);
memcpy(_buf + _pos, str, len);
_pos += len;
if (_immediate) {
print_safe(str, len);
Copy link
Contributor Author

@travisdowns travisdowns Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the key change for immediate output in the buffer class: all appends come though here and in immediate mode we just print the buffer immediately, then flush and reserve are just no-ops in immediate node.

} else {
reserve(len);
memcpy(_buf + _pos, str, len);
_pos += len;
}
}

void append(const char* str) noexcept { append(str, strlen(str)); }
Expand Down Expand Up @@ -825,15 +835,15 @@ class backtrace_buffer {
append("0x");
append_hex(f.addr);
append("\n");
});
}, _immediate);
}

void append_backtrace_oneline() noexcept {
backtrace([this] (frame f) noexcept {
reserve(3 + sizeof(f.addr) * 2);
append(" 0x");
append_hex(f.addr);
});
}, _immediate);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking out loud. This transparently converts "immediate" to "incremental". Oh, well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead, introduce an "immediate" mode to backtrace buffer which prints all appended strings immediately

Hopefully it won't intermix these lines with more logs (from other shards?) thus making the whole thing unreadable. On the other hand, SIGSEGV shouldn't be easy to debug, and having intermixed lines is still better than having nothing at all

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking out loud. This transparently converts "immediate" to "incremental". Oh, well

Yeah. I felt that these were the right names for the two concepts, as they are not the same and could in principle be set independently, but here in the only existing usage "immediate" directly implies "incremetal". I could imagine a future where incremental would be a separate option inside the backtrace buffer in case you wanted say "non-immediate, incremental" or whatever.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully it won't intermix these lines with more logs (from other shards?) thus making the whole thing unreadable.

Yeah. On the one hand it's possible, on the other hand it will hopefully be rare as we do make all these calls back-to-back with no delay in between so the window to race with concurrent output on another shard is small so I expect tearing to be rare in practice, and in on the third hand:

On the other hand, SIGSEGV shouldn't be easy to debug, and having intermixed lines is still better than having nothing at all

Yup, exactly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, this is one reason to do this only on the segfault path, not on the other paths (abort handler, reactor stall notifier, etc).

}
};

Expand All @@ -856,8 +866,17 @@ static void print_with_backtrace(backtrace_buffer& buf, bool oneline) noexcept {
}
}

static void print_with_backtrace(const char* cause, bool oneline = false) noexcept {
backtrace_buffer buf;
// Print the current backtrace to stdout with the given cause.
// If oneline is true, backtrace is printed entirely on one line,
// otherwise it is printed with 1 line per frame.
// If immediate is true, the backtrace is printed frame by frame
// with a call to write(2), otherwise it is printed in a single
// call to write(2). The former strategy is more robust in cases
// where the backtrace itself may crash at some point down the stack
// while the latter is more efficient and avoids splitting output
// in the face of concurrent logging by other shards.
static void print_with_backtrace(const char* cause, bool oneline = false, bool immediate = false) noexcept {
backtrace_buffer buf(immediate);
buf.append(cause);
print_with_backtrace(buf, oneline);
}
Expand Down Expand Up @@ -4053,7 +4072,9 @@ static void sigsegv_action(siginfo_t *info, ucontext_t* uc) noexcept {
print_zero_padded_hex_safe(f.so->end - f.so->begin);
print_safe("]\n");

print_with_backtrace("Segmentation fault");
// print the backtrace in immediate mode, so if we crash
// during the backtrace we get as much output as possible
print_with_backtrace("Segmentation fault", false, true);
reraise_signal(SIGSEGV);
}

Expand Down