-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Print incrementally in sigsegv handler #2712
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -766,8 +766,11 @@ class backtrace_buffer { | |
static constexpr unsigned _max_size = 8 << 10; | ||
unsigned _pos = 0; | ||
char _buf[_max_size]; | ||
bool _immediate; | ||
public: | ||
backtrace_buffer() = default; | ||
backtrace_buffer(bool immediate = false) | ||
: _immediate{immediate} {} | ||
|
||
~backtrace_buffer() { | ||
flush(); | ||
} | ||
|
@@ -779,23 +782,30 @@ class backtrace_buffer { | |
backtrace_buffer &operator = (backtrace_buffer &&) = delete; | ||
|
||
void flush() noexcept { | ||
if (_pos > 0) { | ||
if (!_immediate && _pos > 0) { | ||
print_safe(_buf, _pos); | ||
_pos = 0; | ||
} | ||
} | ||
|
||
void reserve(size_t len) noexcept { | ||
if (_immediate) { | ||
return; | ||
} | ||
SEASTAR_ASSERT(len < _max_size); | ||
if (_pos + len >= _max_size) { | ||
flush(); | ||
} | ||
} | ||
|
||
void append(const char* str, size_t len) noexcept { | ||
reserve(len); | ||
memcpy(_buf + _pos, str, len); | ||
_pos += len; | ||
if (_immediate) { | ||
print_safe(str, len); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is the key change for immediate output in the buffer class: all appends come though here and in immediate mode we just print the buffer immediately, then flush and reserve are just no-ops in immediate node. |
||
} else { | ||
reserve(len); | ||
memcpy(_buf + _pos, str, len); | ||
_pos += len; | ||
} | ||
} | ||
|
||
void append(const char* str) noexcept { append(str, strlen(str)); } | ||
|
@@ -825,15 +835,15 @@ class backtrace_buffer { | |
append("0x"); | ||
append_hex(f.addr); | ||
append("\n"); | ||
}); | ||
}, _immediate); | ||
} | ||
|
||
void append_backtrace_oneline() noexcept { | ||
backtrace([this] (frame f) noexcept { | ||
reserve(3 + sizeof(f.addr) * 2); | ||
append(" 0x"); | ||
append_hex(f.addr); | ||
}); | ||
}, _immediate); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thinking out loud. This transparently converts "immediate" to "incremental". Oh, well There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Hopefully it won't intermix these lines with more logs (from other shards?) thus making the whole thing unreadable. On the other hand, SIGSEGV shouldn't be easy to debug, and having intermixed lines is still better than having nothing at all There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yeah. I felt that these were the right names for the two concepts, as they are not the same and could in principle be set independently, but here in the only existing usage "immediate" directly implies "incremetal". I could imagine a future where incremental would be a separate option inside the backtrace buffer in case you wanted say "non-immediate, incremental" or whatever. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yeah. On the one hand it's possible, on the other hand it will hopefully be rare as we do make all these calls back-to-back with no delay in between so the window to race with concurrent output on another shard is small so I expect tearing to be rare in practice, and in on the third hand:
Yup, exactly. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. BTW, this is one reason to do this only on the segfault path, not on the other paths (abort handler, reactor stall notifier, etc). |
||
} | ||
}; | ||
|
||
|
@@ -856,8 +866,17 @@ static void print_with_backtrace(backtrace_buffer& buf, bool oneline) noexcept { | |
} | ||
} | ||
|
||
static void print_with_backtrace(const char* cause, bool oneline = false) noexcept { | ||
backtrace_buffer buf; | ||
// Print the current backtrace to stdout with the given cause. | ||
// If oneline is true, backtrace is printed entirely on one line, | ||
// otherwise it is printed with 1 line per frame. | ||
// If immediate is true, the backtrace is printed frame by frame | ||
// with a call to write(2), otherwise it is printed in a single | ||
// call to write(2). The former strategy is more robust in cases | ||
// where the backtrace itself may crash at some point down the stack | ||
// while the latter is more efficient and avoids splitting output | ||
// in the face of concurrent logging by other shards. | ||
static void print_with_backtrace(const char* cause, bool oneline = false, bool immediate = false) noexcept { | ||
backtrace_buffer buf(immediate); | ||
buf.append(cause); | ||
print_with_backtrace(buf, oneline); | ||
} | ||
|
@@ -4053,7 +4072,9 @@ static void sigsegv_action(siginfo_t *info, ucontext_t* uc) noexcept { | |
print_zero_padded_hex_safe(f.so->end - f.so->begin); | ||
print_safe("]\n"); | ||
|
||
print_with_backtrace("Segmentation fault"); | ||
// print the backtrace in immediate mode, so if we crash | ||
// during the backtrace we get as much output as possible | ||
print_with_backtrace("Segmentation fault", false, true); | ||
reraise_signal(SIGSEGV); | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice, in the cases we've seen this the crash happens fairly far down the stack, and the non-crashing frames are enough to diagnose or go a long way to understanding the problem (e.g., in the stall detector case the full stall detector backtrace appears).