feat: Enable hang reporting via Crashpad for 3P#10770
Conversation
This PR is meant as a proof of concept but is really not too far from production ready code, so we may consider trying to land it. There are a couple key differences vs. the approach used by 1P ATV: * The CrashHandler Starboard extension is used as a bridge to enable the hermetically built Cobalt layer to request that Crashpad generate a dump without crashing. This is required because the Crashpad client library is not linked into libcobalt. The Starboard extension is extended to provide this additional functionality. * Because Evergreen builds do not support the crash key system from Chromium, the CrashHandler Starboard extension is used to pass the critical list-of-hung-threads annotation from the Cobalt layer to the Crashpad handler. Issue: 513671861
🤖 Gemini Suggested Commit Message💡 Pro Tips for a Better Commit Message:
|
There was a problem hiding this comment.
Code Review
This pull request adds support for triggering non-crashing dump reports (DumpWithoutCrashing) on Starboard/Evergreen builds by extending the CobaltExtensionCrashHandlerApi and integrating it with the HangWatcher and Crashpad wrapper. Feedback on the changes includes: updating the DumpWithoutCrashing extension API signature to return bool to prevent compilation errors with Chromium's SetDumpWithoutCrashingFunction; adding a null check for SetString in hang_watcher.cc to prevent potential crashes; and invoking the g_dump_without_crashing_callback outside of the mutex lock in crash_handler.cc to avoid potential deadlocks during dump generation.
| // Triggers a non-crashing dump/hang report. | ||
| void (*DumpWithoutCrashing)(); |
There was a problem hiding this comment.
The DumpWithoutCrashing function pointer is currently defined to return void. However, base::debug::SetDumpWithoutCrashingFunction in Chromium's base/debug/dump_without_crashing.h expects a function pointer with the signature bool (*)(). Passing a void (*)() function pointer directly will result in a compilation error. Updating the extension API signature to return bool (indicating whether the dump was successfully triggered) resolves this type mismatch cleanly.
| // Triggers a non-crashing dump/hang report. | |
| void (*DumpWithoutCrashing)(); | |
| // Triggers a non-crashing dump/hang report. | |
| // Returns true if the dump request was successfully triggered. | |
| bool (*DumpWithoutCrashing)(); |
References
- Adhere to the established upstream Chromium style guides for C++ as specified in the repository style guide. (link)
There was a problem hiding this comment.
This comment seems incorrect.
From https://github.com/youtube/cobalt/blob/main/base/debug/dump_without_crashing.h#L54:
BASE_EXPORT void SetDumpWithoutCrashingFunction(void (*function)());
| auto* crash_handler_extension = | ||
| static_cast<const CobaltExtensionCrashHandlerApi*>( | ||
| SbSystemGetExtension(kCobaltExtensionCrashHandlerName)); | ||
| if (crash_handler_extension && crash_handler_extension->version >= 2) { | ||
| crash_handler_extension->SetString("list-of-hung-threads", | ||
| list_of_hung_thread_ids.c_str()); | ||
| // TODO: b/513671861 - Cobalt: consider adding support for other crash keys. | ||
| } |
There was a problem hiding this comment.
The SetString function pointer is called without checking if it is null. To prevent potential crashes if an implementation of the extension has a null SetString pointer, please add a null check for crash_handler_extension->SetString before calling it.
auto* crash_handler_extension =
static_cast<const CobaltExtensionCrashHandlerApi*>(
SbSystemGetExtension(kCobaltExtensionCrashHandlerName));
if (crash_handler_extension && crash_handler_extension->version >= 2 &&
crash_handler_extension->SetString) {
crash_handler_extension->SetString("list-of-hung-threads",
list_of_hung_thread_ids.c_str());
// TODO: b/513671861 - Cobalt: consider adding support for other crash keys.
}References
- Adhere to the established upstream Chromium style guides for C++ as specified in the repository style guide. (link)
| void DumpWithoutCrashing() { | ||
| SB_CHECK_EQ(pthread_mutex_lock(&g_dump_without_crashing_callback_mutex), 0); | ||
| if (g_dump_without_crashing_callback) { | ||
| g_dump_without_crashing_callback(); | ||
| } | ||
| SB_CHECK_EQ(pthread_mutex_unlock(&g_dump_without_crashing_callback_mutex), 0); | ||
| } |
There was a problem hiding this comment.
Invoking the callback g_dump_without_crashing_callback while holding g_dump_without_crashing_callback_mutex can lead to deadlocks or block other threads unnecessarily, especially since generating a minidump can be a slow operation. It is safer to copy the callback pointer under the lock, release the lock, and then invoke the callback. Additionally, updating this function to return bool aligns with the updated signature of DumpWithoutCrashing in the extension API.
bool DumpWithoutCrashing() {
void (*callback)() = nullptr;
SB_CHECK_EQ(pthread_mutex_lock(&g_dump_without_crashing_callback_mutex), 0);
callback = g_dump_without_crashing_callback;
SB_CHECK_EQ(pthread_mutex_unlock(&g_dump_without_crashing_callback_mutex), 0);
if (callback) {
callback();
return true;
}
return false;
}References
- Adhere to the established upstream Chromium style guides for C++ as specified in the repository style guide. (link)
This PR is meant as a proof of concept but is really not too far from production ready code, so we may consider trying to land it.
There are a couple key differences vs. the approach used by 1P ATV:
#vibe-coded
Issue: 513671861