Commit abe7838

[crashtracker/profiler] Notify profiler the application is crashing (#7657)
## Summary of changes

Nth attempt to prevent the profiler from collecting callstacks while the application is crashing.

## Reason for change

We have had this issue for a while, and it has appeared in some crash reports lately. The profiler tries to collect callstacks while the app is crashing. This can lead to a crash within a crash and to zombie processes.

```
syscall (sysdeps/unix/sysv/linux/x86_64/syscall.S:38)
_write_validate (/project/obj/libunwind-prefix/src/libunwind/src/mi/Gaddress_validator.c:140)
access_mem (/project/obj/libunwind-prefix/src/libunwind/src/x86_64/Ginit.c:89)
dwarf_get (/project/obj/libunwind-prefix/src/libunwind/include/tdep-x86_64/libunwind_i.h:205)
unw_backtrace2 (/project/obj/libunwind-prefix/src/libunwind/src/mi/backtrace.c:111)
LinuxStackFramesCollector::CollectStackWithBacktrace2(void*) (/project/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/LinuxStackFramesCollector.cpp:285)
LinuxStackFramesCollector::CollectStackSampleSignalHandler(int, siginfo_t*, void*) (/project/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/LinuxStackFramesCollector.cpp)
ProfilerSignalManager::SignalHandler(int, siginfo_t*, void*) (/project/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/ProfilerSignalManager.cpp:183)
__restore_rt
__GI___wait4 (sysdeps/unix/sysv/linux/wait4.c:30)
PROCCreateCrashDump(std::vector<char const*, std::allocator<char const*> >&, char*, int, bool) (/__w/1/s/src/coreclr/pal/src/thread/process.cpp:2545)
PROCCreateCrashDumpIfEnabled (/__w/1/s/src/coreclr/pal/src/thread/process.cpp)
PROCAbort (/__w/1/s/src/coreclr/pal/src/thread/process.cpp:2793)
PROCEndProcess(void*, unsigned int, int) (/__w/1/s/src/coreclr/pal/src/thread/process.cpp:1355)
UnwindManagedExceptionPass1(PAL_SEHException&, _CONTEXT*) (/__w/1/s/src/coreclr/vm/exceptionhandling.cpp)
DispatchManagedException(PAL_SEHException&, bool) (/__w/1/s/src/coreclr/vm/exceptionhandling.cpp)
IL_Throw(Object*) (/__w/1/s/src/coreclr/vm/jithelpers.cpp)
```

When the application is crashing, the .NET runtime calls `fork()`. This creates a new process (the child process) with a copy of the parent's address space. The .NET runtime then calls `execve()` in the child process to run `createdump`. We wrapped `execve()` so that it calls our own tool, which creates a crash report and sends it to our backend. In our implementation of `execve()`, we set a flag to tell the profiler not to collect callstacks because the application is crashing.

The current implementation does not work because the flag is an `atomic_int` in ordinary process memory: `fork()` copies it, so when the child process sets the flag, it only changes its own copy. The parent process never sees the change. We need shared memory so that the parent and child processes can communicate.

## Implementation details

- Create a shared memory region (using `mmap`)

## Test coverage

We should no longer see issues caused by the profiler in our CI when crashes happen.
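The reason a plain variable cannot carry the flag across `fork()` can be illustrated with a minimal sketch. The helper below is hypothetical (the name `observe_flags_after_fork` and both flag variables are not part of the commit). It assumes Linux, where memory mapped with `MAP_SHARED | MAP_ANONYMOUS` stays shared between parent and child, and it reports what the parent sees after the child writes to both an ordinary global and the shared region.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ordinary global: each process gets its own copy after fork(). */
static int private_flag = 0;

/* Hypothetical helper, not the commit's code.
 * Forks a child that sets both flags to 1, waits for it, then stores what
 * the parent observes. Returns 0 on success, -1 on failure. */
int observe_flags_after_fork(int* private_seen, int* shared_seen)
{
    assert(private_seen != NULL && shared_seen != NULL);

    int* shared_flag = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared_flag == MAP_FAILED) {
        return -1;
    }
    *shared_flag = 0;
    private_flag = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared_flag, sizeof(int));
        return -1;
    }
    if (pid == 0) {
        private_flag = 1;  /* changes the child's copy only */
        *shared_flag = 1;  /* written to memory the parent also maps */
        _exit(0);
    }

    int status = 0;
    int ok = waitpid(pid, &status, 0) == pid
             && WIFEXITED(status) && WEXITSTATUS(status) == 0;
    if (ok) {
        *private_seen = private_flag;
        *shared_seen = *shared_flag;
    }
    munmap(shared_flag, sizeof(int));
    return ok ? 0 : -1;
}
```

After a successful call, the parent should still read 0 from the ordinary global and 1 from the shared region, which is the behaviour this change relies on.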
1 parent ad0a291 commit abe7838

1 file changed: +30 -5 lines changed

profiler/src/ProfilerEngine/Datadog.Linux.ApiWrapper/functions_to_wrap.c

Lines changed: 30 additions & 5 deletions
@@ -1,14 +1,15 @@
 #define _GNU_SOURCE
 #include <dlfcn.h>
 #include <link.h>
+#include <pthread.h>
 #include <signal.h>
 #include <stddef.h>
 #include <stdio.h>
 #include <string.h>
+#include <stdatomic.h>
 #include <stdlib.h>
+#include <sys/mman.h>
 #include <unistd.h>
-#include <stdatomic.h>
-#include <pthread.h>

 #include "common.h"

@@ -58,13 +59,28 @@ enum FUNCTION_ID
 // counters: one byte per function
 __thread unsigned long long functions_entered_counter = 0;

+// This variable is used to indicate to the profiler that the application is crashing
+// and it should not collect samples while the app is crashing.
+// At crash time, the .NET runtime calls fork() to create a child process which will
+// be in charge of collecting a crash dump (by calling execve()) while the parent is waiting
+// for the child to finish.
+// By calling fork(), the child process and the parent process will have their own address space,
+// which means that the child process won't be able to modify the parent process's variables.
+// We need a way to enable communication between the child and parent processes.
+// This is done by creating a shared memory region and use it as a flag to indicate that
+// the application is crashing.
+// This variable will be a pointer to that shared memory region.
 __attribute__((visibility("hidden")))
-atomic_int is_app_crashing = 0;
+int* is_app_crashing = NULL;

 // this function is called by the profiler
 unsigned long long dd_inside_wrapped_functions()
 {
-    return functions_entered_counter + is_app_crashing;
+    int app_is_crashing = 0;
+    if (is_app_crashing != NULL) {
+        app_is_crashing = *is_app_crashing;
+    }
+    return functions_entered_counter + app_is_crashing;
 }

 #if defined(__aarch64__)
@@ -467,7 +483,9 @@ int execve(const char* pathname, char* const argv[], char* const envp[])
         return __real_execve(pathname, argv, envp);
     }

-    is_app_crashing = 1;
+    if (is_app_crashing != NULL) {
+        *is_app_crashing = 1;
+    }
     // Execute the alternative crash handler, and prepend "createdump" to the arguments

     // Count the number of arguments (the list ends with a null pointer)
@@ -671,6 +689,13 @@ static void init()
     __real_pthread_setattr_default_np = __dd_dlsym(RTLD_NEXT, "pthread_setattr_default_np");
     //__real_fork = __dd_dlsym(RTLD_NEXT, "fork");
 #endif
+    // if we failed at allocating memory for the shared variable
+    // the parent process won't be notified that the app is crashing.
+    is_app_crashing = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
+                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+    if (is_app_crashing != MAP_FAILED) {
+        *is_app_crashing = 0; // Initialize flag
+    }
 }

 static void check_init()
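
One detail worth checking in the `init()` hunk: `mmap` reports failure with `MAP_FAILED`, not `NULL`, while the readers of `is_app_crashing` test the pointer against `NULL`. As far as this diff shows, a failed mapping would leave the pointer at `MAP_FAILED`. One possible way to keep the `NULL` checks valid is sketched below. It is an illustration rather than the commit's code: all names (`crash_flag`, `crash_flag_init`, `crash_flag_set`, `crash_flag_get`) are hypothetical, and storing an `atomic_int` in the shared region is an assumption that only holds where `int` atomics are lock-free.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumption: atomics shared between processes must be lock-free,
 * otherwise they may rely on process-local locks. */
static_assert(ATOMIC_INT_LOCK_FREE == 2, "int atomics must be lock-free");

/* Hypothetical flag pointer; stays NULL until the mapping succeeds. */
static atomic_int* crash_flag = NULL;

/* Maps the shared flag. On failure the pointer is reset to NULL so that
 * readers can keep using a plain NULL check.
 * Returns 1 on success, 0 on failure. */
int crash_flag_init(void)
{
    void* region = mmap(NULL, sizeof(atomic_int), PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) {
        crash_flag = NULL;
        return 0;
    }
    crash_flag = region;
    atomic_init(crash_flag, 0);
    return 1;
}

/* Would be called from the child process, e.g. in a wrapped execve(). */
void crash_flag_set(void)
{
    if (crash_flag != NULL) {
        atomic_store(crash_flag, 1);
    }
}

/* Would be called from the parent; reports 0 when no mapping exists. */
int crash_flag_get(void)
{
    return crash_flag != NULL ? atomic_load(crash_flag) : 0;
}
```

With this shape, a failed mapping degrades to the behaviour described in the commit's comment (the parent is simply not notified) instead of leaving a non-NULL pointer that cannot be dereferenced.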
