Handle processes whose main thread has exited #376

Open
wants to merge 11 commits into
base: main

christos68k
Member

@christos68k christos68k commented Feb 27, 2025

Summary

This PR implements both steps described in #365 (comment).

Thanks to @korniltsev for suggesting disassociate_ctty. I ended up using the sched_process_free tracepoint instead, as it makes fewer assumptions and is more stable (see this comment for more context). It also lets us simplify the cleanup logic (no need for the extra periodic cleanups I had in the first prototype solution), because userspace gets a final PID notification when the kernel frees the process.

Essentially, when the main thread exits, we no longer unload the process information, so profiling of the remaining threads can continue. Processmanager can also track mapping changes triggered by any of the remaining threads.

I added some debug warning statements to ease review; I will remove the commit that introduced them before merging. I also added a C program that you can compile and run as a test workload while the profiling agent is running; it should exercise all the corner cases that this PR addresses. Looking at the warning logs I added and the generated flamegraph in devfiler should make the timeline of processmanager operations very clear.

It's probably easier to review this commit-by-commit.

TODO:

  • DONE Add test program
  • More testing

@christos68k christos68k marked this pull request as draft February 27, 2025 22:07
@korniltsev
Contributor

Thanks for looking into this.

This looks OK overall and should solve the issue from the user perspective.

One of the downsides I see is that while we do not unload the old mappings, we're also not loading new mappings, which may degrade profiling of such processes (I am still not sure whether there are legitimate applications with a dead main thread, or whether this is a highly infrequent corner case).

I personally would prefer if the processmanager "re-elected" a main thread by looking into the process threads, although I realize it may require more work and we may do this later.
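As a rough illustration of the "re-election" idea, the sketch below lists the threads still present under /proc/<pid>/task and picks one that is not the exited main thread. The helper names (liveThreads, electThread) and the lowest-TID policy are hypothetical and are not taken from the agent's code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// liveThreads lists the numeric entries under <procRoot>/<pid>/task.
// procRoot is normally "/proc"; it is a parameter only for illustration.
func liveThreads(procRoot string, pid int) ([]int, error) {
	entries, err := os.ReadDir(filepath.Join(procRoot, strconv.Itoa(pid), "task"))
	if err != nil {
		return nil, err
	}
	tids := make([]int, 0, len(entries))
	for _, e := range entries {
		tid, err := strconv.Atoi(e.Name())
		if err != nil {
			continue // skip anything that is not a numeric TID
		}
		tids = append(tids, tid)
	}
	return tids, nil
}

// electThread picks the lowest TID that differs from pid (the main thread).
// The policy is an assumption made for this sketch; any surviving thread
// would do. It reports false when no other thread is available.
func electThread(pid int, tids []int) (int, bool) {
	best, found := 0, false
	for _, tid := range tids {
		if tid == pid {
			continue
		}
		if !found || tid < best {
			best, found = tid, true
		}
	}
	return best, found
}

func main() {
	pid := os.Getpid()
	tids, err := liveThreads("/proc", pid)
	if err != nil {
		fmt.Println("could not list threads:", err)
		return
	}
	tid, ok := electThread(pid, tids)
	fmt.Println("threads:", tids, "elected:", tid, "ok:", ok)
}
```

An elected TID could then be used to read per-thread state such as /proc/<pid>/task/<tid>/maps; whether the agent does exactly this is not stated in this thread.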

Another thing to consider is hooking a kprobe on disassociate_ctty, which is called when the process group is dead:
https://github.com/torvalds/linux/blame/master/kernel/exit.c#L935-L936 This may help avoid a separate timer for this case.

It would be nice to have a unit test for this case regardless of the solution we choose.

@christos68k
Member Author

christos68k commented Mar 4, 2025

One of the downsides I see is that while we do not unload the old mappings, we're also not loading new mappings, which may degrade profiling of such processes (I am still not sure whether there are legitimate applications with a dead main thread, or whether this is a highly infrequent corner case).
I personally would prefer if the processmanager "re-elected" a main thread by looking into the process threads, although I realize it may require more work and we may do this later.

I'm currently working on this, will push new commits (implementing part 2 of the proposed solution in #365) today.

Another thing to consider is hooking a kprobe on disassociate_ctty, which is called when the process group is dead: https://github.com/torvalds/linux/blame/master/kernel/exit.c#L935-L936 This may help avoid a separate timer for this case.

I think we can switch to the sched_process_free tracepoint (instead of sched_process_exit), which should be more performant than a kprobe. I'll verify.

EDIT: sched_process_free fires for every kernel task, so it's not suitable if we want to avoid notifying userspace of every thread exit. On the other hand, disassociate_ctty seemingly (I have not investigated in depth) does what we want and also seemingly executes after the task has been removed from /proc, which eliminates a possible race in userspace that would otherwise be a (probably unlikely) concern.

EDIT2: Went back to sched_process_free, which we can make work by checking whether the PID is one we track.
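The filtering idea from EDIT2 can be sketched as follows. This is a minimal model in Go, assuming that a freed task whose ID matches a tracked process ID means the whole process is gone, while all other task-free events belong to ordinary threads and can be ignored. The tracker type and its methods are hypothetical, and the PR may perform this check elsewhere (for example on the eBPF side).

```go
package main

import (
	"fmt"
	"sync"
)

// tracker is a hypothetical stand-in for the set of processes being profiled.
type tracker struct {
	mu      sync.Mutex
	tracked map[uint32]struct{}
}

func newTracker() *tracker {
	return &tracker{tracked: make(map[uint32]struct{})}
}

// track registers a process ID as being profiled.
func (t *tracker) track(pid uint32) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.tracked[pid] = struct{}{}
}

// onTaskFree models a "task freed" notification. It releases state and
// returns true only when id is a tracked process ID; events for other
// tasks (plain threads, untracked processes) are dropped.
func (t *tracker) onTaskFree(id uint32) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, ok := t.tracked[id]; !ok {
		return false
	}
	delete(t.tracked, id)
	return true
}

func main() {
	tr := newTracker()
	tr.track(100)
	fmt.Println(tr.onTaskFree(101)) // thread of some process: ignored
	fmt.Println(tr.onTaskFree(100)) // tracked process freed: cleaned up
}
```

With this kind of filter, cleanup happens once per tracked process, without a periodic timer in userspace.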

sched_process_free is called when the task is freed by the kernel,
which allows for simpler cleanup of processes whose main thread
has exited.

Making TID available to processmanager allows the agent to keep
profiling a process whose main thread calls pthread_exit while
other threads continue to run.

This allows the agent to continue profiling a process whose main
thread has exited, but other threads continue to run. Mapping changes
triggered by one of the remaining threads are also tracked.
@christos68k christos68k marked this pull request as ready for review March 20, 2025 18:53
} else if path != "" {
// Ignore [vsyscall] and similar executable kernel
// pages we don't care about
} else {
Member Author

No semantic change; I just inlined the logic from GetMappings here, as this is the more appropriate place.

@@ -538,7 +538,7 @@ func (pm *ProcessManager) synchronizeMappings(pr process.Process,
// fast enough and this particular pid is reused again by the system.
func (pm *ProcessManager) processPIDExit(pid libpf.PID) {
exitKTime := times.GetKTime()
- log.Debugf("- PID: %v", pid)
+ log.Warnf("- PID: %v", pid)
Member Author

I'll remove these newly added warnings before merging. They should help with reviewing the PR, as you don't need to run the agent with debug logs enabled and sort through a lot of irrelevant noise.

@@ -626,22 +633,7 @@ func (pm *ProcessManager) SynchronizeProcess(pr process.Process) {
// return ESRCH. Handle it as if the process did not exist.
pm.mappingStats.errProcESRCH.Add(1)
}
return
}
if len(mappings) == 0 {
Member Author

These comments are no longer relevant.

@christos68k
Member Author

christos68k commented Mar 20, 2025

I added some more information and notes on how to review/test to the description.

@korniltsev please take another look and review/test.

@korniltsev
Contributor

Great job. Thank you for looking into this.
I like the trick with sched_process_free: we have no extra timers in userspace, and the PM logic did not get more complicated.
I've run both my repro and your repro with libcrypto, and the profiler works as expected. It keeps profiling the remaining threads, including newly loaded libraries (libcrypto).
I wish we could somehow turn the repro you've added into an integration test so that it runs with every test run, instead of hoping I don't forget to run it. But I understand that writing such a test may be hard or time-consuming, so we may do this later.
LGTM

tracer/tracer.go Outdated
// It needs to be buffered to avoid locking the writers and stacking up resources when we
// read new PIDs at startup or notified via eBPF.
- pidEvents chan libpf.PID
+ pidEvents chan uint64
Contributor

@florianl florianl Mar 21, 2025

Should a new type, maybe libpf.PidTidg, be used here to make clear that these are not ordinary uint64 numbers?

Member Author

Added libpf.PIDTID
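For readers unfamiliar with the idea, a dedicated type for a packed PID/TID value might look like the sketch below. This is not the definition that was added to libpf; the layout (process ID in the upper 32 bits, thread ID in the lower 32 bits, mirroring what the bpf_get_current_pid_tgid helper returns) and all method names are assumptions made for illustration.

```go
package main

import "fmt"

// PIDTID is a hypothetical packed identifier: process ID in the upper
// 32 bits, thread ID in the lower 32 bits. The real libpf type may differ.
type PIDTID uint64

// NewPIDTID packs a process ID and a thread ID into one value.
func NewPIDTID(pid, tid uint32) PIDTID {
	return PIDTID(uint64(pid)<<32 | uint64(tid))
}

// PID returns the process ID stored in the upper half.
func (v PIDTID) PID() uint32 { return uint32(v >> 32) }

// TID returns the thread ID stored in the lower half.
func (v PIDTID) TID() uint32 { return uint32(v) }

// IsMainThread reports whether the event came from the thread group leader.
func (v PIDTID) IsMainThread() bool { return v.PID() == v.TID() }

func main() {
	v := NewPIDTID(1234, 1240)
	fmt.Println(v.PID(), v.TID(), v.IsMainThread())
}
```

A named type like this lets the compiler reject accidental mixing of packed values with plain uint64 counters, which is the concern raised in the review comment.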

@@ -53,9 +58,10 @@ func init() {
}

// New returns an object with Process interface accessing it
- func New(pid libpf.PID) Process {
+ func New(pid, tid libpf.PID) Process {
Member Author

I didn't switch Process to accept libpf.PIDTID as the latter is only used with PID events, and I'd rather not couple it here too.

Successfully merging this pull request may close these issues.

Profiler incorrectly handles process exit when non-main threads are still running
4 participants