Skip to content

Handle processes whose main thread has exited #376

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Apr 25, 2025
Merged

Conversation

christos68k
Copy link
Member

@christos68k christos68k commented Feb 27, 2025

Summary

This PR implements both steps described in #365 (comment).

Thanks to @korniltsev for suggesting disassociate_ctty, I ended up using another tracepoint sched_process_free instead as it makes fewer assumptions and is more stable (see this comment for more context). It also allows us to simplify cleanup logic (no need for the extra periodic cleanups I had in the first prototype solution), as userspace will get a final PID notification when the process gets freed by the kernel.

Essentially, whenever the main thread exits, we do not unload process information thus allowing profiling the remaining threads to continue. Processmanager can also track mapping changes triggered by one of the remaining threads.

I added some debug warning statements to ease review, I will remove the commit that introduced them before merging. I also added a C program that you can compile and run as a testing workload with the profiling agent also running, that should exercise all the corner cases that this PR addresses. Looking at the warning logs I added and the generated flamegraph in devfiler should make the timeline of processmanager operations very clear.

It's probably easier to review this commit-by-commit.

TODO:

  • DONE Add test program
  • More testing

@christos68k christos68k marked this pull request as draft February 27, 2025 22:07
@korniltsev
Copy link
Contributor

Thanks for looking into this.

This looks OK overall and should solve the issue from the user perspective.

One of the downsides I see is that while we do not unload the old mappings, we re also not loading new mappings, which may degrade profiling of such processes ( I am still not sure if there are legit applications with dead main thread, or is it a highly infrequent corner case)

I personally would prefer if the processmanager "re-elected" a main thread by looking into the process threads, although I realize it may require more work and we may do this later.

Another thing to consider is to hook a kprobe on disassociate_ctty which is called when the process group is dead
https://github.com/torvalds/linux/blame/master/kernel/exit.c#L935-L936 this may help avoiding a separate timer for this case.

It would be nice to have a unit test for this case regardless of the solution we chose.

@christos68k
Copy link
Member Author

christos68k commented Mar 4, 2025

One of the downsides I see is that while we do not unload the old mappings, we re also not loading new mappings, which may degrade profiling of such processes ( I am still not sure if there are legit applications with dead main thread, or is it a highly infrequent corner case)
I personally would prefer if the processmanager "re-elected" a main thread by looking into the process threads, although I realize it may require more work and we may do this later.

I'm currently working on this, will push new commits (implementing part 2 of the proposed solution in #365) today.

Another thing to consider is to hook a kprobe on disassociate_ctty which is called when the process group is dead https://github.com/torvalds/linux/blame/master/kernel/exit.c#L935-L936 this may help avoiding a separate timer for this case.

I think we can switch to sched_process_free tracepoint (instead of sched_process_exit) which should be more performant than a kprobe. I'll verify.

EDIT: sched_process_free fires for every kernel task so it's not suitable if we want to avoid notifying userspace of every thread exit. On the other hand, disassociate_ctty seemingly (no in-depth investigation done on my part) does what we want and also seemingly executes after task has been removed from /proc which eliminates a possible race in userspace that would otherwise be a (probably unlikely) concern.

EDIT2: Went back to sched_process_free which we can make work by checking whether PID is something we track or not.

@christos68k christos68k force-pushed the ck/process-exit branch 3 times, most recently from e11a0dc to 87e351e Compare March 20, 2025 18:46
@christos68k christos68k marked this pull request as ready for review March 20, 2025 18:53
} else if path != "" {
// Ignore [vsyscall] and similar executable kernel
// pages we don't care about
} else {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No semantic change, I just inlined the logic from GetMappings here as this is the more appropriate place.

@@ -538,7 +538,7 @@ func (pm *ProcessManager) synchronizeMappings(pr process.Process,
// fast enough and this particular pid is reused again by the system.
func (pm *ProcessManager) processPIDExit(pid libpf.PID) {
exitKTime := times.GetKTime()
log.Debugf("- PID: %v", pid)
log.Warnf("- PID: %v", pid)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove these newly added warnings before merging, they should help with reviewing the PR as you don't need to run the agent with debug logs enabled and sort through a lot of irrelevant noise.

@@ -626,22 +633,7 @@ func (pm *ProcessManager) SynchronizeProcess(pr process.Process) {
// return ESRCH. Handle it as if the process did not exist.
pm.mappingStats.errProcESRCH.Add(1)
}
return
}
if len(mappings) == 0 {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments are no longer relevant.

@christos68k
Copy link
Member Author

christos68k commented Mar 20, 2025

I added some more information and notes on how to review/test to the description.

@korniltsev please take another look and review/test.

@korniltsev
Copy link
Contributor

Great job. Thank you for looking into this.
I like the trick with sched_process_free and that we have no extra timers in userspace and the logic of the PM did not complicate.
I've run both my repro and your repro with libcrypto and the profiler works as expected. It keeps profiling remaining threads including new libraries (libcrypto)
I wish we could somehow create an integration test for it from the repro you've added so that it is run with every testruns instead of hoping I don't forget to run it. But I understand writing a test may be hard / time consuming so we may do this later.
LGTM

@@ -53,9 +58,10 @@ func init() {
}

// New returns an object with Process interface accessing it
func New(pid libpf.PID) Process {
func New(pid, tid libpf.PID) Process {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't switch Process to accept libpf.PIDTID as the latter is only used with PID events, and I'd rather not couple it here too.

Copy link
Contributor

@florianl florianl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the log.Warn(..) messages as mentioned in #376 (comment) before merging.

Copy link
Contributor

@fabled fabled left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick review with few questions. But looks good. Will look in more detail tomorrow.

Comment on lines 305 to 307
// Neither /proc/sp.pid/map_files nor /proc/sp.pid/task/sp.tid/map_files
// exist if main thread has exited, so we use the mapping path directly.
return m.Path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this then be /proc/PID/root/FILE to make sure the file is opened from the right namespace?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/proc/PID/root doesn't exist after main thread exits, I switched the root to /proc/sp.pid/task/sp.tid/root instead.

@@ -229,20 +234,45 @@ func (sp *systemProcess) GetMappings() ([]Mapping, uint32, error) {
defer mapsFile.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this original path of checking the primary maps file be made conditional on if sp.mainThreadExit == false? (Or if this object is not kept between calls, perhaps this could be an argument?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yes, I can add the check/skip-pid-maps logic but I'll also need to add the zombie check to ensure that we're only going to be reading tid-specific maps if it's absolutely the case that main thread has exited.

Copy link
Member Author

@christos68k christos68k Apr 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the zombie check for process exit but adding a conditional check for sp.mainThreadExit == false in GetMappings doesn't currently serve any purpose: GetMappings is only called once for any given systemProcess instance, so the check will never fail (caching mainThreadExit on the instance does serve a purpose however, as there are subsequent method calls after GetMappings that will leverage it).

So we can either have processmanager cache the mainThreadExit status for a particular PID and add this check (and then we'd need extra logic inside GetMappings to determine when the process has finally exited, currently the primary maps file path serves this purpose) or keep the logic as is now.

@christos68k christos68k force-pushed the ck/process-exit branch 2 times, most recently from c01c8b0 to 9f268e1 Compare March 31, 2025 19:26
@christos68k
Copy link
Member Author

I rebased this PR on top of current main.

Copy link
Contributor

@fabled fabled left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks pretty good already. I added few questions and comments. But I'll pre-approve this already so we can go forward.

Comment on lines 243 to 262
// Test for main thread exit by checking for Zombie state
pidStat, err := os.ReadFile(fmt.Sprintf("/proc/%d/stat", sp.pid))
if err != nil {
// Should never happen while process is alive
return nil, 0, err
}

var p int
var c string
var state rune
n, err := fmt.Sscanf(string(pidStat), "%d %s %c", &p, &c, &state)
if err != nil || n < 3 {
// Should never happen
return nil, 0, err
}
sp.fileToMapping = fileToMapping
if state != 'Z' {
return mappings, numParseErrors, ErrNoMappings
}

log.Warnf("PID: %v main thread exit", sp.pid)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really needed? I think we can just remove the zombie check.

If the maps file exists, it means process is running.

If the maps file is empty, it means the main thread has exited. There is no other condition that the main maps is empty, because the main thread cannot be executing code if there are no mappings available. This is purely a side effect of kernel having released the main thread specific resources.

Based on the two above things we can determine if: the process exited (since ebpf sent the event), or if the main thread has exited.

Or are you aware of some condition where this makes a difference? I think it was if all mappings entries resulted in parsing error? But I believe this should be handled as an error earlier. The reason is that reading the TID specific maps should be identical to the PID specific as memory mappings are shared between all threads.

Perhaps the only check to do here is if pid == tid then return early with ErrNoMappings.

Copy link
Member Author

@christos68k christos68k Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the zombie check (we'd need it if we go back to walking /proc but we don't need it now).

continue
}
fileToMapping[m.Path] = m
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here we should also return early if err is nil and numParseErrors is non-zero. Or perhaps even better, parseMappings could return an err if it failed to find usable mappings (but it managed to read data). The idea is basically to distinguish here if mappings is empty or all lines were non-parseable.

Copy link
Member Author

@christos68k christos68k Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the len(mappings) == 0 check that follows cover this case?

If err == nil and len(mappings) != 0 then we simply continue and process the mappings. If err == nil and len(mappings) == 0 then we continue and try mappings from another thread. Essentially all branching logic depends on err and len(mappings), not numParseErrors which is purely advisory.

Comment on lines 278 to 280
numParseErrorsAlt := uint32(0)
mappings, numParseErrorsAlt, err = parseMappings(mapsFileAlt)
numParseErrors += numParseErrorsAlt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just overwrite the numParseErrors instead of adding them. It is only ever used for counters. And since the per TID and per PID maps should be identical, you are basically reporting doubled errors counter in this case.

sched_process_free is called when the task is freed by the kernel,
which allows for simpler cleanup of processes whose main thread
has exited.
Making TID available to processmanager allows the agent to keep
profiling a process whose main thread calls pthread_exit while
other threads continue to run.
This allows the agent to continue profiling a process whose main
thread has exited, but other threads continue to run. Mapping changes
triggered by one of the remaining threads are also tracked.
The latter is OS-agnostic, but the agent only runs on Linux.
@christos68k christos68k merged commit 81ef3fd into main Apr 25, 2025
25 checks passed
@christos68k christos68k deleted the ck/process-exit branch April 25, 2025 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Profiler incorrectly handles process exit when non-main threads are still running
4 participants