Skip to content

Fix popen leak and handle perf script crash#993

Merged
mlim19 merged 8 commits into
masterfrom
fix_popen_ref_leak
Sep 12, 2025
Merged

Fix popen leak and handle perf script crash#993
mlim19 merged 8 commits into
masterfrom
fix_popen_ref_leak

Conversation

@mlim19

@mlim19 mlim19 commented Aug 26, 2025

Copy link
Copy Markdown
Contributor

The patch is revised to take care of cleaning up the popen object being still referenced after the associated subprocess gets terminated.

Based on code review, 3 issues are identified:

  1. In run_process function, If the assertion does NOT happen, the subprocess is terminated. At this time, I think we should remove the popen object in the global list so that the pipes and file descriptors are cleaned by GC.
  • a. With python 3.2 or later, subprocess.Popen supports the context manager protocol for with statement, which means exit method is implemented and executed when t the with block ends. Popen.exit calls proc.wait() by default.
  • b. At the end of the with block, it looks we ensures the process is terminated by checking the retcode from poll().
    assert retcode is not None # only None if child has not terminated
  1. Some profilers (php, pyperf, perf) call the start_process function directly. So, the above solution for run_process is not helpful. So, we handle the clean up separately here
  2. When "perf script" is crashed, we need to add the handling of exception which should fixes CPU profiling on GPU machines #992

Description

Related Issue

Motivation and Context

How Has This Been Tested?

Screenshots

Checklist:

  • I have read the CONTRIBUTING document.
  • I have updated the relevant documentation.
  • I have added tests for new logic.

Signed-off-by: Min Lim <min.yeol.lim@intel.com>
@mlim19 mlim19 mentioned this pull request Aug 26, 2025
3 tasks
@mlim19 mlim19 requested a review from dkorlovs August 26, 2025 00:21
Comment thread gprofiler/profilers/php.py Outdated
Comment thread gprofiler/profilers/php.py Outdated
Comment thread gprofiler/profilers/php.py
dkorlovs
dkorlovs previously approved these changes Aug 28, 2025

@dkorlovs dkorlovs left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you!

mlim19 added 2 commits August 29, 2025 14:24
Signed-off-by: Min Lim <min.yeol.lim@intel.com>
Signed-off-by: Min Lim <min.yeol.lim@intel.com>
@mlim19

mlim19 commented Aug 29, 2025

Copy link
Copy Markdown
Contributor Author

Added a new patch to take care of the cases not handled. After applying the patchset collectively, the overall memory utilization remain same over several hours.
Screenshot 2025-08-28 201708

dkorlovs
dkorlovs previously approved these changes Sep 1, 2025
Signed-off-by: Min Lim <min.yeol.lim@intel.com>
Signed-off-by: Min Lim <min.yeol.lim@intel.com>
@dkorlovs dkorlovs force-pushed the fix_popen_ref_leak branch 4 times, most recently from 6c1246d to 2068415 Compare September 3, 2025 01:31
Signed-off-by: Min Lim <min.yeol.lim@intel.com>
@dkorlovs dkorlovs force-pushed the fix_popen_ref_leak branch 2 times, most recently from 3617aaa to f67374b Compare September 3, 2025 16:05
because it can cause double subprocess communicate calls

Signed-off-by: Min Lim <min.yeol.lim@intel.com>

@prashantbytesyntax prashantbytesyntax left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks good, there is still memory growth which i think needs handling for runtime profilers and perf output parsing optimization but could be done in future PRs

wait_event(self._POLL_TIMEOUT, self._profiler_state.stop_event, lambda: os.path.exists(self.output_path))
except TimeoutError:
process.kill()
cleanup_process_reference(process)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably need clean up on run time profilers too

@mlim19 mlim19 merged commit 1523207 into master Sep 12, 2025
35 checks passed
@mlim19 mlim19 deleted the fix_popen_ref_leak branch September 12, 2025 15:47
@mlim19 mlim19 mentioned this pull request Oct 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CPU profiling on GPU machines

3 participants