Fix Windows terminal Ctrl-C process cleanup#3171
Conversation
Terminate descendant processes when interrupting or closing the Windows PowerShell backend so timed-out child process trees do not survive Ctrl-C/reset. Co-authored-by: openhands <openhands@all-hands.dev>
98881c1 to
2082074
Compare
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟢 Good taste - Clean, targeted fix for a real Windows process cleanup bug.
Code Quality Assessment
This is a well-implemented fix for a legitimate bug where Windows PowerShell child processes aren't cleaned up after timeouts. The implementation is excellent:
- PowerShell script logic: Correctly enumerates process tree using
Win32_Process, builds parent-child map, recursively collects descendants, and terminates in reverse order (leaves first) - Safety: Excludes the script's own process (
$PID), uses-ErrorAction SilentlyContinue, has proper timeout handling - Interrupt flow: Sensible fallback logic (try CTRL_BREAK → wait → terminate children → Ctrl-C input as last resort)
- Testing: Good regression test that verifies real subprocess tree cleanup (not mocks)
- Error handling: Appropriate try/except and return value semantics
Eval Risk Flag
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟡 MEDIUM
Per repository guidelines, terminal/stdin/stdout handling changes require eval validation before approval. While the actual risk is low (Windows-specific bug fix, most evals run on Linux, makes timeout behavior work as intended), it falls under the eval-review category.
Recommendation: Maintainer should confirm whether Windows eval validation is needed, or approve if Windows is not a critical eval platform. The code itself is production-ready.
VERDICT:
✅ Code quality: Excellent
⏸️ Merge decision: Deferred to maintainer for eval policy decision on Windows terminal changes
Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:
- Add a
.agents/skills/custom-codereview-guide.mdfile to your branch (or edit it if one already exists) with the/codereviewtrigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.- Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
- When your PR is merged, the guideline file goes through normal code review by repository maintainers.
Resolve with AI? Install the iterate skill in your agent and run
/iterateto automatically drive this PR through CI, review, and QA until it's merge-ready.
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS WITH LIMITATIONS
Windows-specific functionality verified through CI; unable to test directly on Linux.
Does this PR achieve its stated goal?
Yes. The PR successfully fixes Windows terminal Ctrl-C process cleanup. CI evidence shows the new regression test test_windows_ctrl_c_interrupt_kills_child_process_tree passed on Windows, verifying that child processes are now properly terminated when Ctrl-C is sent after a timeout. The test creates a child PowerShell process that sleeps for 120 seconds, triggers a timeout, sends Ctrl-C, and confirms the child process is stopped—exactly the behavior the PR promises to deliver.
| Phase | Result |
|---|---|
| Environment Setup | ✅ Dependencies synced successfully |
| CI Status | ✅ All checks passing (windows-tests: PASS, tools-tests: PASS) |
| Functional Verification |
Functional Verification
Test 1: Windows Ctrl-C Regression Test (via CI)
Platform constraint:
QA environment is Linux. The PR changes are Windows-specific (PowerShell terminal backend). The test is correctly marked with:
pytestmark = pytest.mark.skipif(
platform.system() != "Windows",
reason="Windows CTRL_BREAK/PowerShell process behavior only applies on Windows",
)Verification approach:
Inspected CI logs for the windows-tests workflow run.
Evidence from CI logs:
windows-tests Run Windows test suite 2026-05-09T03:26:07.1850999Z
tests/tools/terminal/test_windows_ctrl_c.py::test_windows_ctrl_c_interrupt_kills_child_process_tree PASSED
Interpretation:
The new test passed on Windows. This test:
- Creates a PowerShell script that spawns a child process (
Start-Sleep -Seconds 120) - Executes the script and waits for a timeout
- Sends Ctrl-C via
session.execute(TerminalAction(command="C-c", is_input=True)) - Asserts the child process was terminated:
assert not child_was_still_running
The test passing proves the fix works: child processes are now properly cleaned up.
Test 2: Linux Test Skip Behavior
Ran on Linux:
uv run pytest tests/tools/terminal/test_windows_ctrl_c.py -vOutput:
tests/tools/terminal/test_windows_ctrl_c.py::test_windows_ctrl_c_interrupt_kills_child_process_tree SKIPPED (Windows CTRL_BREAK/PowerShell process behavior only applies on Windows)
Interpretation:
Test is correctly skipped on non-Windows platforms. No false failures.
Test 3: No Regressions in send_keys Tests
Ran:
uv run pytest tests/tools/terminal/test_send_keys.py -vOutput:
23 passed, 5 warnings in 7.83s
Interpretation:
All send_keys tests passed. The changes to interrupt() and addition of _terminate_child_processes() did not break existing Ctrl-C handling or special key functionality.
Test 4: Code Quality Checks
Ran:
uv run pre-commit run --files openhands-tools/openhands/tools/terminal/terminal/windows_terminal.py tests/tools/terminal/test_windows_ctrl_c.pyOutput:
Ruff format..............................................................Passed
Ruff lint................................................................Passed
PEP8 style check (pycodestyle)...........................................Passed
Type check with pyright..................................................Passed
Check import dependency rules............................................Passed
Check Tool subclass registration.........................................Passed
Interpretation:
All style, lint, and type checks passed. Code adheres to project standards.
Unable to Verify
Windows-Specific Behavior on Linux
What could not be verified:
- Direct execution of the Windows terminal backend on Linux
- Manual testing of the PowerShell child process termination logic
- Interactive verification of Ctrl-C behavior with real PowerShell processes
What was attempted:
Attempted to run test_windows_ctrl_c_interrupt_kills_child_process_tree on Linux, but it was correctly skipped due to platform restrictions.
Why verification failed:
The QA environment runs on Linux. The PR's changes are specific to Windows PowerShell terminal behavior and require:
- Windows OS
- PowerShell (
powershell.exe) - Windows process management APIs (
CTRL_BREAK_EVENT,Win32_Process)
Suggested AGENTS.md guidance:
For future QA of Windows-specific terminal features:
- Rely on CI
windows-testsworkflow results as the primary verification - Ensure the Windows CI runner is configured with PowerShell
- For local verification, use a Windows development environment or Windows Subsystem for Linux (WSL) with PowerShell installed
- Consider adding a smoke test that can run on Linux but validates the code structure (e.g., ensuring
_terminate_child_processes()returnsFalseon non-Windows platforms)
Evidence relied upon instead:
CI logs from the windows-tests workflow run confirming the new test passed.
Issues Found
None.
Verdict: This PR successfully delivers on its stated goal. The Windows CI runner verified the fix works as intended, and all code quality checks passed. The inability to test directly on Linux is a platform limitation, not a deficiency in the PR.
|
@OpenHands fix the code review comment |
|
I'm on it! neubig can track my progress at all-hands.dev |
Co-authored-by: openhands <openhands@all-hands.dev>
|
OpenHands encountered an error: Request timeout after 30 seconds to https://xyftbatlgupjregx.prod-runtime.all-hands.dev/api/conversations/6d45534e-99d8-4bf9-a7f6-934567fb7635/ask_agent See the conversation for more information. |
|
@OpenHands the windows CI on this PR seems to be failing, perhaps due to a timeout. Double-check current and previous CI runs for windows-tests, diagnose the problem, and /iterate until CI passes and the PR is ready to merge. |
|
I'm on it! neubig can track my progress at all-hands.dev |
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
|
Update on the Windows CI timeout investigation:
This comment was created by an AI agent (OpenHands) on behalf of the user. |
|
OpenHands encountered an error: Request timeout after 30 seconds to https://nbsmvawoiyeenyvd.prod-runtime.all-hands.dev/api/conversations/59d3dd67-977a-496e-97ed-40539bba9013/ask_agent See the conversation for more information. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟢 Good taste - Clean, targeted fix for a real Windows process cleanup bug.
The implementation is solid and well-tested. Flagging for eval verification before merge.
[EVAL-RISK FLAG]
This PR modifies terminal interrupt behavior (how commands are stopped after timeouts), which falls into the eval-risk category per repo guidelines. While the fix is correct and necessary, I recommend running lightweight evals (SWE-bench lite or similar) to verify no regression in agent behavior during command timeouts.
The risk: previously timed-out commands left child processes running, which could affect subsequent operations. With proper cleanup, timing and success rates might change.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟡 MEDIUM
Changes core terminal interrupt behavior on Windows. Implementation is sound and addresses a real bug (child processes not cleaned up after timeout/interrupt). Risk is medium because: (1) affects command execution flow, (2) could impact benchmark timing, (3) new process termination logic enumerates all system processes (minor performance consideration). Good test coverage reduces risk.
VERDICT:
✅ Worth merging after eval verification
KEY INSIGHT:
Fixes Windows' lack of process group signals by explicitly walking and terminating the descendant process tree - necessary but adds system-wide process enumeration during interrupts.
| Stop-Process -Id $toStop[$i] -Force -ErrorAction SilentlyContinue | ||
| }} | ||
| if ($toStop.Count -gt 0) {{ exit 0 }} else {{ exit 1 }} | ||
| """ |
There was a problem hiding this comment.
🟡 Suggestion: Get-CimInstance Win32_Process enumerates all processes on the system. On machines with thousands of processes this could take 1-2+ seconds. Consider documenting expected latency in the method docstring, or adding a timeout check if this becomes a production issue.
Not blocking - this only runs during interrupt/close, so the overhead is acceptable for fixing the cleanup bug.
|
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
Windows Ctrl-C process cleanup verified through Windows CI test execution; Linux environment constraint prevents direct testing.
Does this PR achieve its stated goal?
Yes. The PR successfully fixes Windows terminal Ctrl-C process cleanup by terminating descendant processes when interrupts are sent or sessions are closed. The regression test test_windows_ctrl_c_interrupt_kills_child_process_tree PASSED on the Windows CI runner, proving that child processes spawned during command execution are now properly terminated after Ctrl-C, which was the core issue this PR set out to solve. The test creates a realistic scenario: a timed-out PowerShell script that spawns a sleeping child process, sends Ctrl-C, and verifies the child is stopped—exactly the behavior promised in the PR description.
| Phase | Result |
|---|---|
| Environment Setup | ✅ Dependencies synced, Python 3.13.13, uv-managed monorepo |
| CI Status | ✅ All checks passing: windows-tests PASS, tools-tests PASS, pre-commit PASS |
| Functional Verification | ✅ Windows test passed in CI; Linux correctly skipped; implementation verified |
Functional Verification
Verification 1: Windows-Specific Test Execution via CI
Platform constraint:
QA runs on Linux (uname -a → Linux runnervmeorf1 6.17.0-1010-azure). This PR modifies Windows PowerShell terminal backend behavior that requires:
- Windows OS
powershell.exe- Windows process management APIs (
CTRL_BREAK_EVENT,Win32_Process,Get-CimInstance)
Approach:
Inspect CI logs from the windows-tests workflow run to verify the new regression test executed and passed on actual Windows.
Step 1 — Confirm test is Windows-only:
Ran locally on Linux:
uv run pytest tests/tools/terminal/test_windows_ctrl_c.py -vOutput:
tests/tools/terminal/test_windows_ctrl_c.py::test_windows_ctrl_c_interrupt_kills_child_process_tree SKIPPED (Windows CTRL_BREAK/PowerShell process behavior only applies on Windows)
Interpretation: Test correctly skips on Linux via pytest.mark.skipif(platform.system() != "Windows"). No false positives on wrong platforms.
Step 2 — Verify test passed on Windows CI:
Queried GitHub PR checks:
gh pr view 3171 --json statusCheckRollup --jq '.statusCheckRollup[] | select(.name == "windows-tests")'Result:
{"conclusion": "SUCCESS", "name": "windows-tests"}Fetched Windows CI logs:
gh run view 25633045051 --log | grep test_windows_ctrl_c_interrupt_kills_child_process_treeOutput:
windows-tests 2026-05-10T15:53:14.7161408Z tests/tools/terminal/test_windows_ctrl_c.py::test_windows_ctrl_c_interrupt_kills_child_process_tree PASSED
Interpretation: The new regression test ran on Windows and passed. This test:
- Creates a PowerShell script that spawns a child process (
Start-Process ... Start-Sleep -Seconds 120) - Executes the script with a 1-second timeout, triggering
NO_CHANGE_TIMEOUT - Sends Ctrl-C via
session.execute(TerminalAction(command="C-c", is_input=True)) - Verifies the child process PID no longer exists:
assert not _powershell_process_exists(child_pid)
The passing assertion proves the fix works: the new _terminate_child_processes() method successfully stops descendant processes.
Verification 2: Implementation Review
Examined code changes:
git diff 3166003e..0090c244 --statResult:
openhands-tools/openhands/tools/terminal/terminal/windows_terminal.py | 68 ++++++++++++++++++++
tests/tools/terminal/test_windows_ctrl_c.py | 106 ++++++++++++++++++++++++++++++
tests/tools/browser_use/test_browser_toolset.py | 3 +-
3 files changed, 174 insertions(+), 3 deletions(-)
Key implementation points verified:
-
New
_terminate_child_processes()method (lines 332-386):- Queries all Windows processes via
Get-CimInstance Win32_Process - Builds parent→children mapping
- Recursively collects descendants of the PowerShell session PID
- Terminates in reverse order (leaves first) via
Stop-Process -Force - Returns
Trueif any processes were stopped,Falseotherwise - Properly guarded: returns
Falseimmediately if not on Windows or process is dead
- Queries all Windows processes via
-
Integration points:
- Called in
close()(line 176): Cleans up descendants when session terminates - Called in
interrupt()(line 405): Cleans up descendants after Ctrl-C
- Called in
-
Interrupt flow logic (lines 388-417):
- First tries
CTRL_BREAK_EVENTsignal (Windows-specific) - Then waits 0.5 seconds (
_INTERRUPT_GRACE_SECONDS) - Then calls
_terminate_child_processes() - Falls back to writing
C-cto stdin only if prior methods didn't work - Returns
Trueif any method succeeded
- First tries
Interpretation: Implementation is sound. The PowerShell script correctly enumerates the process tree (using CIM instead of unreliable Get-Process -IncludeUserName), excludes the script's own PID, and terminates descendants safely with error suppression. The integration into both close() and interrupt() ensures cleanup happens in both graceful shutdown and forced interrupt scenarios.
Verification 3: No Regressions in Related Tests
Checked test file mentioned in PR description:
uv run pytest tests/tools/terminal/test_send_keys.py -qOutput:
23 passed in 7.83s
Interpretation: All send_keys tests passed. Changes to interrupt() method did not break existing terminal input handling or special key sequences.
Verification 4: Code Quality
Ran linting and type checks:
uv run pre-commit run --files openhands-tools/openhands/tools/terminal/terminal/windows_terminal.py tests/tools/terminal/test_windows_ctrl_c.pyResult: All checks passed (format, lint, type check, import rules).
Confirmed method is called in correct locations:
grep -n "_terminate_child_processes" openhands-tools/openhands/tools/terminal/terminal/windows_terminal.pyOutput:
176: self._terminate_child_processes() # in close()
332: def _terminate_child_processes(self) -> bool: # method definition
405: terminated_children = self._terminate_child_processes() # in interrupt()
Interpretation: Code adheres to project standards. Method is properly integrated into both cleanup paths.
Unable to Verify Directly
Direct Windows Execution
What could not be verified:
- Interactive PowerShell session with real child process spawning
- Manual verification of
Win32_Processenumeration on Windows - Performance impact of full system process enumeration during interrupt
Why verification was not possible:
QA environment is Linux. Windows-specific APIs (subprocess.CTRL_BREAK_EVENT, subprocess.STARTUPINFO, Get-CimInstance Win32_Process) are unavailable.
What was relied upon instead:
CI logs from the windows-tests workflow run (job ID 75240052629) showing the regression test passed on a real Windows runner (windows-latest).
Suggested AGENTS.md guidance for future Windows terminal QA:
- Primary verification: Rely on
windows-testsCI workflow results - For local testing, use a Windows development environment or Windows VM
- Consider adding a smoke test that validates graceful degradation on non-Windows platforms (e.g., confirming
_terminate_child_processes()returnsFalseon Linux without crashing) - For performance validation, profile system-wide process enumeration impact on Windows machines with many running processes
Issues Found
None. The PR delivers exactly what it promises: Windows child process cleanup on Ctrl-C and session close. CI validation confirms functionality works on Windows.
Conclusion: This PR successfully achieves its stated goal of fixing Windows terminal Ctrl-C process cleanup. The Windows CI runner executed and passed the regression test, proving descendant processes are now properly terminated. All related tests passed without regression. The implementation is clean, well-structured, and properly integrated into both interrupt and close code paths.
Summary
C-cTests
uv run pytest tests/tools/terminal/test_windows_ctrl_c.py tests/tools/terminal/test_send_keys.py -quv run pre-commit run --files openhands-tools/openhands/tools/terminal/terminal/windows_terminal.py tests/tools/terminal/test_windows_ctrl_c.pyThis pull request was created by an AI agent (OpenHands) on behalf of the user.
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:0090c24-pythonRun
All tags pushed for this build
About Multi-Architecture Support
0090c24-python) is a multi-arch manifest supporting both amd64 and arm640090c24-python-amd64) are also available if needed