do not sleep to kill a process that has already cleaned up and exited by itself #672

istyf · 2024-11-01T23:51:16Z

This PR fixes #671 by replacing the time.Sleep in killCmd (for linux and unix/mac) with a cancelable context that can be used to stop the kill timer and return to the caller as soon as the interrupted process has closed down.

While developing this PR I took the liberty of deleting the unit test for testing killing of "non existing" processes for reasons outlined in 52428bf.

This PR has been tested on WSL2 Linux and an Apple M2 MBP.
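To make the idea in the description concrete, here is a minimal sketch of an interrupt that stops waiting as soon as the process has exited. It is not the PR's killCmd: `stopGracefully` and `demo` are hypothetical names, the sketch signals a single process through `os.Process` rather than the whole process group, and it assumes a Unix-like system with a `sleep` binary on the PATH.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// stopGracefully is a hypothetical helper, not the PR's killCmd. It sends
// SIGINT and then waits for whichever comes first: the process exiting or
// killDelay elapsing. It reports whether SIGKILL had to be sent.
func stopGracefully(cmd *exec.Cmd, killDelay time.Duration) (bool, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// Reap the process in the background and cancel the context once it is gone.
	go func() {
		_ = cmd.Wait()
		cancel()
	}()

	if err := cmd.Process.Signal(os.Interrupt); err != nil {
		return false, err
	}

	timer := time.NewTimer(killDelay)
	defer timer.Stop()

	select {
	case <-ctx.Done():
		// The process honoured SIGINT, so there is nothing left to wait for.
		return false, nil
	case <-timer.C:
		// Still running after the delay: fall back to SIGKILL.
		// The background goroutine reaps the process afterwards.
		return true, cmd.Process.Kill()
	}
}

// demo interrupts a long-running sleep with a generous delay and reports
// whether SIGKILL was needed and whether the call returned before the delay.
func demo() (killed, early bool, err error) {
	const killDelay = 5 * time.Second
	cmd := exec.Command("sleep", "30")
	if err = cmd.Start(); err != nil {
		return false, false, err
	}
	start := time.Now()
	killed, err = stopGracefully(cmd, killDelay)
	return killed, time.Since(start) < killDelay, err
}

func main() {
	fmt.Println(demo())
}
```

With an unconditional time.Sleep the call would always take the full delay. Here the delay is only an upper bound for processes that ignore the interrupt.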

if the process to be killed honours the SIGINT and closes down, we should not sleep before trying to kill a process that is no longer there

… entire group

the test panicked if it ever got a nil err, since t.Failf does not imply that the test is stopped
selecting a random pid to kill might have interesting side effects when run, and its result can never be guaranteed to be the same, meaning that the test had intermittent failures
no code depends on the specific return code of killCmd, so there does not seem to be a reason to have a unit test for it

istyf · 2025-11-10T15:23:36Z

@xiantang What is the status/probability of accepting this PR?

xiantang · 2025-11-10T15:29:21Z

Could you rebase onto master?

xiantang · 2025-11-10T15:31:08Z

Because I'm doing #809, I think there may be conflicts.

xiantang · 2025-11-10T15:38:09Z

I need to merge #809 to master first, because once I remove the pty, Linux behavior will be broken, and it cannot clean child processes correctly.

Copilot

Pull Request Overview

This PR refactors the killCmd function in Unix and Linux implementations to use goroutines and channels for asynchronous process termination, replacing a blocking approach with concurrent handling of process waiting and killing with configurable delays.

Key changes:

Introduces goroutine-based concurrent handling of process wait and kill operations
Adds configurable kill delay support when SendInterrupt is enabled
Removes Test_killCmd_no_process test case

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File	Description
runner/util_unix.go	Refactored killCmd to use goroutines for concurrent wait/kill with delay support
runner/util_linux.go	Identical refactoring of killCmd for Linux platform
runner/util_test.go	Removed Test_killCmd_no_process test and unused imports

Comments suppressed due to low confidence (2)

runner/util_unix.go:1

The kill goroutine can leak when killDelay is 0 and the process exits quickly. When killDelay is 0, time.After(0) fires immediately, but if the process has already exited and waitResult is read first in the main loop (line 64), the context is canceled via defer at line 40. However, the kill goroutine may have already sent to killResult (line 48) before reading ctx.Done(), causing it to block forever if nothing reads from killResult. Consider closing the killResult channel with defer close(killResult) or making it buffered: killResult := make(chan error, 1).

//go:build unix && !linux

runner/util_linux.go:1

The kill goroutine can leak when killDelay is 0 and the process exits quickly. When killDelay is 0, time.After(0) fires immediately, but if the process has already exited and waitResult is read first in the main loop (line 63), the context is canceled via defer at line 39. However, the kill goroutine may have already sent to killResult (line 47) before reading ctx.Done(), causing it to block forever if nothing reads from killResult. Consider closing the killResult channel with defer close(killResult) or making it buffered: killResult := make(chan error, 1).

package runner
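The buffered-channel suggestion in the two suppressed comments can be illustrated in isolation. The sketch below is not taken from the PR, and it does not establish whether the PR's code can reach the described state. `senderFinishes` is a hypothetical helper that only shows the general Go behaviour the comments rely on: a send on an unbuffered channel with no reader blocks forever, while a channel with capacity 1 lets the sender finish.

```go
package main

import (
	"fmt"
	"time"
)

// senderFinishes starts a goroutine that sends one result on a channel that
// nobody reads from, and reports whether that goroutine managed to return
// within 100ms. capacity is the buffer size of the result channel.
func senderFinishes(capacity int) bool {
	result := make(chan error, capacity)
	exited := make(chan struct{})

	go func() {
		defer close(exited)
		result <- nil // blocks forever when capacity is 0 and there is no reader
	}()

	select {
	case <-exited:
		return true
	case <-time.After(100 * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println("unbuffered sender finished:", senderFinishes(0)) // false
	fmt.Println("buffered sender finished:", senderFinishes(1))   // true
}
```

In the unbuffered case the demo itself leaks its goroutine, which is exactly the situation the comments describe.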

runner/util_unix.go

Copilot · 2025-11-10T15:41:48Z

runner/util_unix.go

+	for {
+		// collect the responses from the kill and wait goroutines
+		select {
+		case err = <-killResult:
+			results = append(results, err)
+		case err = <-waitResult:
+			results = append(results, err)
+			// if we have a kill delay, we ignore the kill result
+			if killDelay > 0 && len(results) == 1 {
+				results = append(results, nil)
+			}
+		}
+
+		if len(results) == 2 {
+			err = errors.Join(results...)
+			return
+		}
 	}


When killDelay is 0, the kill goroutine sends immediately to killResult and the select statement may non-deterministically choose between reading from killResult or waitResult. If killResult is read first (line 62), the code appends the error and continues looping. Then when waitResult is read (line 64), the logic at lines 67-69 will not execute because killDelay is 0, leaving len(results) at 1 and causing an infinite loop. The condition at line 67 should be if killDelay == 0 && len(results) == 1 to handle the case when there is no kill delay.

Same incorrect assumption about an error here as in util_linux and the proposed "fix" does not work.

runner/util_linux.go

Copilot · 2025-11-10T15:41:49Z

runner/util_linux.go

+	for {
+		// collect the responses from the kill and wait goroutines
+		select {
+		case err = <-killResult:
+			results = append(results, err)
+		case err = <-waitResult:
+			results = append(results, err)
+			// if we have a kill delay, we ignore the kill result
+			if killDelay > 0 && len(results) == 1 {
+				results = append(results, nil)
+			}
+		}
+
+		if len(results) == 2 {
+			err = errors.Join(results...)
+			return
+		}
+	}


When killDelay is 0, the kill goroutine sends immediately to killResult and the select statement may non-deterministically choose between reading from killResult or waitResult. If killResult is read first (line 61), the code appends the error and continues looping. Then when waitResult is read (line 63), the logic at lines 66-68 will not execute because killDelay is 0, leaving len(results) at 1 and causing an infinite loop. The condition at line 66 should be if killDelay == 0 && len(results) == 1 to handle the case when there is no kill delay.

@istyf did you test this case?
This PR is totally OK, but for this code block I don't think the implementation is neat.

Are you basing that on Copilot's incorrect assumption that there is a bug here?

"Then when waitResult is read (line 63), the logic at lines 66-68 will not execute because killDelay is 0, leaving len(results) at 1 and causing an infinite loop." is wrong. What does it think happens at line 64? We add the wait result (which is always nil by design) to the results slice, causing an exit on line 73.

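The argument above can be checked with a small simulation. This is a sketch, not the PR's code: `collect` copies the collection loop from the quoted diff, while `run` is a hypothetical harness that replaces the kill and wait goroutines with a feeder that forces the arrival order. With killDelay set to 0, both orders leave the loop after two iterations.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// collect mirrors the result-collection loop quoted in the review and
// returns the number of loop iterations it needed before returning.
func collect(killDelay time.Duration, killResult, waitResult <-chan error) (int, error) {
	var results []error
	for iterations := 1; ; iterations++ {
		select {
		case err := <-killResult:
			results = append(results, err)
		case err := <-waitResult:
			results = append(results, err)
			// if we have a kill delay, we ignore the kill result
			if killDelay > 0 && len(results) == 1 {
				results = append(results, nil)
			}
		}
		if len(results) == 2 {
			return iterations, errors.Join(results...)
		}
	}
}

// run feeds both results through unbuffered channels in a fixed order, so
// the select in collect can only ever see one ready channel at a time.
func run(killFirst bool) int {
	killResult := make(chan error)
	waitResult := make(chan error)
	go func() {
		if killFirst {
			killResult <- nil
			waitResult <- nil
		} else {
			waitResult <- nil
			killResult <- nil
		}
	}()
	n, _ := collect(0, killResult, waitResult)
	return n
}

func main() {
	fmt.Println("kill result first:", run(true), "iterations")  // 2
	fmt.Println("wait result first:", run(false), "iterations") // 2
}
```

In the kill-first order the wait result is still appended on the second iteration, which brings len(results) to 2 without the killDelay branch being involved.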
istyf · 2025-11-10T15:48:41Z

I need to merge #809 to master first, because once I remove the pty, Linux behavior will be broken, and it cannot clean child processes correctly.

I do not understand. Why would you merge something that breaks Linux behaviour?

xiantang · 2025-11-10T16:18:37Z

Oh, I see, you only changed the killCmd code, so it should be okay, because Linux still uses a pty to start a new process.

runner/util_linux.go

codecov · 2025-11-10T19:52:43Z

Codecov Report

❌ Patch coverage is 91.48936% with 4 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
runner/util_linux.go	91.30%	1 Missing and 1 partial ⚠️
runner/util_unix.go	91.66%	1 Missing and 1 partial ⚠️

Files with missing lines	Coverage Δ
runner/util_linux.go	`67.00% <91.30%> (-1.00%)`	⬇️
runner/util_unix.go	`80.95% <91.66%> (+12.95%)`	⬆️

... and 6 files with indirect coverage changes

istyf

.

xiantang · 2025-11-12T06:40:52Z

Hello @istyf, could you help resolve the conflict?

istyf · 2025-11-12T14:29:12Z

Upstream merged into the PR and conflict resolved. However, the merged result has not yet been tested on my side.

xiantang · 2025-11-12T14:59:32Z

Okay, please test this feature again as well, thanks.

…rly exit optimization)

Add three comprehensive regression tests to verify the fix in commit 4d26204:

1. Test_killCmd_SendInterrupt_FastGracefulExit
   - Verifies processes that exit quickly on SIGINT return immediately
   - Saves ~2s when process exits in <1ms vs 2s kill_delay
2. Test_killCmd_SendInterrupt_IgnoresSIGINT
   - Verifies processes ignoring SIGINT still get SIGKILL after kill_delay
   - Ensures optimization doesn't break fallback behavior
3. Test_killCmd_SendInterrupt_SlowGracefulExit
   - Verifies processes that take time to cleanup still benefit
   - Saves ~700ms when process exits in 300ms vs 1s kill_delay

These tests ensure the goroutine-based optimization continues to work correctly and prevent future regressions.

Related: air-verse#671

xiantang · 2026-01-05T15:13:08Z

runner/util_unix.go

-			return
-		}
-		time.Sleep(e.config.killDelay())
+	// Start a goroutine to wait for the process to exit


@istyf Hello, I think this implementation is neater.

istyf added 5 commits November 1, 2024 16:39

abort kill delay if process closes after interrupt

4d26204

if the process to be killed honours the SIGINT and closes down, we should not sleep before trying to kill a process that is no longer there

make sure to wait for both wait and kill results when kill delay is 0

ffd34c9

use cmd.process.signal and cmd.process.kill instead of raw syscalls

45409a0

switch back to syscall as os.process.signal or kill do not signal the…

4d976a7

… entire group
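The last two commit titles turn on the difference between signalling one process and signalling its process group. The sketch below illustrates that difference under stated assumptions and is not the PR's code: `groupKillReachesChildren` is a hypothetical helper, and it assumes a Unix-like system with `sh` and `sleep` available. It places a shell in its own process group, lets it spawn a background child, and then signals the group through syscall.Kill with a negative pid.

```go
package main

import (
	"fmt"
	"io"
	"os/exec"
	"syscall"
	"time"
)

// groupKillReachesChildren starts a shell that spawns a background child,
// kills the whole process group, and reports whether every process holding
// the stdout pipe went away within one second.
func groupKillReachesChildren() (bool, error) {
	cmd := exec.Command("sh", "-c", "sleep 30 & wait")
	// Put the shell in its own process group; its children inherit the group.
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}

	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return false, err
	}
	if err := cmd.Start(); err != nil {
		return false, err
	}

	// The copy only returns once both the shell and the sleep have closed
	// their inherited stdout, so it doubles as a check that both are gone.
	drained := make(chan struct{})
	go func() {
		defer close(drained)
		_, _ = io.Copy(io.Discard, stdout)
	}()

	// Give the shell a moment to spawn its background child.
	time.Sleep(100 * time.Millisecond)

	// A negative pid addresses the process group rather than a single process.
	if err := syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL); err != nil {
		return false, err
	}

	select {
	case <-drained:
		_ = cmd.Wait()
		return true, nil
	case <-time.After(time.Second):
		return false, nil
	}
}

func main() {
	fmt.Println(groupKillReachesChildren())
}
```

Signalling only cmd.Process would reach the shell alone. The background sleep would then keep the pipe open until it finished by itself.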

xiantang requested a review from Copilot November 10, 2025 15:39

Copilot AI reviewed Nov 10, 2025

Merge branch 'master' into fix/only-kill-after-interrupt-when-required

6c055fe

xiantang reviewed Nov 10, 2025

add clarifying comments to the goroutine response collection code

c8fbfcc

istyf commented Nov 11, 2025

merged upstream/master into feature branch

63889e4

xiantang force-pushed the fix/only-kill-after-interrupt-when-required branch from 5885492 to 0da8425 Compare January 5, 2026 15:09

xiantang reviewed Jan 5, 2026

xiantang merged commit 6f43af7 into air-verse:master Jan 5, 2026
7 checks passed
