@everzakov

Currently there is a Go issue, golang/go#76746.
If cmd.Start is called twice on the same Cmd (and the first call failed), then
Cmd.Wait can return an error, because the pipes used by the Cmd's goroutines are closed during the first attempt.
However, the second cmd.Start can still return nil.
Once https://go-review.googlesource.com/c/go/+/728642 is merged,
the second call will return an error, so we need to recreate the Cmd object.
We cannot simply copy it; see
https://go-review.googlesource.com/c/go/+/728642/comments/a837fe92_a5fc5f06 .

Fixes #5060

@kolyshkin
Contributor

Hmm, the code to create a copy of the cmd is kinda fragile, and I guess what Alan suggested is a better way, i.e.

use some other structure to hold all the information needed to populate a Cmd, and create the Cmd just before you call Start.

One more alternative is to check whether CLONE_INTO_CGROUP is available, but I'm afraid the check is going to be too expensive, and caching its result is not very reliable either.
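
The suggestion to keep the launch parameters in a separate structure and build the Cmd just before Start can be sketched as follows. This is a minimal illustration, not the code from this PR: `cmdSpec`, `build`, and `startWithFallback` are hypothetical names, and a missing working directory stands in for a failing CLONE_INTO_CGROUP attempt so that the example does not depend on cgroups.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"slices"
)

// cmdSpec is a hypothetical holder for everything needed to populate an
// exec.Cmd. It is not a runc type.
type cmdSpec struct {
	Path string
	Args []string
	Env  []string
	Dir  string
}

// build returns a brand-new *exec.Cmd on every call, so no internal state
// from an earlier failed Start can leak into the next attempt. The optional
// tweak callback applies per-attempt settings.
func (s cmdSpec) build(tweak func(*exec.Cmd)) *exec.Cmd {
	cmd := &exec.Cmd{
		Path: s.Path,
		Args: slices.Clone(s.Args),
		Env:  slices.Clone(s.Env),
		Dir:  s.Dir,
	}
	if tweak != nil {
		tweak(cmd)
	}
	return cmd
}

// startWithFallback tries the preferred configuration first. If Start fails,
// it discards that Cmd and starts a freshly built one with the fallback.
func startWithFallback(s cmdSpec, preferred, fallback func(*exec.Cmd)) (*exec.Cmd, error) {
	cmd := s.build(preferred)
	firstErr := cmd.Start()
	if firstErr == nil {
		return cmd, nil
	}
	cmd = s.build(fallback) // never call Start on the failed Cmd again
	if err := cmd.Start(); err != nil {
		return nil, errors.Join(firstErr, err)
	}
	return cmd, nil
}

func main() {
	path, err := exec.LookPath("true")
	if err != nil {
		fmt.Println("no \"true\" binary found, skipping demo")
		return
	}
	spec := cmdSpec{Path: path, Args: []string{"true"}}
	// Assumption for the demo: a missing working directory makes the first
	// Start fail, standing in for a failed CLONE_INTO_CGROUP attempt.
	broken := func(c *exec.Cmd) { c.Dir = "/nonexistent-dir-for-demo" }
	cmd, err := startWithFallback(spec, broken, nil)
	if err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Println("wait error:", cmd.Wait())
}
```

Because every attempt gets its own Cmd, the sketch does not rely on whether os/exec tolerates a second Start on the same object.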

everzakov force-pushed the 5060-cmd-start-fix branch 2 times, most recently from 070b6ed to 4cdc0d6 (December 16, 2025 13:14)
@everzakov
Author

everzakov commented Dec 16, 2025

I guess what Alan suggested is a better way

Hello! Something like 4cdc0d6? I implemented it as a Container method.

@lifubang
Member

lifubang commented Dec 18, 2025

@everzakov Have you verified whether this issue actually affects runc? If so, why aren’t we seeing any errors on almalinux-8?

In fact, runc doesn’t call cmd.Wait() directly—it uses cmd.Process.Wait() instead to wait for the child process.

Regarding the Go PR (728642), I think the change might be overly strict for users and could potentially be a breaking change.

@lifubang
Member

In fact, I changed CgroupFD to an arbitrary number in the runc code, and I don't see any errors from runc. For example:

p.cmd.SysProcAttr.CgroupFD = 100 //int(fd.Fd())
lifubang@acmcoder /opt/ubuntu $ sudo ./runc exec test echo hello
hello
lifubang@acmcoder /opt/ubuntu $ sudo ./runc --debug exec test echo hello
...
DEBU[0000]libcontainer/process_linux.go:359 libcontainer.(*setnsProcess).prepareCgroupFD() using CLONE_INTO_CGROUP "/sys/fs/cgroup/user.slice/user-1000.slice/test" 
... libcontainer.(*setnsProcess).startWithCgroupFD() exec with CLONE_INTO_CGROUP failed: fork/exec /proc/self/fd/6: bad file descriptor; retrying without
...                       
hello

@everzakov
Author

everzakov commented Dec 18, 2025

In fact, runc doesn’t call cmd.Wait() directly—it uses cmd.Process.Wait() instead to wait for the child process.

Actually, there are some cmd.Wait() calls: https://github.com/opencontainers/runc/blob/v1.4.0/libcontainer/process_linux.go#L152 and https://github.com/opencontainers/runc/blob/v1.4.0/libcontainer/process_linux.go#L524 .

I can't see any errors for runc

Yeah, the main problem with finding this fault at runtime is that the errors from all cmd.Wait calls are ignored. Also, the Wait call is often preceded by a KILL of the process. So this error can only be found in tests. You can try running the TestEnter test with your change; the error should be the same as in the issue.

Regarding the Go PR (728642), I think the change might be overly strict for users and could potentially be a breaking change.

I have proposed another Go change, 728660. With it, the same os/exec Cmd object can be started again if the first attempt failed, and Cmd.Wait does not return the error. However, I think 728642 will be merged.

@lifubang
Member

You're right — we shouldn’t reuse the exec.Command object to restart the process.

Just a thought: would a simple fix be to move the retry logic into container_linux.go? Maybe here:

	if err := parent.start(); err != nil {
		return fmt.Errorf("unable to start container process: %w", err)
	}
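
One way to picture retrying at this call site is to rebuild the parent process for each attempt instead of calling start on the same object twice. The sketch below is only an illustration under stated assumptions: `starter`, `fakeParent`, and `startWithRetry` are hypothetical names, the factory signature is invented for the example, and it retries on any error, whereas real code would likely retry only on specific ones.

```go
package main

import (
	"errors"
	"fmt"
)

// starter is a hypothetical stand-in for the parent process type whose
// start method is called in the snippet above.
type starter interface {
	start() error
}

// startWithRetry asks the factory for a fresh parent on every attempt:
// first with the cgroup fd enabled, then without it.
func startWithRetry(newParent func(useCgroupFD bool) (starter, error)) error {
	var errs []error
	for _, useFD := range []bool{true, false} {
		parent, err := newParent(useFD)
		if err != nil {
			return fmt.Errorf("unable to create container process: %w", err)
		}
		if err := parent.start(); err != nil {
			errs = append(errs, err)
			continue // drop this parent; the next attempt builds a new one
		}
		return nil
	}
	return fmt.Errorf("unable to start container process: %w", errors.Join(errs...))
}

// fakeParent simulates a process that fails only when the cgroup fd is used.
type fakeParent struct{ useCgroupFD bool }

func (p fakeParent) start() error {
	if p.useCgroupFD {
		return errors.New("exec with CLONE_INTO_CGROUP failed")
	}
	return nil
}

func main() {
	builds := 0
	err := startWithRetry(func(useFD bool) (starter, error) {
		builds++
		return fakeParent{useCgroupFD: useFD}, nil
	})
	fmt.Println("parents built:", builds, "error:", err)
}
```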

@lifubang
Member

Additionally, adding a test case to ensure the patch works correctly would be appreciated.

@everzakov
Author

Additionally, adding a test case to ensure the patch works correctly would be appreciated.

Hello! Well, I'm not very familiar with the cgroups library. I think the test should pass a non-cgroup path so that the check fails in the kernel. However, this path must exist. There are a lot of functions that clean paths and check prefixes. In Create we pass nil paths to the cgroup manager, so the default /sys/fs/cgroup path is used. I think it is impossible to create a non-cgroup child directory in the /sys/fs/cgroup tree. But we can use the factory Load method to get a container. Cgroup paths are passed to this method, so we can set our non-cgroup path there.

So the idea is to patch the dumped container state, load it with changed cgroup paths, and turn off the function that checks the directory's cgroup magic. The first call fails because we pass a non-cgroup dir. The second call doesn't return an error, and the setns pid is added successfully into the "cgroup".
I think it may be a very synthetic test, but without the changes it will fail on the new kernels :/

The test logs:
time="2025-12-22T16:28:50Z" level=debug msg="using CLONE_INTO_CGROUP "/tmp/TestCmdRetry2213481631/001""
time="2025-12-22T16:28:50Z" level=debug msg="exec with CLONE_INTO_CGROUP failed: fork/exec /proc/self/fd/6: bad file descriptor; retrying without"

Development

Successfully merging this pull request may close these issues.

runc 1.4 breaks the rule that cmd.Start is called only once that's why process Wait can return an error.