Skip to content

Race Condition in StartExec #1076

@mpass99

Description

@mpass99

Expected Behavior

We consistently receive no error for an exec in a container (thread-safe).

Current Behavior

Sometimes the StartExec function is returning the error io: read/write on closed pipe.

This happens once with around:

  • 30.000 exec calls in production
  • 60.000 exec calls running the following minimal reproduction setup in a local environment
  • 500 exec call when enabling Go's Data Race Detector (go run -race main)

Steps to Reproduce

  • docker pull ubuntu
  • go mod init main
  • go get github.com/fsouza/go-dockerclient
  • Run main.go with go run -race main.go.

Possible Solution

The error originates in the forwarding of the StdIn stream, here.

One Goroutine (1) is started to copy the StdIn to the Docker Deamon.

go-dockerclient/client.go

Lines 826 to 835 in 87c2a33

go func() {
var err error
if hijackOptions.in != nil {
_, err = io.Copy(rwc, hijackOptions.in)
}
errChanIn <- err
rwc.(interface {
CloseWrite() error
}).CloseWrite()
}()

Another Goroutine (2) copies the StdOut from the Daemon. However, when the copy of the StdOut stream finishes, the Input Reader is closed.

go-dockerclient/client.go

Lines 806 to 823 in 87c2a33

go func() {
defer func() {
if hijackOptions.in != nil {
if closer, ok := hijackOptions.in.(io.Closer); ok {
closer.Close()
}
errChanIn <- nil
}
}()
var err error
if hijackOptions.setRawTerminal {
_, err = io.Copy(hijackOptions.stdout, br)
} else {
_, err = stdcopy.StdCopy(hijackOptions.stdout, hijackOptions.stderr, br)
}
errChanOut <- err
}()

Common Readers (such as io.Pipe) return an error when someone tries to read from it after it has been closed. This happens here when Goroutine (2) closes the Reader and Goroutine (1) is still trying to copy data from it.

When the Go scheduler now decides to switch Goroutines between these two lines

go-dockerclient/client.go

Lines 810 to 812 in 87c2a33

closer.Close()
}
errChanIn <- nil

to these two lines

go-dockerclient/client.go

Lines 829 to 831 in 87c2a33

_, err = io.Copy(rwc, hijackOptions.in)
}
errChanIn <- err

an unexpected error is being returned.

Context (Environment)

This scenario is relevant as the Nomad orchestrator triggers it regularly.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions