Buildkit on Windows Server 2019 Unable to Reuse Namespaces / Endpoints #5668

Open
@nerddtvg


On Windows Server 2019, buildkit cannot reuse a network namespace for the second RUN command in a Dockerfile (or for any later RUN). The namespace state is 1 and the endpoint state is 4 (Detached).

An identical buildkit, containerd, and CNI configuration on Server 2022 works without issue.

Server 2019 1809
Microsoft Windows NT 10.0.17763.0

buildkitd --version
buildkitd github.com/moby/buildkit v0.18.2 e4da654b1251f91e914fab18eba33743aefd7080

containerd --version
containerd github.com/containerd/containerd/v2 v2.0.1 88aa2f531d6c2922003cc7929e51daf1c14caa0a

nerdctl --version
nerdctl version 2.0.2

I have tried the Windows CNI plugin versions 0.3.0 and 0.3.1.

This error is similar to #4960. However, the subnet and NAT configurations are correct and match the existing nat HnsNetwork.

Sample Dockerfile:

FROM mcr.microsoft.com/dotnet/runtime:8.0-windowsservercore-ltsc2019

SHELL ["C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell", "-NonInteractive", "-NoProfile", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]

RUN ipconfig /all
RUN nslookup google.com
RUN ping google.com

ENTRYPOINT ["C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell", "-Command", "$sec = Get-Random -Minimum 10 -Maximum 100 ; Write-Host $sec ; Start-Sleep -Seconds $sec"]

buildkitd logs:

time="2025-01-16T15:36:59Z" level=debug msg="load cache for [1/4] FROM mcr.microsoft.com/dotnet/runtime:8.0-windowsservercore-ltsc2019@sha256:feca15ab0fa7f47610aabfd72f9d87c9926b520b67c824003e9d44aaf2095424 with amcuh3zdpvou93q7u656cj3da::9y7eql5jiyjat2ld2ci8voerl"
time="2025-01-16T15:36:59Z" level=debug msg="Calling proc (1)"
time="2025-01-16T15:36:59Z" level=debug msg="Calling proc (2)"
time="2025-01-16T15:37:00Z" level=debug msg="creating new network namespace rrinwqxoq8xx5xyw2ehllg2dl" span="[2/4] RUN ipconfig /all" spanID=5b1a16f33e8cbd78 traceID=f9b970ce5f01f392bee8dca451ba00c4
time="2025-01-16T15:37:00Z" level=debug msg="hcn::HostComputeNamespace::Create id="
time="2025-01-16T15:37:00Z" level=debug msg="hcn::HostComputeNamespace::Create JSON: {\"Type\":\"Guest\",\"SchemaVersion\":{\"Major\":2,\"Minor\":0}}"

time="2025-01-16T15:37:00Z" level=debug msg="finished creating network namespace rrinwqxoq8xx5xyw2ehllg2dl" span="[2/4] RUN ipconfig /all" spanID=5b1a16f33e8cbd78 traceID=f9b970ce5f01f392bee8dca451ba00c4
time="2025-01-16T15:37:00Z" level=debug msg="finished setting up network namespace rrinwqxoq8xx5xyw2ehllg2dl" span="[2/4] RUN ipconfig /all" spanID=5b1a16f33e8cbd78 traceID=f9b970ce5f01f392bee8dca451ba00c4
time="2025-01-16T15:37:08Z" level=debug msg="Calling proc (1)"
time="2025-01-16T15:37:08Z" level=debug msg="Calling proc (2)"
time="2025-01-16T15:37:08Z" level=debug msg="returning network namespace rrinwqxoq8xx5xyw2ehllg2dl from pool" span="[3/4] RUN nslookup google.com" spanID=c7b93686715648d1 traceID=f9b970ce5f01f392bee8dca451ba00c4
time="2025-01-16T15:37:09Z" level=error msg="/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Unknown desc = process \"C:\\\\Windows\\\\System32\\\\WindowsPowerShell\\\\v1.0\\\\powershell -NonInteractive -NoProfile -Command $ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue'; nslookup google.com\" did not complete successfully: failed to create shim task: hcs::CreateComputeSystem sl4e3pv92h6my5lu0k08bdg34: The requested operation for attach namespace failed.: unknown" spanID=abb02b324b4c34d9 traceID=f9b970ce5f01f392bee8dca451ba00c4
process "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell -NonInteractive -NoProfile -Command $ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue'; nslookup google.com" did not complete successfully: failed to create shim task: hcs::CreateComputeSystem sl4e3pv92h6my5lu0k08bdg34: The requested operation for attach namespace failed.: unknown
4052 v0.18.2 C:\Program Files\buildkit\buildkitd.exe --run-service --service-name buildkitd --containerd-worker=true --containerd-cni-config-path=C:\Program Files\containerd\cni\conf\0-containerd-nat.conf --containerd-cni-binary-dir=C:\Program Files\containerd\cni\bin --debug --log-file=C:\Windows\Temp\buildkitd.log
main.unaryInterceptor
        /src/cmd/buildkitd/main.go:717
google.golang.org/grpc.NewServer.chainUnaryServerInterceptors.chainUnaryInterceptors.func1
        /src/vendor/google.golang.org/grpc/server.go:1202
github.com/moby/buildkit/api/services/control._Control_Solve_Handler
        /src/api/services/control/control_grpc.pb.go:289
google.golang.org/grpc.(*Server).processUnaryRPC
        /src/vendor/google.golang.org/grpc/server.go:1394
google.golang.org/grpc.(*Server).handleStream
        /src/vendor/google.golang.org/grpc/server.go:1805
google.golang.org/grpc.(*Server).serveStreams.func2.1
        /src/vendor/google.golang.org/grpc/server.go:1029
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1700

4052 v0.18.2 C:\Program Files\buildkit\buildkitd.exe --run-service --service-name buildkitd --containerd-worker=true --containerd-cni-config-path=C:\Program Files\containerd\cni\conf\0-containerd-nat.conf --containerd-cni-binary-dir=C:\Program Files\containerd\cni\bin --debug --log-file=C:\Windows\Temp\buildkitd.log
github.com/moby/buildkit/solver/llbsolver/ops.(*ExecOp).Exec
        /src/solver/llbsolver/ops/exec.go:493
github.com/moby/buildkit/solver.(*sharedOp).Exec.func2
        /src/solver/jobs.go:1100
github.com/moby/buildkit/util/flightcontrol.(*call[...]).run
        /src/util/flightcontrol/flightcontrol.go:122
sync.(*Once).doSlow
        /usr/local/go/src/sync/once.go:76
sync.(*Once).Do
        /usr/local/go/src/sync/once.go:67
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1700

4052 v0.18.2 C:\Program Files\buildkit\buildkitd.exe --run-service --service-name buildkitd --containerd-worker=true --containerd-cni-config-path=C:\Program Files\containerd\cni\conf\0-containerd-nat.conf --containerd-cni-binary-dir=C:\Program Files\containerd\cni\bin --debug --log-file=C:\Windows\Temp\buildkitd.log
github.com/moby/buildkit/solver.(*edge).execOp
        /src/solver/edge.go:966
github.com/moby/buildkit/solver/internal/pipe.NewWithFunction[...].func2
        /src/solver/internal/pipe/pipe.go:78
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1700

time="2025-01-16T15:37:09Z" level=debug msg="session finished: <nil>" spanID=64affebe8305aa53 traceID=f9b970ce5f01f392bee8dca451ba00c4

Windows Hyper-V-Compute event log:

[sl4e3pv92h6my5lu0k08bdg34] Create Container, type 'Silo Container', settings '{"Owner":"containerd-shim-runhcs-v1.exe","SchemaVersion":{"Major":2,"Minor":1},"Container":{"GuestOs":{"HostName":"buildkitsandbox"},"Storage":{"Layers":[{"Id":"1a3348c8-c43b-5c1a-83f4-4bbff0d32f39","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\98"},{"Id":"9caf22d5-d18a-59e5-afd8-a5e3870a1fee","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\90"},{"Id":"298e3ca2-da55-5141-8d6a-d1924c6353ed","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\89"},{"Id":"fb9d2f2a-b87c-5a9c-99c0-9e720042277a","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\88"},{"Id":"2137054d-9b8a-5e07-8472-e1f06abc13fa","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\87"},{"Id":"1d381a65-06c4-5a2a-8ba6-842762e847b5","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\86"}],"Path":"\\\\?\\Volume{ca559c84-13df-4d85-897c-57fec4a804e0}\\"},"MappedDirectories":[{"HostPath":"\\\\?\\Volume{12586166-25a9-4653-bc2f-258d609e6840}\\Program Files\\buildkit\\buildkitd.exe","ContainerPath":"C:\\Windows\\System32\\get-user-info.exe","ReadOnly":true}],"MappedPipes":[{"ContainerPipeName":"otel-grpc","HostPath":"\\\\.\\pipe\\buildkit-otel-grpc"}],"Processor":{},"Networking":{"Namespace":"8d8fd7ab-75aa-4945-8cf5-8dfb906a9cae"},"RegistryChanges":{"AddValues":[{"Key":{"Hive":"System","Name":"ControlSet001\\Control"},"Name":"WaitToKillServiceTimeout","Type":"String","StringValue":"2147483647"}]}},"ShouldTerminateOnLastHandleClosed":true}'

[sl4e3pv92h6my5lu0k08bdg34] Queue system notification: 2 / 0x803B002E

[sl4e3pv92h6my5lu0k08bdg34] Create compute system, result 0x803B002E

HCN_E_NAMESPACE_ATTACH_FAILED (0x803B002E): The requested operation for attach namespace failed
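
For reference, the result code can be split into its parts using the standard HRESULT bit layout (severity in bit 31, an 11-bit facility in bits 16 to 26, and a 16-bit code). This is only a sketch of that decoding; `decode_hresult` is a hypothetical helper name, and no facility name is asserted here.

```python
def decode_hresult(value):
    """Split a 32-bit HRESULT into severity, facility, and code fields."""
    return {
        "failure": bool((value >> 31) & 1),   # bit 31: 1 means failure
        "facility": (value >> 16) & 0x7FF,    # bits 16-26: facility number
        "code": value & 0xFFFF,               # bits 0-15: facility-specific code
    }

fields = decode_hresult(0x803B002E)
print(fields)
```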

HNS Output:

PS > Get-HnsEndpoint

ActivityId                : 2974E63E-C9DD-4A5B-A867-65069FBFA448
AdditionalParams          :
CreateProcessingStartTime : 133815154201296258
DNSServerList             : 168.63.129.16
DNSSuffix                 : yzspoegj5jtuzenk3d2ra04d2g.bx.internal.cloudapp.net
EnableLowInterfaceMetric  : True
EncapOverhead             : 0
GatewayAddress            : 172.24.96.1
Health                    : @{LastErrorCode=0; LastUpdateTime=133815154201266283}
ID                        : CC4C7F16-26BC-4CDF-B052-E840FE09DA93
IPAddress                 : 172.24.105.2
InterfaceConstraint       : @{InterfaceGuid=00000000-0000-0000-0000-000000000000}
MacAddress                : 00-15-5D-41-7B-2F
Name                      : rrinwqxoq8xx5xyw2ehllg2dl_nat
Namespace                 : @{ID=8D8FD7AB-75AA-4945-8CF5-8DFB906A9CAE; IsDefault=False}
Policies                  : {}
PrefixLength              : 20
RemoveProcessingStartTime : 133815154255318204
Resources                 : @{AdditionalParams=; AllocationOrder=2; Allocators=System.Object[]; Health=; ID=2974E63E-C9DD-4A5B-A867-65069FBFA448; PortOperationTime=0; State=1; SwitchOperationTime=0; VfpOperationTime=0; parentId=871E0337-F1D6-45D2-9BD3-71BEB6A40E21}
SharedContainers          : {}
StartTime                 : 133815154288301459
State                     : 4
Type                      : NAT
Version                   : 38654705669
VirtualNetwork            : 10de7571-39fa-4fa5-9c94-b06cee9dc9c1
VirtualNetworkName        : nat



PS > Get-HnsNamespace -Id 8D8FD7AB-75AA-4945-8CF5-8DFB906A9CAE

ActivityId       : 381977D2-D854-45C7-B386-961006DD4892
AdditionalParams :
CompartmentGuid  : 00000000-0000-0000-0000-000000000000
CompartmentId    : 2
Containers       : {}
Health           : @{LastErrorCode=0; LastUpdateTime=133815154200336902}
ID               : 8D8FD7AB-75AA-4945-8CF5-8DFB906A9CAE
IsDefault        : False
Policies         : {}
ResourceList     : {@{Data=; Type=Endpoint}}
Resources        : @{AdditionalParams=; AllocationOrder=0; Health=; ID=381977D2-D854-45C7-B386-961006DD4892; PortOperationTime=0; State=1; SwitchOperationTime=0; VfpOperationTime=0}
State            : 1
Type             : VM
Version          : 38654705669
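
The large integer timestamps in the HNS output (CreateProcessingStartTime, RemoveProcessingStartTime, StartTime) look like Windows FILETIME values, that is, 100-nanosecond ticks since 1601-01-01 UTC. That interpretation is an assumption, not something HNS states in this output. A short sketch to convert them so they can be compared with the buildkitd log times:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the HNS fields are FILETIME values (100 ns ticks since 1601-01-01 UTC).
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_utc(ticks):
    """Convert a FILETIME tick count to a timezone-aware UTC datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)

# Values copied from the Get-HnsEndpoint output above.
endpoint_times = {
    "CreateProcessingStartTime": 133815154201296258,
    "RemoveProcessingStartTime": 133815154255318204,
    "StartTime": 133815154288301459,
}

for name, ticks in endpoint_times.items():
    print(f"{name:26} {filetime_to_utc(ticks).isoformat()}")
```

Under that assumption the three values fall at 15:37:00, 15:37:05, and 15:37:08 UTC on 2025-01-16, which lines up with the creation at 15:37:00 and the reuse at 15:37:08 in the buildkitd log.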

If I remove the HnsEndpoint (Get-HnsEndpoint | Where-Object {$_.State -eq 4} | Remove-HnsEndpoint), the build runs one step and then fails on the next RUN when it tries to reuse the endpoint. The container also cannot reach the Internet: ping and DNS lookups fail.

#7 [4/4] RUN ping google.com
#7 4.874 Ping request could not find host google.com. Please check the name and try again.
#7 ERROR: process "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell -NonInteractive -NoProfile -Command $ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue'; ping google.com" did not complete successfully: exit code: 1
------
 > [4/4] RUN ping google.com:
4.874 Ping request could not find host google.com. Please check the name and try again.

Because the CNI pool size cannot be less than one (setting it to 0 still provisions a single namespace) and the cache time cannot be adjusted (it is a 5-minute constant), I can find no workaround to prevent namespace reuse on 2019. If the namespace were cleaned up after each step, this would work.

I'm unsure of the root cause, or whether this is a Server 2019 HCS issue.
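
To illustrate why consecutive RUN steps end up with the same namespace, here is a toy model of a pool with a fixed idle grace period. It is a hypothetical simplification based only on the behaviour described in this issue and on the "returning network namespace ... from pool" log message. `ToyNamespacePool` and its methods are invented names and do not reflect buildkit's actual implementation.

```python
import itertools

class ToyNamespacePool:
    """Hypothetical model of a namespace pool; not buildkit's real code."""

    GRACE_SECONDS = 5 * 60  # the issue describes a fixed 5-minute cache time

    def __init__(self):
        self._ids = itertools.count(1)
        self._idle = []  # (namespace id, time it was returned to the pool)

    def get(self, now):
        # Drop namespaces idle for longer than the grace period, then reuse one if any remain.
        self._idle = [(ns, t) for ns, t in self._idle
                      if now - t < self.GRACE_SECONDS]
        if self._idle:
            return self._idle.pop()[0]
        return f"ns-{next(self._ids)}"

    def put(self, ns, now):
        self._idle.append((ns, now))

pool = ToyNamespacePool()
first = pool.get(now=0)          # step [2/4] creates a namespace
pool.put(first, now=8)
second = pool.get(now=8)         # step [3/4] gets the same one back
pool.put(second, now=10)
third = pool.get(now=10 + 301)   # only after the grace period is a new one created
print(first, second, third)
```

In this model, back-to-back steps always reuse the pooled namespace, which is the case that fails on Server 2019.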
