hostagent: stop destroying ga.sock on guest-agent reconnect #4911
Open
mn-ram wants to merge 1 commit into lima-vm:master
watchGuestAgentEvents and the inotify-startup goroutine both call
forwardSSH(verbForward) for the guest-agent unix socket on every
reconnect tick, with no synchronization between them and no -O cancel
of the prior forward. forwardSSH unlinks the local socket file before
asking the SSH ControlMaster to bind a new listener, so:
* The two goroutines race on os.RemoveAll/bind of the same path,
and consumers dialing during the window observe ENOENT.
* The ControlMaster still has the previous forward registered, so
`ssh -O forward -L localUnix:remoteUnix` exits non-zero with
"forwarding for listen path X already exists". forwardSSH's
failure branch then unlinks the socket a second time, leaving
ga.sock permanently missing on disk while the mux still believes
a forward is alive. getOrCreateClient cannot reconnect, dynamic
port forwarding stops being announced, and inotify mount
invalidation goes silent until the user runs `limactl stop`
followed by `limactl start`.
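The `ENOENT` window in the first bullet can be observed without Lima at all. The sketch below is an illustration, not code from this PR: it binds a Unix listener at a temporary path (a hypothetical stand-in for `ga.sock`), unlinks the path the way `forwardSSH` does before asking for a new bind, and shows that a client dialing in that gap fails with `ENOENT` even though the listener is still open.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"syscall"
)

// dialAfterUnlink binds a Unix listener, unlinks its path while the listener
// is still open (the same order as "RemoveAll, then request a new bind"), and
// reports whether a client dialing in that gap fails with ENOENT.
func dialAfterUnlink() (sawENOENT bool, err error) {
	dir, err := os.MkdirTemp("", "gasock")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)
	// Hypothetical stand-in for the instance's ga.sock.
	path := filepath.Join(dir, "ga.sock")

	ln, err := net.Listen("unix", path)
	if err != nil {
		return false, err
	}
	defer ln.Close()

	// Unlink the socket file. The listener's fd stays open, but nothing can
	// reach it by name until something binds the path again.
	if err := os.RemoveAll(path); err != nil {
		return false, err
	}

	conn, dialErr := net.Dial("unix", path)
	if dialErr == nil {
		conn.Close()
		return false, nil
	}
	return errors.Is(dialErr, syscall.ENOENT), nil
}

func main() {
	saw, err := dialAfterUnlink()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("dial during the unlink window got ENOENT:", saw)
}
```

The same gap becomes permanent in the second bullet: once the failure branch unlinks the path again and nothing re-binds it, every later dial lands in this state.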
Fix: introduce reForwardGuestAgentSock, which serializes via a new
gaSockForwardMu and issues a best-effort -O cancel before -O forward.
Both call sites in watchGuestAgentEvents now go through the helper.
Reproduces deterministically on master with:
limactl start default
limactl shell default -- sudo systemctl restart lima-guestagent.service
sleep 12
ls ~/.lima/default/ga.sock # ENOENT before the fix; present after.
Closes: lima-vm#2227
Signed-off-by: mn-ram <235066282+mn-ram@users.noreply.github.com>
Summary
Fixes a long-standing bug where Lima's host↔guest gRPC connection silently and permanently breaks after any guest-agent restart or in-VM reboot, requiring `limactl stop && limactl start` to recover.
`watchGuestAgentEvents` and the inotify-startup goroutine in `pkg/hostagent/hostagent.go` both call `forwardSSH(verbForward, …)` for the guest-agent unix socket on every reconnect tick, with no synchronization between them and no `-O cancel` of the prior forward. `forwardSSH` calls `os.RemoveAll` on the local socket file as its first step and only then asks the SSH `ControlMaster` to bind a new listener, so two things go wrong:
* The two goroutines race on `os.RemoveAll`/bind of the same path; any consumer dialing during the window observes `ENOENT`.
* The `ControlMaster` still has the previous forward registered, so `ssh -O forward -L ga.sock:/run/lima-guestagent.sock` exits non-zero with "forwarding for listen path X already exists". `forwardSSH`'s failure branch then unlinks the socket a second time, leaving `ga.sock` permanently missing on disk while the mux still believes a forward is alive. `getOrCreateClient` cannot reconnect, dynamic port-forward announcement stops, and inotify mount invalidation goes silent.
The fix introduces a small helper, `reForwardGuestAgentSock`, that:
* serializes on a new `gaSockForwardMu`, so the reconnect loop and the inotify goroutine cannot race;
* issues a best-effort `verbCancel` before `verbForward`, so the `ControlMaster` releases the prior registration and the new bind succeeds cleanly.
Both existing call sites are switched to the helper. No API changes, no architecture changes, ~40 lines.
This sits naturally next to the recent fixes in this area (#4889, #4895) and closes the oldest open user report of the symptom.
Closes #2227
Test plan
* `go vet ./pkg/hostagent/... ./cmd/...`: clean
* `go test ./pkg/hostagent/...`: pass
* `ga.sock` survives every iteration.
* `mountInotify: true`: edit a file on the host after the guest-agent restart and confirm the guest sees the change.