-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Fix Restart Issue with Pasta Networking Driver #27901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
8ab846a to
688ac4d
Compare
- matchPastaCmdline(): Check if cmdline args match a pasta process with a specific --netns argument - findPastaProcess(): Scan /proc to find pasta process by netns path These helpers will be used by the pasta teardown logic to find processes that need to be terminated during network cleanup. Relates-to: containers#23737 Signed-off-by: David Christensen <[email protected]> Assistance provided by AI
- Send SIGTERM first with a 1-second timeout - Fall back to SIGKILL if process doesn't exit gracefully - Wait for SIGKILL to take effect before returning - Add comprehensive test coverage for teardown logic and cmdline matching This ensures the network port is freed before container restart proceeds. Relates-to: containers#23737 Signed-off-by: David Christensen <[email protected]> Assistance provided by AI
- Call teardownPasta() from teardownNetwork() for pasta-mode containers - Clear pastaResult in cleanupNetwork() to ensure clean state for restart - Add system test for container restart with pasta and published ports Fixes: containers#23737 Signed-off-by: David Christensen <[email protected]> Assistance provided by AI
688ac4d to
faa5ba9
Compare
|
Failing workflows appear unrelated to networking code/tests implemented in the commits: Int podman fedora-43 root hostFailing in shared layer testing: Test comments suggest the test is not reliable: int podman rawhide root hostFailing in log coloring: testing-farm:fedora-rawhide-x86_64:podman-fedoraTest duration is capped at 30m, test failed at 30m mark: |
|
@sbrivio-rh mind also reviewing? |
|
Sorry, I wasn't aware of the issue at all, I would have looked into it earlier (I'm the original author of Podman's pasta integration).
@drchristensen do you happen to know why? Isn't that something we should fix instead? In general, pasta's interfaces were implemented with typical integrations (Podman, moby/rootlesskit) in mind, trying to keep them as simple as possible. This adds substantial complexity for a behaviour that, I guess, wasn't intended. Another remark from a first quick read:
You don't really need to, pasta can save its PID file somewhere if you want. But again, Podman's integration doesn't ask for that because it wasn't needed. If it became needed all of a sudden, it can be added. The option is |
Pasta implementation in vendor/go.podman.io/common/libnetwork/pasta/pasta_linux.go:55-56 states: But pasta doesn't exit synchronously when the network namespace is deleted. Instead, it detects netns deletion using:
This may lead to a race condition when podman restarts a container:
The old pasta process may still be running during step 3 because it hasn't yet detected the netns deletion. Since it's still holding the port binding, the new pasta fails with "address already in use." The podman change is a reasonable defensive code implementation. Even if pasta improves, having explicit cleanup as a fallback is sensible. |
Checklist
Ensure you have completed the following checklist for your pull request to be reviewed:
commits. (
git commit -s). (If needed, usegit commit -s --amend). The author email must matchthe sign-off email address. See CONTRIBUTING.md
for more information.
Fixes: #00000in commit message (if applicable)make validatepr(format/lint checks)Noneif no user-facing changes)Does this PR introduce a user-facing change?
When a container using pasta networking is restarted, the pasta process may not exit immediately when its network namespace is deleted. This causes the new pasta instance to fail binding to the same ports with "address already in use" errors.
This PR adds explicit termination of the pasta process during network teardown:
The implementation includes safeguards against false positives (only matches executables named exactly pasta or paths ending in /pasta) and handles race conditions gracefully.
Fixes: #23737