@drchristensen

Checklist

Ensure you have completed the following checklist for your pull request to be reviewed:

  • Certify you wrote the patch or otherwise have the right to pass it on as an open-source patch by signing all
    commits. (git commit -s). (If needed, use git commit -s --amend). The author email must match
    the sign-off email address. See CONTRIBUTING.md
    for more information.
  • Referenced issues using Fixes: #00000 in commit message (if applicable)
  • Tests have been added/updated (or no tests are needed)
  • Documentation has been updated (or no documentation changes are needed)
  • All commits pass make validatepr (format/lint checks)
  • Release note entered in the section below (or None if no user-facing changes)

Does this PR introduce a user-facing change?

None

When a container using pasta networking is restarted, the pasta process may not exit immediately when its network namespace is deleted. This causes the new pasta instance to fail binding to the same ports with "address already in use" errors.

This PR adds explicit termination of the pasta process during network teardown:

  1. Search /proc for pasta processes matching the container's netns path
  2. Send SIGTERM and wait up to 1 second for graceful exit
  3. Fall back to SIGKILL if the process doesn't respond

The implementation includes safeguards against false positives (only matches executables named exactly pasta or paths ending in /pasta) and handles race conditions gracefully.

Fixes: #23737
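
The /proc search and the false-positive safeguard described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the names splitCmdline, matchesPasta and findPastaPIDs are hypothetical, the example netns path is made up, and accepting the --netns=PATH form alongside the two-argument form is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// splitCmdline splits the raw contents of /proc/<pid>/cmdline, where
// arguments are separated (and usually terminated) by NUL bytes.
func splitCmdline(raw []byte) []string {
	raw = bytes.TrimRight(raw, "\x00")
	if len(raw) == 0 {
		return nil
	}
	parts := bytes.Split(raw, []byte{0})
	args := make([]string, len(parts))
	for i, p := range parts {
		args[i] = string(p)
	}
	return args
}

// matchesPasta (hypothetical name) reports whether args look like a pasta
// process attached to netnsPath. The executable must be exactly "pasta" or
// a path ending in "/pasta", so "/usr/bin/notpasta" is rejected.
func matchesPasta(args []string, netnsPath string) bool {
	if len(args) == 0 {
		return false
	}
	if args[0] != "pasta" && !strings.HasSuffix(args[0], "/pasta") {
		return false
	}
	for i := 1; i < len(args); i++ {
		// Two-argument form: --netns PATH
		if args[i] == "--netns" && i+1 < len(args) && args[i+1] == netnsPath {
			return true
		}
		// Single-argument form (assumed to be acceptable): --netns=PATH
		if args[i] == "--netns="+netnsPath {
			return true
		}
	}
	return false
}

// findPastaPIDs (hypothetical name) scans /proc for matching processes.
// Read errors are skipped: a process may exit between the directory
// listing and the read, which is an expected race rather than a failure.
func findPastaPIDs(netnsPath string) []int {
	var pids []int
	paths, _ := filepath.Glob("/proc/[0-9]*/cmdline")
	for _, p := range paths {
		raw, err := os.ReadFile(p)
		if err != nil {
			continue
		}
		if !matchesPasta(splitCmdline(raw), netnsPath) {
			continue
		}
		if pid, err := strconv.Atoi(filepath.Base(filepath.Dir(p))); err == nil {
			pids = append(pids, pid)
		}
	}
	return pids
}

func main() {
	raw := []byte("/usr/bin/pasta\x00--config-net\x00--netns\x00/run/netns/demo\x00")
	fmt.Println(matchesPasta(splitCmdline(raw), "/run/netns/demo"))
	fmt.Println(findPastaPIDs("/run/netns/demo"))
}
```

Comparing the executable name with an exact match or a "/pasta" suffix, rather than a substring test, is what keeps unrelated binaries whose names merely contain "pasta" from being signalled.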

David Christensen added 3 commits January 13, 2026 19:49
- matchPastaCmdline(): Check if cmdline args match a pasta process with a
  specific --netns argument
- findPastaProcess(): Scan /proc to find pasta process by netns path

These helpers will be used by the pasta teardown logic to find processes
that need to be terminated during network cleanup.

Relates-to: containers#23737

Signed-off-by: David Christensen <[email protected]>
Assistance provided by AI
- Send SIGTERM first with a 1-second timeout
- Fall back to SIGKILL if process doesn't exit gracefully
- Wait for SIGKILL to take effect before returning
- Add comprehensive test coverage for teardown logic and cmdline matching

This ensures the network port is freed before container restart proceeds.

Relates-to: containers#23737

Signed-off-by: David Christensen <[email protected]>
Assistance provided by AI
- Call teardownPasta() from teardownNetwork() for pasta-mode containers
- Clear pastaResult in cleanupNetwork() to ensure clean state for restart
- Add system test for container restart with pasta and published ports

Fixes: containers#23737

Signed-off-by: David Christensen <[email protected]>
Assistance provided by AI
@drchristensen
Author

The failing workflows appear unrelated to the networking code and tests added in these commits:

int podman fedora-43 root host

Failing in shared layer testing:

[+0829s]   /var/tmp/go/src/github.com/containers/podman/test/e2e/common_test.go:1528

Test comments suggest the test is not reliable:

        It("podman image rm - concurrent with shared layers", func() {
                // #6510 has shown a fairly simple reproducer to force storage
                // errors during parallel image removal.  Since it's subject to
                // a race, we may not hit the condition a 100 percent of times
                // but local reproducers hit it all the time.

int podman rawhide root host

Failing in log coloring:

[+0883s] Summarizing 1 Failure:
[+0883s]   [FAIL] Podman logs [It] podman pod logs with different colors
[+0883s]   /var/tmp/go/src/github.com/containers/podman/test/e2e/logs_test.go:693

testing-farm:fedora-rawhide-x86_64:podman-fedora

Test duration is capped at 30m, and the test failed at the 30m mark:

ok 780 podman --ssh test # skip only applicable on podman-remote
ok 781 [950] podman preexec hook

Maximum test time '30m' exceeded.
Adjust the test 'duration' attribute if necessary.
https://tmt.readthedocs.io/en/stable/spec/tests.html#duration

@baude
Member

baude commented Jan 14, 2026

Thank you for your pull request submission. I have restarted the failed tests. Given the topic of this PR, I really want @mheon and @Luap99 to review it prior to merging, and they both happen to be on PTO this week. I'll walk through it as well but won't do a very deep dive.

@baude
Member

baude commented Jan 14, 2026

@sbrivio-rh mind also reviewing?

@sbrivio-rh added the labels "pasta" (pasta(1) bugs or features) and "network" (Networking related issue or feature) on Jan 14, 2026
@sbrivio-rh
Collaborator

Sorry, I wasn't aware of the issue at all, I would have looked into it earlier (I'm the original author of Podman's pasta integration).

> When a container using pasta networking is restarted, the pasta process may not exit immediately when its network namespace is deleted.

@drchristensen do you happen to know why? Isn't that something we should fix instead?

In general, pasta's interfaces were implemented with typical integrations (Podman, moby/rootlesskit) in mind, trying to keep them as simple as possible. This adds substantial complexity for a behaviour that, I guess, wasn't intended.

Another remark from a first quick read:

> Search /proc for pasta processes matching the container's netns path

You don't really need to, pasta can save its PID file somewhere if you want. But again, Podman's integration doesn't ask for that because it wasn't needed. If it became needed all of a sudden, it can be added. The option is --pid / -P (I still hope we don't actually need it).

@sbrivio-rh self-requested a review on January 14, 2026 14:56
@drchristensen
Author

> Sorry, I wasn't aware of the issue at all, I would have looked into it earlier (I'm the original author of Podman's pasta integration).
>
> > When a container using pasta networking is restarted, the pasta process may not exit immediately when its network namespace is deleted.
>
> @drchristensen do you happen to know why? Isn't that something we should fix instead?

The pasta implementation in vendor/go.podman.io/common/libnetwork/pasta/pasta_linux.go:55-56 states:
"Note that there is no need for any special cleanup logic, the pasta process will automatically exit when the netns path is deleted."

But pasta doesn't exit synchronously when the network namespace is deleted. Instead, it detects netns deletion using:

  1. inotify — watching for the netns file to be unmounted
  2. Polling fallback — checking every 1 second if the netns still exists

This may lead to a race condition when Podman restarts a container. The restart performs these steps:

  1. Delete the old netns
  2. Create a new netns
  3. Start a new pasta process

The old pasta process may still be running during step 3 because it hasn't yet detected the netns deletion. Since it's still holding the port binding, the new pasta fails with "address already in use."

The Podman change is a reasonable defensive measure. Even if pasta improves, having explicit cleanup as a fallback is sensible.

Labels

network Networking related issue or feature pasta pasta(1) bugs or features

Development

Successfully merging this pull request may close these issues.

container restart fails when using pasta + port publish

3 participants