ns: refresh process namespace on stale nsenter (e.g. iscsid PID change)#176

Open
ksoviero wants to merge 1 commit into longhorn:main from ksoviero:main
@ksoviero

Made-with: Cursor

Which issue(s) this PR fixes:

Issue longhorn/longhorn#10544

What this PR does / why we need it:

The namespace executor configures nsenter with the mount/network namespace paths under the host’s /proc, which are derived from a single chosen PID (e.g. iscsid for iSCSI operations). That path is resolved once, when the executor is created, and then reused for every command. If the daemon restarts, its PID (and therefore /host/proc//ns/...) changes or disappears. The next nsenter then fails with errors like cannot open .../ns/mnt: No such file or directory, which shows up in Longhorn as iSCSI initiator refresh / frontend expand failures after iscsi(d) or similar restarts.

This PR keeps the same behavior for successful runs, but on a narrow, recognizable failure from nsenter (stale namespace path under /ns/…), it re-resolves the process namespace directory the same way as at creation, updates the cached nsDirectory, and retries the command once. If re-resolution fails, the original nsenter error is returned so operators still see the primary failure mode.

Why it’s needed

It removes the need to restart the instance manager (or other workarounds) when the only problem is an outdated cached PID for a long-lived Executor, and makes volume operations that depend on iscsiadm in the daemon’s namespaces more robust across iscsi daemon restarts on the node.

Special notes for your reviewer:

I tested this in my homelab, and it resolved the issue where expanding volumes failed after the iSCSI daemon had been restarted.

@ksoviero

Author

@derekbit @c3y1huang is there anything I can do to help get this across the finish line?

@tvanderka

tvanderka commented May 12, 2026

Would still fail if the old pid got reused by a new process. Rare corner case, but it would likely run iscsiadm from that new pid's namespace as a privileged process.
edit: or just run a random container with process named "iscsid" to hijack the executor?

Signed-off-by: Kevin Soviero <ksoviero@gmail.com>
@ksoviero

Author

Would still fail if the old pid got reused by a new process

That's a problem as is, so this PR doesn't introduce any regressions in that regard.

Would still fail if the old pid got reused by a new process. Rare corner case, but it would likely run iscsiadm from that new pid's namespace as a privileged process.
edit: or just run a random container with process named "iscsid" to hijack the executor?

I don't see a good workaround without completely re-architecting how iscsid is implemented in Longhorn. For now, this is the least bad option that at least solves the current problem where applying security updates breaks your ability to scale volumes in Longhorn.

2 participants