Conversation

empovit (Contributor) commented Sep 15, 2025

Red Hat OpenShift blocks writing into `/etc`, causing the following error in compute-domain-daemon pods:

```
IMEXDaemonUpdateLoop failed, initiate shutdown:
writeNodesConfig failed: failed to create nodes config file:
open /etc/nvidia-imex/nodes_config.cfg: permission denied
```

Binding the `anyuid` SCC (SecurityContextConstraints) to the service account when running on OpenShift solves this problem.
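As an illustration of what such a binding can look like, here is a minimal Python sketch that builds a RoleBinding granting a service account use of the `anyuid` SCC. It assumes OpenShift 4.x, which provides a ClusterRole named `system:openshift:scc:anyuid` for that SCC. The service account and namespace names are placeholders, and this is not necessarily how this PR implements the binding.

```python
import json


def anyuid_rolebinding(service_account: str, namespace: str) -> dict:
    """Build a RoleBinding that lets `service_account` use the anyuid SCC.

    Assumes OpenShift 4.x, where the ClusterRole
    system:openshift:scc:anyuid grants `use` on the anyuid SCC.
    """
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {
            # Hypothetical naming scheme for the binding object.
            "name": f"{service_account}-anyuid-scc",
            "namespace": namespace,
        },
        # A namespaced RoleBinding may reference a ClusterRole.
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": "system:openshift:scc:anyuid",
        },
        "subjects": [
            {
                "kind": "ServiceAccount",
                "name": service_account,
                "namespace": namespace,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder service account and namespace names.
    manifest = anyuid_rolebinding("compute-domain-daemon", "example-namespace")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to `oc apply -f -`. An imperative alternative is `oc adm policy add-scc-to-user anyuid -z <service-account> -n <namespace>`.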

copy-pr-bot (bot) commented Sep 15, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

empovit force-pushed the add-anyuid-scc-openshift branch from d02ac26 to 23284f4 on September 15, 2025 09:21

jgehrcke (Collaborator) commented Sep 15, 2025

> Red Hat OpenShift blocks writing into /etc

Great input! Thanks.

To address this, we may decide to write the IMEX daemon config file to a path outside /etc/: we are in full control of that file, so we don't need to place it under /etc. I'll consult with @klueska. Tracking the idea here: #571
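A minimal Python sketch of that idea, assuming the daemon may pick any directory it owns. The directory comes from a hypothetical `IMEX_CONFIG_DIR` environment variable (defaulting to a hypothetical path under the system temp dir), and the one-IP-per-line file format is an assumption. Pointing the IMEX daemon itself at the new location is not shown.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical default: a directory the daemon owns, outside /etc.
DEFAULT_CONFIG_DIR = Path(tempfile.gettempdir()) / "nvidia-imex"


def write_nodes_config(node_ips, config_dir=None) -> Path:
    """Write one node IP per line to nodes_config.cfg under config_dir.

    Falls back to the hypothetical IMEX_CONFIG_DIR env var, then to
    DEFAULT_CONFIG_DIR, when config_dir is not given.
    """
    directory = Path(config_dir or os.environ.get("IMEX_CONFIG_DIR", DEFAULT_CONFIG_DIR))
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "nodes_config.cfg"
    # Write to a sibling temp file, then rename it over the target.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text("".join(f"{ip}\n" for ip in node_ips))
    os.replace(tmp, path)
    return path
```

Writing to a temporary file and renaming it with `os.replace` means a reader never sees a half-written config; on POSIX the rename is atomic as long as both paths are on the same filesystem.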

IIUC, the problem you report will be even more relevant for our attempts to take control of /etc/hosts (see for example #537 for context). We are learning more and more about how the environment claims control over the contents of /etc, and we are not as free as one might naively assume.

empovit marked this pull request as draft September 15, 2025 16:42
empovit force-pushed the add-anyuid-scc-openshift branch from 23284f4 to 4fe1628 on September 15, 2025 19:20
empovit marked this pull request as ready for review September 15, 2025 19:34
Signed-off-by: Vitaliy Emporopulo <[email protected]>
empovit force-pushed the add-anyuid-scc-openshift branch from 4fe1628 to 2542d7b on September 15, 2025 20:33
empovit marked this pull request as draft September 15, 2025 20:44
empovit marked this pull request as ready for review September 15, 2025 20:51
klueska added this to the v25.8.0 milestone Sep 16, 2025
klueska added the security, backport-25.3, and robustness (issue/pr: edge cases & fault tolerance) labels Sep 18, 2025
klueska modified the milestones: v25.8.0, v25.12.0, v25.8.1 Sep 18, 2025

Labels

backport-25.3, robustness (issue/pr: edge cases & fault tolerance), security

Projects

Status: Backlog

3 participants