fix: Add securityContext to RHOAI runtimes for volume mount permissio…#24

Closed
abhijeet-dhumal wants to merge 1 commit into opendatahub-io:main from abhijeet-dhumal:add-security-context

@abhijeet-dhumal abhijeet-dhumal commented Nov 19, 2025

…ns compatibility

What this PR does / why we need it:
Set runAsUser=1000 and fsGroup=1000 in torch runtime manifests to fix volume mount permission errors when mounting volumes via podTemplateOverrides.
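
As a rough illustration of the change described above, the sketch below applies the two values from this PR (`runAsUser=1000`, `fsGroup=1000`) to a pod spec expressed as a plain dict. The helper name `with_security_context` and the sample image name are hypothetical and are not part of the runtime manifests or the SDK.

```python
import copy

# Values taken from the PR description; everything else here is illustrative.
SECURITY_CONTEXT = {"runAsUser": 1000, "fsGroup": 1000}


def with_security_context(pod_spec: dict, security_context: dict = SECURITY_CONTEXT) -> dict:
    """Return a copy of pod_spec with pod-level securityContext fields set.

    Existing securityContext keys are kept unless they are overridden.
    """
    patched = copy.deepcopy(pod_spec)
    merged = dict(patched.get("securityContext", {}))
    merged.update(security_context)
    patched["securityContext"] = merged
    return patched


# Hypothetical pod spec; the image name is a placeholder.
pod_spec = {"containers": [{"name": "node", "image": "example/image"}]}
patched = with_security_context(pod_spec)
print(patched["securityContext"])
```

The input dict is left untouched, so the same base spec can be reused with and without the security context when comparing behaviour.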

Related issues tracked upstream: kubeflow/issues/2992

Training example: https://github.com/abhijeet-dhumal/sdk/blob/add-progression-example/examples/transformers-text-classification.ipynb

job_name = trainer_client.train(
    # ... other arguments elided ...
    runtime=trainer_client.get_runtime("torch-cuda-251-with-pvc"),
    options=[
        PodTemplateOverrides(
            PodTemplateOverride(
                target_jobs=["node"],
                spec=PodSpecOverride(
                    # Attach an existing PVC as the "workspace" volume.
                    volumes=[
                        {"name": "workspace", "persistentVolumeClaim": {"claimName": "rwx-pvc-name"}}
                    ],
                    # Mount that volume at /workspace in the "node" container.
                    containers=[
                        ContainerOverride(
                            name="node",
                            volume_mounts=[{"name": "workspace", "mountPath": "/workspace"}],
                        )
                    ],
                ),
            )
        )
    ],
)

Traceback (most recent call last):
  File "/opt/app-root/src/574733104.py", line 66, in <module>
    train_sentiment_classifier()
  File "/opt/app-root/src/574733104.py", line 22, in train_sentiment_classifier
    model.save_pretrained(model_path)
  File "/opt/app-root/lib64/python3.11/site-packages/transformers/modeling_utils.py", line 3306, in save_pretrained
    os.makedirs(save_directory, exist_ok=True)
  File "<frozen os>", line 225, in makedirs
PermissionError: [Errno 13] Permission denied: '/workspace/model'
E1119 07:04:47.684000 1 torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 0 (pid: 12) of binary: /opt/app-root/bin/python3.11
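
To see why `os.makedirs` fails in a case like this, a small diagnostic can be run inside the container before saving. The sketch below is not part of the PR; `describe_write_access` is a hypothetical helper that only uses the Python standard library, and it assumes a Unix-like container.

```python
import os
import stat
import tempfile


def describe_write_access(path: str) -> dict:
    """Collect the facts that decide whether this process can write to path."""
    info = os.stat(path)
    return {
        "path": path,
        "process_uid": os.getuid(),
        "process_gids": sorted({os.getgid(), *os.getgroups()}),
        "owner_uid": info.st_uid,
        "owner_gid": info.st_gid,
        "mode": stat.filemode(info.st_mode),
        # Creating entries in a directory needs both write and search permission.
        "writable": os.access(path, os.W_OK | os.X_OK),
    }


# Demonstrated on a temporary directory; in the training job the argument
# would be the mount path, for example "/workspace".
report = describe_write_access(tempfile.mkdtemp())
for key, value in report.items():
    print(f"{key}: {value}")
```

Comparing `process_uid` and `process_gids` with `owner_uid`, `owner_gid`, and `mode` for the mounted path shows which permission class the process falls into.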

Checklist:

  • Docs included if any changes are user facing

…ns compatibility

Signed-off-by: abhijeet-dhumal <abhijeetdhumal652@gmail.com>

coderabbitai Bot commented Nov 19, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

spec:
  template:
    spec:
      securityContext:

We should try to avoid changing the security context.

My understanding is that this can be avoided by mounting an empty or tmp directory as the workspace, or by changing the workspace directory.
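
A minimal sketch of the two alternatives mentioned in this comment, under stated assumptions: the volume entries are plain dicts in the same shape as the PR's example, with `emptyDir` swapped in for the PVC, and `pick_writable_dir` is a hypothetical helper for the "change the workspace directory" option. Neither is taken from the runtime manifests.

```python
import os
import tempfile

# Same dict shape as the PR's example, but backed by an emptyDir volume
# instead of a PersistentVolumeClaim (assumption: names reused for illustration).
EMPTY_DIR_VOLUME = {"name": "workspace", "emptyDir": {}}
WORKSPACE_MOUNT = {"name": "workspace", "mountPath": "/workspace"}


def pick_writable_dir(preferred: str) -> str:
    """Return preferred if this process can write to it, else a fresh temp dir."""
    if os.path.isdir(preferred) and os.access(preferred, os.W_OK | os.X_OK):
        return preferred
    return tempfile.mkdtemp(prefix="workspace-")


# Falls back to a temporary directory when /workspace is missing or read-only.
model_dir = pick_writable_dir("/workspace")
print(model_dir)
```

The trade-off is persistence: an `emptyDir` volume and a temporary directory both disappear with the pod, so neither replaces a PVC when the saved model has to outlive the job.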

@abhijeet-dhumal abhijeet-dhumal Nov 19, 2025

Ah, I see. So users should always opt for directories like /tmp that are writable by any user?

Please correct me if I'm wrong, but the purpose of podTemplateOverrides with volume mounts was to allow users to mount PVCs at any path they choose (such as /workspace, /data, or /models).
But without a securityContext specified, the user will always get a permission denied error because:

  • when a user mounts a PVC at an arbitrary path, the container runs as user 1001 (or whatever the image defaults to)
  • the PVC is usually owned by root (UID 0) or some other UID
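
The UID mismatch described in the two points above can be modelled with a simplified version of the POSIX mode-bit check. This is a sketch only: `can_write` is a hypothetical function, the UIDs, GIDs, and modes are example values, and ACLs, Linux capabilities, and SELinux are ignored. Whether Kubernetes actually changes volume ownership for `fsGroup` depends on the volume type, so the last case is an assumption about a volume that supports it.

```python
import stat


def can_write(proc_uid: int, proc_gids: set[int], owner_uid: int, owner_gid: int, mode: int) -> bool:
    """Simplified POSIX check: may the process create entries in the directory?"""
    if proc_uid == 0:
        return True  # root bypasses the mode bits in this simplified model
    if proc_uid == owner_uid:
        bits = stat.S_IWUSR | stat.S_IXUSR
    elif owner_gid in proc_gids:
        bits = stat.S_IWGRP | stat.S_IXGRP
    else:
        bits = stat.S_IWOTH | stat.S_IXOTH
    return (mode & bits) == bits


# Container user 1001 against a root-owned volume with mode 0755: denied.
print(can_write(1001, {1001}, owner_uid=0, owner_gid=0, mode=0o755))

# With fsGroup=1000 the process gains GID 1000; if the volume's group is also
# set to 1000 with group write permission (example mode 2775), writes succeed.
print(can_write(1000, {1000}, owner_uid=0, owner_gid=1000, mode=0o2775))
```

The first call corresponds to the failure in the traceback, and the second to the situation the proposed `fsGroup` setting is meant to create.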

2 participants