Skip to content

Conversation

@hhk7734
Copy link

@hhk7734 hhk7734 commented Jul 7, 2025

What this PR does / why we need it:

This PR ensures that the SSH private key secret is mounted with the correct permissions (0600).
When using .spec.mlPolicy.mpi.sshAuthMountPath=/root/.ssh, the following error occurs if the key is mounted with default permissions:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0644 for '/root/.ssh/id_rsa' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
Unable to load host key "/root/.ssh/id_rsa": bad permissions
Unable to load host key: /root/.ssh/id_rsa
sshd: no hostkeys available -- exiting.

To prevent this issue, the secret is now mounted with 0600 permissions.

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #

Checklist:

  • Docs included if any changes are user facing

@astefanutti
Copy link
Contributor

/test

@google-oss-prow
Copy link

@astefanutti: No presubmit jobs available for kubeflow/trainer@master

In response to this:

/test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@astefanutti
Copy link
Contributor

/lgtm

@astefanutti
Copy link
Contributor

/ok-to-test

Copy link
Member

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/assign @tenzen-y

@coveralls
Copy link

coveralls commented Jul 7, 2025

Pull Request Test Coverage Report for Build 16119492403

Details

  • 19 of 19 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.1%) to 29.99%

Totals Coverage Status
Change from base Build 16085286158: 0.1%
Covered Lines: 936
Relevant Lines: 3121

💛 - Coveralls

@google-oss-prow
Copy link

New changes are detected. LGTM label has been removed.

@google-oss-prow google-oss-prow bot removed the lgtm label Jul 7, 2025
@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from tenzen-y. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot added size/M and removed size/S labels Jul 7, 2025
@hhk7734 hhk7734 force-pushed the fix-mpi-key-mode branch from c68147c to 8a349ff Compare July 7, 2025 14:16
@hhk7734
Copy link
Author

hhk7734 commented Jul 11, 2025

@astefanutti
I had only tested it as the root user, but I realized there was a permission issue for non-root users, so I fixed it.
I referred to https://github.com/kubeflow/mpi-operator/blob/82c9ad303d871849f1a0e093a3654cd4a6841d46/pkg/controller/mpi_job_controller.go#L1702-L1704

@hhk7734
Copy link
Author

hhk7734 commented Jul 21, 2025

ping @tenzen-y

@hhk7734 hhk7734 requested a review from andreyvelich July 21, 2025 02:55
WithPath(constants.MPISSHAuthorizedKeys),
)

if *info.RuntimePolicy.MLPolicySource.MPI.SSHAuthMountPath == "/root/.ssh" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might not be a typical use case, but would that'd be safer to check if the DefaultMode hasn't already been set, e.g. in the training runtime, and only set it if it's nil?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because apply.UpsertVolumes replaces the volume if the name matches, or adds it if it's not present, checking DefaultMode in the training runtime doesn't seem meaningful.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@andreyvelich
Copy link
Member

/remove-lifecycle stale
@Electronic-Waste @hhk7734 @tenzen-y @astefanutti Do we want to target it before v2.1 ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants