oss(credentials): support STS.Token in nodePublishSecretRef and rotation by republish #1591
Force-pushed 77d2b5d to da50d2e.
Force-pushed da50d2e to 5817fea.
pkg/mounter/utils/filelock.go (outdated)
    }

    // WriteFileWithLock safely writes data to file with locking
    func WriteFileWithLock(path string, data []byte, perm os.FileMode) (done bool, err error) {
Does this lock actually protect the secret file update? If we write the new token to a separate file and then rename it into place, we wouldn't need locks at all, would we?
PTAL again.
Force-pushed 9465193 to c3b1c21.
Force-pushed c3b1c21 to 4f91d33.
Force-pushed 6021061 to 51b3bcf.
    dir := filepath.Dir(path)
    tmpFile := filepath.Join(dir, fmt.Sprintf(".%s.tmp.%d", filepath.Base(path), time.Now().UnixNano()))
Suggested change:

    - dir := filepath.Dir(path)
    - tmpFile := filepath.Join(dir, fmt.Sprintf(".%s.tmp.%d", filepath.Base(path), time.Now().UnixNano()))
    + tmpFile := path + ".next"
I think we can just use a fixed name, so we don't have to worry about leaking temp files, and there's no need to clean up when the rename fails.
I'd prefer a deferred cleanup function to handle this rather than a fixed name; a fixed name introduces many overwrite corner cases.
    // ossfs2
    // For ossfs2, file-path is a common option configuration after -o, so append to op.Options
    op.Options = append(op.Options,
        fmt.Sprintf("oss_sts_multi_conf_ak_file=%s", filepath.Join(tokenDir, mounterutils.KeyAccessKeyId)),
        fmt.Sprintf("oss_sts_multi_conf_sk_file=%s", filepath.Join(tokenDir, mounterutils.KeyAccessKeySecret)),
        fmt.Sprintf("oss_sts_multi_conf_token_file=%s", filepath.Join(tokenDir, mounterutils.KeySecurityToken)),
If ossfs2 reads these 3 paths separately, it is bound to run into a race condition: ossfs opens one file, CSI replaces them, and ossfs then opens the other files, so it ends up with an inconsistent credential. Hopefully ossfs2 retries immediately in this case and does not surface an IOError to the user.
For ossfs1 to work correctly, it should first open an fd to the dir, then use the openat API to open the individual files relative to that fd. Can you check how ossfs1 implements this?
BTW, what's the point of this comment?

    // For ossfs2, file-path is a common option configuration after -o, so append to op.Options

It is also an option after -o for ossfs1, right?
ossfs1 uses openat.
I've also reworded the comment you mentioned; it was meant to contrast the different credential methods supported by ossfs2.
    // This function uses a symlink-based approach similar to Kubernetes configmap volume plugin:
    // 1. Create a temporary data directory (e.g., ..data_tmp_<timestamp>)
    // 2. Write all token files to the temporary directory
    // 3. Atomically switch the ..data symlink to point to the new directory
    // 4. Create symlinks for each token file pointing to ..data/<filename>
I think this can be simplified. If the path is /run/ossfs/volumeID/sts, we can lay it out as:
/run/ossfs/
└── volumeID
├── sts -> sts.20260101000000
└── sts.20260101000000
├── ak
├── expiration
├── sk
└── token
i.e., we don't need a symlink for every file. kubelet does that because workloads expect the files to exist at the root of the volume, whereas we are free to add an extra directory level inside the volume.
Done; clients are now passed the subdirectory path instead.
    // Clean up old data directory if it exists
    // Only remove the token files we know about, not the entire directory
    if currentDataDir != "" && currentDataDir != tmpDataDir {
        // Remove old data directory asynchronously to avoid blocking
Not necessary IMO. Cleanup should be blocking, as we don't want to leak credentials on disk.
Also, shouldn't we keep currentDataDir around for a while? ossfs may still be holding an fd to the old dir and trying to read from it. Maybe keep the two most recent dirs, and remove the oldest one before creating the third?
Addressed by updating AccessKeyId last, instead of keeping the former dir around.
Simplify rotateTokenFiles to use single directory-level symlink instead of two-layer approach. Use tokenDir/sts path to avoid first-call issues. Add cleanup helpers and defer guards for resource cleanup.
Reorder token keys so AccessKeyId is rotated last, ensuring ossfs/ossfs2 clients see either all old or all new files, never a mixed state.
    currentDataDir := ""
    if linkTarget, readErr := os.Readlink(dataLinkPath); readErr == nil {
        currentDataDir = filepath.Join(dir, linkTarget)
    if linkTarget, readErr := os.Readlink(dir); readErr == nil {
If there is a readErr, we need to know what it is, even if it doesn't affect the process. Please add a log for the record.
This whole function assumes the dir either does not exist (the first call, at mount time) or is a symlink (subsequent calls, during rotation). So I muted the error logs here, and didn't handle the regular-directory case you raised in another comment below.
In the latest commit I return an error for these corner cases, and also fix the rename issue (though I don't expect these cases to happen in practice).
Force-pushed 937e610 to 12e150e.
Return error when rotateTokenFiles encounters a regular directory instead of symlink, as rename would break open file handles. Also reject rotation on Readlink failures (e.g., permission denied).
Force-pushed 12e150e to 6049442.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: AlbeeSo, mowangdk.
What type of PR is this?
/kind feature
What this PR does / why we need it:
…requiresRepublish to true.

    kubectl create -n default secret generic oss-secret \
      --from-literal='accessKeyId=<yourAccessKeyFromAssumeRole>' \
      --from-literal='accessKeySecret=<yourAccessKeySecretFromAssumeRole>' \
      --from-literal='securityToken=<yourSecurityTokenFromAssumeRole>' \
      --from-literal='Expiration=2025-12-22T04:11:50Z'

Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: