
Conversation

@AlbeeSo
Member

@AlbeeSo AlbeeSo commented Dec 24, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

  1. Recreate the OSS CSIDriver and set requiresRepublish to true.
  2. Create a secret, for example:
    kubectl create -n default secret generic oss-secret --from-literal='accessKeyId=<yourAccessKeyFromAssumeRole>' --from-literal='accessKeySecret=<yourAccessKeySecretFromAssumeRole>' --from-literal='securityToken=<yourSecurityTokenFromAssumeRole>' --from-literal='Expiration=2025-12-22T04:11:50Z'
  3. Create an OSS PV whose nodePublishSecretRef points to the secret created above.
  4. Rotate the credentials in the secret (with more than 20 minutes remaining before expiration).
  5. Check that the OSS PV can still access the server, and that the OSS audit logs show the new credentials being used.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?


Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 24, 2025
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 24, 2025
@AlbeeSo AlbeeSo force-pushed the oss/token-rotation-rewrite branch 2 times, most recently from 77d2b5d to da50d2e Compare December 24, 2025 06:07
@AlbeeSo AlbeeSo force-pushed the oss/token-rotation-rewrite branch from da50d2e to 5817fea Compare December 24, 2025 07:05
}

// WriteFileWithLock safely writes data to file with locking
func WriteFileWithLock(path string, data []byte, perm os.FileMode) (done bool, err error) {
Contributor

Does this lock secure the secret file update? If we write a new token to a separate file and then rename it, we wouldn't need locks?

Member Author

ptal again.

@AlbeeSo AlbeeSo force-pushed the oss/token-rotation-rewrite branch 3 times, most recently from 9465193 to c3b1c21 Compare December 25, 2025 09:23
@AlbeeSo AlbeeSo force-pushed the oss/token-rotation-rewrite branch from c3b1c21 to 4f91d33 Compare December 25, 2025 10:10
@AlbeeSo AlbeeSo force-pushed the oss/token-rotation-rewrite branch from 6021061 to 51b3bcf Compare December 25, 2025 12:19
@AlbeeSo AlbeeSo requested a review from mowangdk December 31, 2025 02:15
Comment on lines 68 to 69
dir := filepath.Dir(path)
tmpFile := filepath.Join(dir, fmt.Sprintf(".%s.tmp.%d", filepath.Base(path), time.Now().UnixNano()))
Contributor

@huww98 huww98 Jan 3, 2026

Suggested change
- dir := filepath.Dir(path)
- tmpFile := filepath.Join(dir, fmt.Sprintf(".%s.tmp.%d", filepath.Base(path), time.Now().UnixNano()))
+ tmpFile := path + ".next"

I think we can just use a fixed name? Then we don't need to worry about leaking tmp files, and there is no need to clean up when the rename fails.

Member Author

I prefer to handle this with a deferred deletion function rather than a fixed name, since a fixed name would bring many overwrite corner cases.

Comment on lines 148 to 153
// ossfs2
// For ossfs2, file-path is a common option configuration after -o, so append to op.Options
op.Options = append(op.Options,
fmt.Sprintf("oss_sts_multi_conf_ak_file=%s", filepath.Join(tokenDir, mounterutils.KeyAccessKeyId)),
fmt.Sprintf("oss_sts_multi_conf_sk_file=%s", filepath.Join(tokenDir, mounterutils.KeyAccessKeySecret)),
fmt.Sprintf("oss_sts_multi_conf_token_file=%s", filepath.Join(tokenDir, mounterutils.KeySecurityToken)),
Contributor

If ossfs2 takes these 3 paths separately, it is guaranteed to face a race condition: ossfs2 opens one file, CSI replaces them, ossfs2 opens the other files, and ossfs2 ends up with inconsistent credentials. Hopefully ossfs2 will retry immediately in this case and not surface an IOError to the user.

For ossfs1 to work correctly, it should first open an fd to the dir, then use the openat API to open the individual files relative to that fd. Can you check how ossfs1 implements this?

BTW, what's the point of this comment?

// For ossfs2, file-path is a common option configuration after -o, so append to op.Options

It is also an option after -o for OSSFS1, right?

Member Author

ossfs1 uses openat.
I also reworded the comment you mentioned. It was meant to contrast this with the other credential methods of ossfs2.

Comment on lines 188 to 192
// This function uses a symlink-based approach similar to Kubernetes configmap volume plugin:
// 1. Create a temporary data directory (e.g., ..data_tmp_<timestamp>)
// 2. Write all token files to the temporary directory
// 3. Atomically switch the ..data symlink to point to the new directory
// 4. Create symlinks for each token file pointing to ..data/<filename>
Contributor

I think this can be simplified: If path is /run/ossfs/volumeID/sts, we can make:

/run/ossfs/
└── volumeID
    ├── sts -> sts.20260101000000
    └── sts.20260101000000
        ├── ak
        ├── expiration
        ├── sk
        └── token

That is, we don't need a symlink for every file. kubelet does that because workloads expect the files to exist at the root of the volume, whereas we can add an extra directory level inside the volume.

Member Author

Done. The subdirectory path is now passed through to the clients instead.

// Clean up old data directory if it exists
// Only remove the token files we know about, not the entire directory
if currentDataDir != "" && currentDataDir != tmpDataDir {
// Remove old data directory asynchronously to avoid blocking
Contributor

Not necessary IMO. Cleanup should be blocking, as we don't want to leak credentials on disk.

And I think we should keep currentDataDir for a while, because ossfs may still be holding an fd to the old dir and trying to read from it. Maybe keep the two most recent dirs, and remove the oldest one before creating the third?

Member Author

I solved this by updating the AccessKeyId last, instead of keeping the previous directory.

Simplify rotateTokenFiles to use single directory-level symlink instead of
two-layer approach. Use tokenDir/sts path to avoid first-call issues.
Add cleanup helpers and defer guards for resource cleanup.
Reorder token keys so AccessKeyId is rotated last, ensuring ossfs/ossfs2
clients see either all old or all new files, never a mixed state.
currentDataDir := ""
if linkTarget, readErr := os.Readlink(dataLinkPath); readErr == nil {
currentDataDir = filepath.Join(dir, linkTarget)
if linkTarget, readErr := os.Readlink(dir); readErr == nil {
Contributor

If there is a readErr, we need to know what the error is, even if it doesn't affect the process. Please log it for the record.

Member Author

This whole function assumes that dir either does not exist (the first call, when mounting) or is a symlink (subsequent calls, when rotating). That is why I muted the error logs here and did not consider the regular-directory case raised in another comment you left below.
In the latest commit I return an error for these corner cases, which also solves the rename issue. (I don't expect this to happen in practice, though.)

@AlbeeSo AlbeeSo force-pushed the oss/token-rotation-rewrite branch from 937e610 to 12e150e Compare January 7, 2026 02:45
Return error when rotateTokenFiles encounters a regular directory
instead of symlink, as rename would break open file handles.
Also reject rotation on Readlink failures (e.g., permission denied).
@AlbeeSo AlbeeSo force-pushed the oss/token-rotation-rewrite branch from 12e150e to 6049442 Compare January 7, 2026 02:48
@AlbeeSo AlbeeSo requested a review from mowangdk January 7, 2026 02:53
@mowangdk
Contributor

mowangdk commented Jan 8, 2026

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 8, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AlbeeSo, mowangdk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 964cc5a into kubernetes-sigs:master Jan 8, 2026
12 checks passed
5 participants