Skip to content

PodMounter: Add Mountpoint Pod Sharing Support#439

Merged
yerzhan7 merged 28 commits into
awslabs:mainfrom
yerzhan7:mppodsharing
May 6, 2025
Merged

PodMounter: Add Mountpoint Pod Sharing Support#439
yerzhan7 merged 28 commits into
awslabs:mainfrom
yerzhan7:mppodsharing

Conversation

@yerzhan7
Copy link
Copy Markdown
Contributor

@yerzhan7 yerzhan7 commented Apr 23, 2025

Issue: #353

Description of changes: This PR adds support for sharing Mountpoint Pods between multiple workload pods when they have compatible configurations. This optimization reduces resource usage by reusing existing Mountpoint Pods instead of creating a new one for each workload pod.

Key Changes:

  • Custom Resource Definition

    • Added MountpointS3PodAttachment CRD to track workload-to-Mountpoint Pod attachments
  • Controller Changes

    • Added reconciler logic to create/manage MountpointS3PodAttachment resources
    • Added stale attachment cleaner to handle orphaned attachments
    • Implemented expectations pattern to handle eventual consistency
    • Added cleanup of resources when pods are terminated
  • Node Changes

    • Modified PodMounter to use shared source mount points and perform bind mounts on target path
    • Added dedicated PodUnmounter to handle clean unmounting
    • Added needs-unmount annotation handling for graceful pod cleanup
    • Added locking mechanisms to prevent race conditions
  • Testing

    • Added e2e and controller tests covering various pod sharing scenarios
    • Manual testing performed on personal EKS cluster

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Comment thread charts/aws-mountpoint-s3-csi-driver/templates/serviceaccount-csi-node.yaml Outdated
Comment thread Makefile Outdated
Comment thread cmd/aws-s3-csi-controller/csicontroller/expectations.go Outdated
Comment thread cmd/aws-s3-csi-controller/csicontroller/reconciler.go
Comment thread pkg/driver/node/mounter/pod_mounter.go Outdated
Comment thread pkg/driver/node/mounter/pod_mounter.go Outdated
Comment thread pkg/driver/node/mounter/pod_mounter.go
Comment thread pkg/driver/node/mounter/pod_unmounter.go Outdated
Comment thread pkg/driver/node/mounter/pod_unmounter.go Outdated
@anurag4DSB
Copy link
Copy Markdown

anurag4DSB commented Apr 30, 2025

General question: What is the motivation for transitioning from a systemd-based mounter to a pod-based mounter?
Is the primary reason scalability, or are there other factors driving this change?

@unexge
Copy link
Copy Markdown
Contributor

unexge commented Apr 30, 2025

Hey @anurag4DSB, see #279 and the linked issues for more details on the motivation why we're moving away from systemd mounter. This PR specifically makes progress towards #353.

@anurag4DSB
Copy link
Copy Markdown

Hey @anurag4DSB, see #279 and the linked issues for more details on the motivation why we're moving away from systemd mounter. This PR specifically makes progress towards #353.

Thank you so much. Appreciate the prompt response.

unexge
unexge previously approved these changes May 2, 2025
Copy link
Copy Markdown
Contributor

@unexge unexge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great work @yerzhan7! Left some comments, but feel free to address them as follow-up PRs. I think we can merge this PR as-is.

Comment thread Makefile
var sb strings.Builder
for _, k := range keys {
sb.WriteString(k)
sb.WriteString("=")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: You can also use WriteRune for single characters but perf wouldn't matter much here probably. So it's fine to keep it as-is as well

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

crdv1beta.FieldAuthenticationSource: authSource,
}

if authSource == "pod" {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use the constant credentialprovider.AuthenticationSourcePod here? We should probably move these constants to a different package than node/... as now they're also used in controller environment as well but we can do it later

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Will address moving to different package separately.

volumeAttributes := mppod.ExtractVolumeAttributes(pv)
authSource := volumeAttributes[volumecontext.AuthenticationSource]
if authSource == "" {
return "driver"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use credentialprovider.AuthenticationSourceDriver here as well? I feel like defaulting to driver logic should be kept in either credentialprovider or volumecontext package maybe, but we can also address it later

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this logic is duplicated, will address it separately.

// handleInactivePod handles inactive workload pod.
func (r *Reconciler) handleInactivePod(ctx context.Context, s3pa *crdv1beta.MountpointS3PodAttachment, workloadUID string, log logr.Logger) (bool, error) {
if s3pa == nil {
log.Info("Workload pod is not active. Did not find any MountpointS3PodAttachments.")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check if we have a pending expectation here? And if so we should requeue it to ensure it would properly clean up once the expectation is satisfied?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add pending expectation check, but most likely if we requeue inactive pod event next time reconciler would just skip it completely because by that time Workload Pod won't be found. I.e. cleanup will be handled by periodic stale attachment cleanup job.

Since handleInactivePod() is "best effort" anyway at the moment I don't think we need this extra check.

Maybe even we don't need handleInactivePod() method at all, and we should rely fully on stale attachment cleaner instead.

Comment thread pkg/driver/node/mounter/pod_mounter.go
Comment thread pkg/driver/node/mounter/pod_unmounter.go
volumeId := mpPod.Labels[mppod.LabelVolumeId]

if err := u.writeExitFile(podPath); err != nil {
return
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to log errors here, and also maybe move this logic to a separate function as it's probably the same as what unmount does

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in follow-up #453

Comment thread pkg/driver/node/mounter/pod_unmounter.go Outdated
}

if mpPod == nil {
klog.V(4).Infof("Mountpoint Pod not found for UID %s, will unmount and delete folder", mpPodUID)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if eventual consistency could cause problems here? Maybe we should keep a counter of not founds and delete it only after N times?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should not cause an issue because source Mountpoint is created only during NPV call which waits for for that Pod to be available in local cache. I.e. if the source folder exists (and we do not need to unmount) MP Pod should be in cache already.

Plus, we also wait for full Pod cache sync during driver start up before starting periodic cleanup job.

Copy link
Copy Markdown
Contributor Author

@yerzhan7 yerzhan7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing! Fixed merge conflicts and resolved small/quick comments in eeb1699

Will address other comments in separate PRs.

Comment thread pkg/driver/node/mounter/pod_unmounter.go
Comment thread pkg/driver/node/mounter/pod_unmounter.go Outdated
}

if mpPod == nil {
klog.V(4).Infof("Mountpoint Pod not found for UID %s, will unmount and delete folder", mpPodUID)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should not cause an issue because source Mountpoint is created only during NPV call which waits for for that Pod to be available in local cache. I.e. if the source folder exists (and we do not need to unmount) MP Pod should be in cache already.

Plus, we also wait for full Pod cache sync during driver start up before starting periodic cleanup job.

Comment thread cmd/aws-s3-csi-controller/csicontroller/stale_attachment_cleaner.go Outdated
Comment thread pkg/cluster/cluster.go
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants