Dynamic User Home Dirs (Non NFS) #3192

Draft
Adam-D-Lewis wants to merge 4 commits into main from non-shared-home-dirs

@Adam-D-Lewis
Member

Reference Issues or PRs

What does this implement/fix?

Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds a feature)
  • Breaking change (fix or feature that would cause existing features not to work as expected)
  • Documentation Update
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no API changes)
  • Build related changes
  • Other (please describe):

Documentation

  • For new features or enhancements, a corresponding PR has been opened in the documentation repository (if applicable)
    • Link to docs PR:

Testing

  • Did you test the pull request locally?
  • Did you add new tests?

How to test this PR?

Any other comments?

Adam-D-Lewis changed the title from "not working" to "Dynamic User Home Dirs (Non NFS)" on Jan 7, 2026
@Adam-D-Lewis
Member Author

Adam-D-Lewis commented Jan 8, 2026

I tested various Nebari functionality on this branch with non-NFS user home directories. During testing, I found that both JHub Apps and Jupyter Scheduler / Argo Workflows will need work before they can be considered supported with non-shared home directories.

The Problem

Both JHub Apps and Jupyter Scheduler create secondary pods (app server pods, scheduled job pods) that also mount the user's home directory PVC.

With ReadWriteOnce (RWO) storage, a PVC can only be mounted by pods on the same node. Sometimes the secondary pods get scheduled on the same node as the user's primary pod, and everything works fine. But if a secondary pod lands on a different node, it remains Pending indefinitely, waiting for the PVC.

This behavior is non-deterministic, and failures become more likely as the cluster scales (more nodes means a lower chance of same-node scheduling).
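To make the scaling argument concrete, here is a rough sketch. It assumes the scheduler places the secondary pod uniformly at random across nodes, which is a simplification: the real Kubernetes scheduler weighs resources, affinity, and taints, so treat the numbers as an illustration only. The function name is hypothetical.

```python
def same_node_probability(num_nodes: int) -> float:
    """Chance that a secondary pod lands on the node already holding the RWO volume.

    Assumption: uniform random placement across num_nodes eligible nodes.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return 1.0 / num_nodes


# Under the uniform-placement assumption, the success chance shrinks as nodes are added.
for n in (1, 2, 5, 10):
    print(f"{n} node(s): {same_node_probability(n):.0%} chance of same-node scheduling")
```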

Proposed Solution: PVC Cloning

The appropriate solution is to use PVC cloning. This requires that the CSI driver backing the home directory volumes supports PVC cloning. It is supported by the default CSI drivers on AWS, GCP, and Azure, and could also be supported by Longhorn or Rook/Ceph on K3s.

Under the hood, PVC cloning creates a copy-on-write clone of the volume, so it's fast. Initial testing showed cloning completes in about 30 seconds.

Supporting PVC cloning would require some changes to JHub Apps and to Nebari's Jupyter Scheduler integration, but it's fairly straightforward.
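For reference, a clone is requested by creating a new PVC whose `spec.dataSource` points at the existing PVC. Below is a minimal sketch that builds such a manifest as a plain dict. The function name, PVC names, namespace, storage class, and size are all hypothetical placeholders, not what JHub Apps or Nebari use today. Kubernetes requires the clone to be in the same namespace as the source and to request at least the source's capacity.

```python
import json


def build_clone_pvc(source_pvc: str, clone_name: str, namespace: str,
                    storage_class: str, size: str) -> dict:
    """Build a PVC manifest that asks the CSI driver to clone source_pvc.

    All argument values are supplied by the caller; nothing here talks to a cluster.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": clone_name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            # Must be at least as large as the source volume.
            "resources": {"requests": {"storage": size}},
            # dataSource of kind PersistentVolumeClaim is what triggers cloning.
            "dataSource": {"kind": "PersistentVolumeClaim", "name": source_pvc},
        },
    }


if __name__ == "__main__":
    # Hypothetical names for illustration only.
    manifest = build_clone_pvc(
        source_pvc="claim-alice",
        clone_name="claim-alice-job-1",
        namespace="dev",
        storage_class="gp3",
        size="10Gi",
    )
    # kubectl accepts JSON as well as YAML manifests.
    print(json.dumps(manifest, indent=2))
```

The secondary pod would then mount the clone instead of the original PVC, so it no longer has to share a node with the user's primary pod.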

@Adam-D-Lewis
Member Author

Adam-D-Lewis commented Jan 9, 2026

Someone brought up the point that cloning the home dir is not very robust, because the user might have made unexpected changes to the env after they scheduled the notebook. Also, we may not have all of the user's envs available locally anymore.
