Description
Nomad version
Output from nomad version
Nomad v1.8.9+ent
BuildDate 2025-01-14T19:11:47Z
Revision fc0f34f5b196ce9fcd0c62b6a5e7ce23934826d4
Operating system and Environment details
Alpine 3.20
Issue
When using a CSI plugin with staging (e.g. Ceph-CSI for CephFS volumes), the Nomad client correctly keeps track of which mounted CSI volumes with access mode multi-node-multi-writer are being used by more than one allocation. When two allocations use the same volume, the staging mount is shared between them, and when one allocation is stopped the staging mount is kept as long as the other allocation is running (i.e. only NodeUnpublishVolume is called on the volume, but not NodeUnstageVolume).
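The sharing behavior described above amounts to reference counting per volume. The following is a minimal sketch of that idea, not Nomad's actual implementation: the stagingTracker type, its methods, and the volume and allocation IDs are all hypothetical names chosen for illustration.

```go
package main

import "fmt"

// stagingTracker is a hypothetical in-memory reference counter.
// It only illustrates the idea; it is not Nomad's real data structure.
type stagingTracker struct {
	// users maps a volume ID to the set of allocation IDs using it on this node.
	users map[string]map[string]struct{}
}

func newStagingTracker() *stagingTracker {
	return &stagingTracker{users: make(map[string]map[string]struct{})}
}

// claim records that allocID uses volID. It returns true when this is the
// first user on the node, i.e. when NodeStageVolume would be needed.
func (t *stagingTracker) claim(volID, allocID string) bool {
	allocs, ok := t.users[volID]
	if !ok {
		allocs = make(map[string]struct{})
		t.users[volID] = allocs
	}
	first := len(allocs) == 0
	allocs[allocID] = struct{}{}
	return first
}

// release drops allocID's claim on volID. It returns true when no users
// remain, i.e. when calling NodeUnstageVolume would be safe.
func (t *stagingTracker) release(volID, allocID string) bool {
	allocs := t.users[volID]
	delete(allocs, allocID) // deleting from a nil map is a no-op
	if len(allocs) == 0 {
		delete(t.users, volID)
		return true
	}
	return false
}

func main() {
	tracker := newStagingTracker()
	fmt.Println("stage needed:", tracker.claim("cephfs-vol", "alloc-a"))
	fmt.Println("stage needed:", tracker.claim("cephfs-vol", "alloc-b"))
	// Stopping alloc-a only unpublishes; alloc-b still holds the staging mount.
	fmt.Println("unstage safe:", tracker.release("cephfs-vol", "alloc-a"))
	fmt.Println("unstage safe:", tracker.release("cephfs-vol", "alloc-b"))
}
```

Note that a tracker with no recorded claims reports that unstaging is safe for any volume, which is the same outcome as the post-restart behavior reported in this issue.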
However, when the Nomad client is restarted, it loses track of which CSI volumes are still being used by other allocations. In the scenario above with two allocations using the same volume, when one allocation is stopped, Nomad calls NodeUnpublishVolume followed by NodeUnstageVolume, even though the volume is still being used by the other allocation.
From the allocation's perspective, the consequences apparently depend on which task drivers are in use and how mount propagation is configured on the host. In my environment, with the Docker driver and a parent mount that has shared propagation, it took a while to notice this bug. After the staging mount was unmounted in the host mount namespace, the allocation that was left running still kept the mount in its own mount namespace, so nothing really broke from its perspective. When the second allocation was stopped, the Nomad client was able to unmount the volume properly. However, I'd expect more disastrous consequences in less favorable environments.
Reproduction steps
- Set up a Nomad cluster with a CSI plugin with staging (e.g. Ceph-CSI)
- Create a CSI volume with multi-node-multi-writer access mode (e.g. CephFS volume)
- Run two jobs that use the same volume on the same node.
- Restart Nomad client
- Stop one of the jobs. Nomad will call NodeUnstageVolume on the volume despite the other job still using the volume.
Expected Result
After a restart, the Nomad client should continue to keep track of which allocations are using the same CSI volumes with staging.
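One way to picture the expected behavior is to rebuild the per-volume usage from the allocations the client restores on startup. The sketch below assumes that approach purely for illustration. The restoredAlloc type, its fields, and the rebuildUsage and safeToUnstage helpers are hypothetical and do not reflect how Nomad stores or restores client state.

```go
package main

import "fmt"

// restoredAlloc is a hypothetical view of an allocation recovered from
// client state after a restart; the field names are illustrative only.
type restoredAlloc struct {
	ID        string
	Running   bool
	VolumeIDs []string
}

// rebuildUsage recomputes, per volume, which running allocations use it.
func rebuildUsage(allocs []restoredAlloc) map[string]map[string]bool {
	usage := make(map[string]map[string]bool)
	for _, a := range allocs {
		if !a.Running {
			continue // stopped allocations hold no claims
		}
		for _, vol := range a.VolumeIDs {
			if usage[vol] == nil {
				usage[vol] = make(map[string]bool)
			}
			usage[vol][a.ID] = true
		}
	}
	return usage
}

// safeToUnstage reports whether stopping allocID would leave volID with no
// other users, i.e. whether NodeUnstageVolume may follow NodeUnpublishVolume.
func safeToUnstage(usage map[string]map[string]bool, volID, allocID string) bool {
	for id := range usage[volID] {
		if id != allocID {
			return false
		}
	}
	return true
}

func main() {
	allocs := []restoredAlloc{
		{ID: "alloc-a", Running: true, VolumeIDs: []string{"cephfs-vol"}},
		{ID: "alloc-b", Running: true, VolumeIDs: []string{"cephfs-vol"}},
	}

	// State lost on restart: an empty map makes unstaging look safe.
	lost := map[string]map[string]bool{}
	fmt.Println("lost state, unstage safe:", safeToUnstage(lost, "cephfs-vol", "alloc-a"))

	// State rebuilt from restored allocations: alloc-b still blocks unstaging.
	rebuilt := rebuildUsage(allocs)
	fmt.Println("rebuilt state, unstage safe:", safeToUnstage(rebuilt, "cephfs-vol", "alloc-a"))
}
```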
Actual Result
After a restart, when more than one allocation is using the same CSI volume with staging, stopping one of the allocations causes the Nomad client to unmount the staging path, which can break the volume mount for the other allocation.
Job file (if appropriate)
Nomad Server logs (if appropriate)
Nomad Client logs (if appropriate)