fix: detect and remount stale NFS mounts in NodePublishVolume#1108
fix: detect and remount stale NFS mounts in NodePublishVolume#1108andyzhangx merged 2 commits intokubernetes-csi:masterfrom
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andyzhangx The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
cac3012 to
7f2a9ee
Compare
When an NFS server restarts, existing mounts become stale. If a pod with fsGroup is restarted, kubelet calls applyFSGroup which does lstat on the mount path and fails with ESTALE before CSI driver gets a chance to remount. Fix by checking for stale file handles when the target path is already mounted. If detected, unmount the stale mount and proceed with a fresh mount. Ref 927
7f2a9ee to
3e7f21b
Compare
There was a problem hiding this comment.
Pull request overview
Fixes kubelet applyFSGroup failures after NFS server restarts by detecting stale NFS mounts in NodePublishVolume and forcing an unmount/remount when a stale file handle is detected.
Changes:
- Add an
os.Lstatcheck on already-mounted target paths to detect stale NFS handles. - On
ESTALE, unmount the target and proceed with a fresh mount. - Introduce a helper (
isStaleFileHandle) to classify stale-handle errors.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Make os.Lstat injectable via lstatFunc package variable for testability - Add NodePublishVolume test case covering ESTALE detection -> unmount -> remount - Add TestIsStaleFileHandle with coverage for ESTALE errno, wrapped errors, string matching, and negative cases
|
Addressed the review comment — added unit tests for the stale mount detection path in 3ce23a8:
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
pkg/nfs/nodeserver_test.go:33
- This test now imports syscall and uses syscall.ESTALE/ENOENT. Those constants are not available on Windows, so pkg/nfs unit tests may stop compiling on GOOS=windows. If Windows builds are still supported, gate these tests behind build tags (e.g. !windows) or avoid OS-specific errno constants in the test inputs.
import (
"context"
"errors"
"fmt"
"os"
"reflect"
"strings"
"syscall"
"testing"
"github.com/container-storage-interface/spec/lib/go/csi"
"github.com/kubernetes-csi/csi-driver-nfs/test/utils/testutil"
"github.com/stretchr/testify/assert"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
/cherrypick release-4.13 |
|
@andyzhangx: new pull request created: #1109 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What type of PR is this?
/kind bug
What this PR does / why we need it:
When an NFS server restarts, existing mounts on nodes become stale. If a pod with
fsGroupis then restarted, kubelet'sapplyFSGroupcallslstaton the mount path and fails withESTALE(stale NFS file handle) before the CSI driver gets a chance to remount.This PR fixes
NodePublishVolumeto detect stale mounts and automatically unmount + remount them, so that kubelet's subsequentapplyFSGroupsucceeds.How does it work:
os.LstatcheckESTALEis detected, unmount the stale mountWhich issue(s) this PR fixes:
Ref #927
Does this PR introduce a user-facing change?