Environment
- vSphere CSI Driver version: v3.6.0
- Kubernetes distribution: k3s
- k3s version: 1.33
- vSphere version: 8.0.3.00700
- Cluster type: Vanilla Kubernetes (k3s)
- Networking: vCenter is reachable only through an HTTP/HTTPS proxy
- Proxy configuration: Injected via pod environment variables (HTTP_PROXY, HTTPS_PROXY, NO_PROXY) in vsphere-csi-controller
Summary
When the outbound proxy becomes temporarily unavailable, the vSphere CSI controller begins receiving 503 Service Unavailable errors when communicating with vCenter.
This is expected during a network disruption.
The problem is that after the proxy becomes available again, the CSI driver does not recover properly.
Although the controller successfully creates a new vCenter session, all subsequent CSI operations fail with:
- ServerFaultCode: The session is not authenticated
- 401 Unauthorized
- ListView destruction failure due to invalid session
- CNS QueryAllVolume failing
- Property collector WaitForUpdates failing
The system remains in this broken state until the vsphere-csi-controller pod is manually restarted.
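The two failure classes above (the transient proxy outage versus the persistent session rejection) can be told apart from the message text alone. A minimal sketch, assuming the messages look like the ones quoted in this report; `FailureKind` and `Classify` are hypothetical names for illustration and are not part of the CSI driver or govmomi:

```go
package main

import (
	"fmt"
	"strings"
)

// FailureKind is a hypothetical label for the failure classes seen in this report.
type FailureKind int

const (
	Unknown          FailureKind = iota
	ProxyUnavailable             // transient: the proxy answered 503
	SessionInvalid               // persistent: vCenter rejected the session
)

func (k FailureKind) String() string {
	switch k {
	case ProxyUnavailable:
		return "proxy-unavailable"
	case SessionInvalid:
		return "session-invalid"
	default:
		return "unknown"
	}
}

// Classify matches on the message fragments quoted in this issue.
// Assumption: real driver errors contain these substrings verbatim.
func Classify(msg string) FailureKind {
	switch {
	case strings.Contains(msg, "Service Unavailable"):
		return ProxyUnavailable
	case strings.Contains(msg, "The session is not authenticated"),
		strings.Contains(msg, "401 Unauthorized"):
		return SessionInvalid
	default:
		return Unknown
	}
}

func main() {
	for _, m := range []string{
		"Post https:///sdk: Service Unavailable",
		"ServerFaultCode: The session is not authenticated",
		"401 Unauthorized",
	} {
		fmt.Printf("%q -> %s\n", m, Classify(m))
	}
}
```

Only the second class is the bug reported here: it persists after the proxy is back.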
Steps to Reproduce
- Deploy vSphere CSI on a k3s cluster where vsphere-csi-controller uses an outbound proxy to reach vCenter.
- Temporarily disrupt or block the proxy (simulate a network drop or a forced outage).
- Observe repeated CSI errors such as:
  - Post https:///sdk: Service Unavailable
  - Failures retrieving datacenters and datastore maps
- Restore proxy connectivity.
- Observe that CSI:
  - Attempts to reconnect
  - Logs creation of a new vCenter session
- Despite the successful reconnection, CSI continues to fail with authentication errors indefinitely.
Expected Behavior
After the proxy comes back online, the CSI driver should:
- Successfully authenticate with vCenter
- Refresh internal state (sessions, listviews, property collectors, tagging clients, etc.)
- Resume normal operation without requiring a pod restart
Actual Behavior
- The proxy outage triggers 503 Service Unavailable failures, as expected.
- CSI fails to clean up existing sessions properly (Logout also fails because the proxy is down).
- CSI eventually creates a new session:
  New session ID = <id>
  VirtualCenter.connect() successfully created new client
- Immediately after that, every operation from CSI fails with:
  ServerFaultCode: The session is not authenticated
  or
  401 Unauthorized
- Even ListView teardown and recreation fail intermittently:
  failed to destroy listview object. err: The session is not authenticated
- The controller never recovers until manually restarted.
Logs are attached below:
vsphere-csi-controller-6d5486f67c-59x78_vsphere-csi-controller.log