Commit 7101b10

add enhancement for vSphere configurable max number of volumes per node

1 parent 4122b6b commit 7101b10

1 file changed: +263 -0 lines changed

---
title: vsphere-configurable-maximum-allowed-number-of-block-volumes-per-node
authors:
- "@rbednar"
reviewers:
- "@jsafrane"
- "@gnufied"
- "@deads2k"
approvers:
- "@jsafrane"
- "@gnufied"
- "@deads2k"
api-approvers:
- "@deads2k"
creation-date: 2025-01-31
last-updated: 2025-01-31
tracking-link:
- https://issues.redhat.com/browse/OCPSTRAT-1829
see-also:
- "None"
replaces:
- "None"
superseded-by:
- "None"
---

# vSphere configurable maximum allowed number of block volumes per node

This document proposes an enhancement to the vSphere CSI driver to allow administrators to configure the maximum number
of block volumes that can be attached to a single vSphere node. This enhancement addresses a limitation of the
current driver, which relies on a static limit that cannot be changed by cluster administrators.

## Summary

The vSphere CSI driver for vSphere version 7 uses a constant to determine the maximum number of block volumes that can
be attached to a single node. This limit is influenced by the number of SCSI controllers available on the node.
By default, a node can have up to four SCSI controllers, each supporting up to 15 devices, allowing for a maximum of 60
volumes per node (59 + root volume).

However, vSphere version 8 increased the maximum number of volumes per node to 256 (255 + root volume). This enhancement
aims to leverage this increased limit and provide administrators with finer-grained control over volume allocation,
allowing them to configure the maximum number of block volumes that can be attached to a single node.

- Details about configuration maximums: https://configmax.broadcom.com/guest?vmwareproduct=vSphere&release=vSphere%208.0&categories=3-0
- Volume limit configuration for vSphere storage plug-in: https://techdocs.broadcom.com/us/en/vmware-cis/vsphere/container-storage-plugin/3-0/getting-started-with-vmware-vsphere-container-storage-plug-in-3-0/vsphere-container-storage-plug-in-concepts/configuration-maximums-for-vsphere-container-storage-plug-in.html
- Knowledge base article with node requirements: https://knowledge.broadcom.com/external/article/301189/prerequisites-and-limitations-when-using.html

## Motivation

### User Stories

- As a vSphere administrator, I want to configure the maximum number of volumes that can be attached to a node, so that
  I can optimize resource utilization and prevent oversubscription.
- As a cluster administrator, I want to ensure that the vSphere CSI driver operates within the limits imposed by the
  underlying vSphere infrastructure.

### Goals

- Provide administrators with control over the volume allocation limit on vSphere nodes.
- Improve resource utilization and prevent oversubscription.
- Ensure compatibility with existing vSphere infrastructure limitations.
- Maintain backward compatibility with existing deployments.

### Non-Goals

- Support heterogeneous environments with different ESXi versions on the nodes that form an OpenShift cluster.
- Dynamically adjust the limit based on real-time resource usage.
- Implement per-namespace or per-workload volume limits.
- Modify the underlying vSphere VM configuration.

## Proposal

1. Driver Feature State Switch (FSS):

   - Use the vSphere driver's FSS (`max-pvscsi-targets-per-vm`) to control activation of the maximum volume limit
     functionality.
   - No changes are needed; the feature is enabled by default.

2. API for Maximum Volume Limit:

   - Introduce a new field `spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode` in the ClusterCSIDriver API to allow
     administrators to configure the desired maximum number of volumes per node (see the sketch after this list).
   - This field should not have a default value; the actual default will be set by the operator to the current
     maximum limit of 59 volumes per node, which matches the vSphere 7 limit.
   - The API will not allow a value of `0` to be set or allow the field to be unset, as this would lead to
     disabling the limit.
   - The allowed range of values is 1 to 255; the maximum value matches the vSphere 8 limit.

3. Update CSI Pods with hooks:

   - After reading the new `maxAllowedBlockVolumesPerNode` API field from ClusterCSIDriver, the operator will inject the
     `MAX_VOLUMES_PER_NODE` environment variable into all pods using DaemonSet and Deployment hooks.
   - Any value that is statically set for the `MAX_VOLUMES_PER_NODE` environment variable in asset files will
     be overwritten. If the variable is omitted in the asset, the hooks will add it and set its value to the one found in
     the `maxAllowedBlockVolumesPerNode` field of ClusterCSIDriver. If the field is not set, the default value will be 59
     to match the vSphere 7 limit.

4. Operator behavior:

   - The operator will check the ESXi versions on all nodes in the cluster. Setting `maxAllowedBlockVolumesPerNode` to a
     value higher than 59 when not all nodes run ESXi version 8 or higher will result in cluster degradation.

5. Driver Behavior:

   - The vSphere CSI driver needs to allow the higher limit via the Feature State Switch (`max-pvscsi-targets-per-vm`).
   - The switch is already enabled by default in the driver versions shipped in OpenShift 4.19.
   - The driver will report the volume limit as usual in response to `NodeGetInfo` calls.

6. Documentation:

   - Update the vSphere CSI driver documentation to include information about the new feature and how to configure it.
     However, at the time of writing we don't have any official vSphere documentation to refer to that would explain how
     to configure vSphere to support 256 volumes per node.
   - Include a statement informing users of the current requirement of having a homogeneous cluster with all nodes
     running ESXi 8 or higher. Until this requirement is met, the limit set in `maxAllowedBlockVolumesPerNode` must not
     be increased above 59. If a higher value is set regardless of this requirement, the cluster will degrade.
   - Currently, there is no Distributed Resource Scheduler (DRS) validation in place in vSphere to ensure that multiple
     VMs with 256 disks do not end up on the same host, so users might exceed the limit of 2048 virtual disks per host.
     This is a known limitation of vSphere, and we need to note it in the documentation to make users aware of this
     potential risk.

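For illustration, a minimal sketch of how the field proposed in item 2 could be declared in the ClusterCSIDriver API, assuming kubebuilder validation markers are used to enforce the allowed range. The struct and package names below are assumptions following common openshift/api conventions; only the field name, the 1-255 range, and the defaulting behavior come from this proposal.

```go
package v1

// VSphereCSIDriverConfigSpec holds vSphere-specific driver configuration.
// Hypothetical placement; the existing struct in openshift/api may differ.
type VSphereCSIDriverConfigSpec struct {
	// maxAllowedBlockVolumesPerNode is the maximum number of block volumes
	// attachable to a single node. When unset, the operator applies the
	// vSphere 7 default of 59; values above 59 require ESXi 8 on all hosts.
	//
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:validation:Maximum=255
	// +optional
	MaxAllowedBlockVolumesPerNode int32 `json:"maxAllowedBlockVolumesPerNode,omitempty"`
}
```
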
### Workflow Description

1. Administrator configures the limit:
   - The administrator creates or updates a ClusterCSIDriver object to specify the desired maximum number of volumes per
     node using the new `maxAllowedBlockVolumesPerNode` API field.
2. Operator reads configuration:
   - The vSphere CSI Operator monitors the ClusterCSIDriver object for changes.
   - Upon detecting a change, the operator reads the configured limit value.
3. Operator applies the new limit to the DaemonSet and Deployment:
   - The operator updates the vSphere CSI driver pods, injecting the `MAX_VOLUMES_PER_NODE` environment
     variable with the configured limit value into the driver node pods on worker nodes (a minimal hook sketch follows
     this list).

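To make the injection step concrete, below is a minimal sketch of a DaemonSet hook that sets `MAX_VOLUMES_PER_NODE`. The function name and the way the hook is wired into the operator are assumptions; only the environment variable name, the default of 59, and the source field come from this proposal. An analogous hook would apply to the controller Deployment.

```go
package hooks

import (
	"strconv"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// defaultMaxVolumesPerNode matches the vSphere 7 limit used when the
// maxAllowedBlockVolumesPerNode field is not set.
const defaultMaxVolumesPerNode = 59

// withMaxVolumesPerNode is a hypothetical DaemonSet hook: it overwrites (or
// adds) the MAX_VOLUMES_PER_NODE environment variable on every container so
// the driver advertises the limit configured in ClusterCSIDriver. A limit of
// 0 stands for "field not set" and falls back to the default.
func withMaxVolumesPerNode(limit int32, ds *appsv1.DaemonSet) {
	if limit == 0 {
		limit = defaultMaxVolumesPerNode
	}
	value := strconv.Itoa(int(limit))

	for i := range ds.Spec.Template.Spec.Containers {
		c := &ds.Spec.Template.Spec.Containers[i]
		found := false
		for j := range c.Env {
			if c.Env[j].Name == "MAX_VOLUMES_PER_NODE" {
				c.Env[j].Value = value // statically set asset values are overwritten
				found = true
			}
		}
		if !found {
			c.Env = append(c.Env, corev1.EnvVar{Name: "MAX_VOLUMES_PER_NODE", Value: value})
		}
	}
}
```
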
### API Extensions

- New field in ClusterCSIDriver CRD:
  - A new CRD field will be introduced to represent the maximum volume limit configuration.
  - The CRD will contain a single new field (e.g., `spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode`) to define
    the desired limit.
  - The API will validate that the value fits within the defined range (1-255).

### Topology Considerations

#### Hypershift / Hosted Control Planes

No unique considerations for Hypershift. The configuration and behavior of the vSphere CSI driver with respect to the
maximum volume limit will remain consistent across standalone and managed clusters.

#### Standalone Clusters

This enhancement is fully applicable to standalone OpenShift clusters.

#### Single-node Deployments or MicroShift

No unique considerations for MicroShift. The configuration and behavior of the vSphere CSI driver with respect to the
maximum volume limit will remain consistent across standalone and SNO/MicroShift clusters.

### Implementation Details/Notes/Constraints

A possible future constraint is that newer vSphere versions may raise the limit again. However, we expect the limit to
increase rather than decrease, and relaxing the API validation later is possible.

### Risks and Mitigations

- None.

### Drawbacks

- Increased complexity: Introducing a new CRD field and operator logic adds complexity to the vSphere CSI driver ecosystem.
- Missing vSphere documentation: At the time of writing we don't have a clear statement or documentation to refer to
  that would adequately describe all the necessary details and limitations of this feature. See Documentation in
  the Proposal section for details.
- Limited granularity: The current proposal provides a global node-level limit. More fine-grained control
  (e.g., per-namespace or per-workload limits) would require further investigation and development.

## Open Questions [optional]

None.

## Test Plan

- E2E tests will be implemented to verify the correct propagation of the configured limit to the driver pods.
  These tests will be executed only on vSphere 8.

## Graduation Criteria

- Tech Preview in 4.19.

### Dev Preview -> Tech Preview

- No Dev Preview phase.

### Tech Preview -> GA

- E2E test coverage demonstrating stability.
- Available by default.
- User-facing documentation created in [openshift-docs](https://github.com/openshift/openshift-docs/).
- We have to wait for VMware to GA this feature and document the configuration on the vCenter side.

### Removing a deprecated feature

- Not applicable.

## Upgrade / Downgrade Strategy

- **Upgrades:** During an upgrade, the operator will apply the new API field value and update the driver pods with
  the new `MAX_VOLUMES_PER_NODE` value. If the field is not set, the default value (59) is used to match the current
  vSphere 7 limit, so the limit will not change for existing deployments unless the user explicitly sets it.
- **Downgrades:** Downgrading to a version without this feature will result in the API field being ignored; the
  operator will revert to its previous hardcoded value configured in the DaemonSet (59). If more volumes are attached
  than the limit allows after the downgrade, the vSphere CSI driver will not be able to attach new volumes to nodes,
  and users will need to manually detach the extra volumes.

## Version Skew Strategy

There are no version skew concerns for this enhancement.

## Operational Aspects of API Extensions

- The API extension does not pose any operational challenges.

## Support Procedures

* To check the status of the vSphere CSI operator, use the following command:
  `oc get deployments -n openshift-cluster-csi-drivers`. Ensure that the operator is running and healthy, and inspect its logs.
* To inspect the `ClusterCSIDriver` CR, use the following command: `oc get clustercsidriver/csi.vsphere.vmware.com -o yaml`.
  Examine the `spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode` field.

## Alternatives

We considered several approaches to handle environments with mixed ESXi versions:

1. **Cluster Degradation (Selected Approach)**:
   - We will degrade the cluster if the user-specified limit exceeds what's supported by the underlying infrastructure.
   - This requires checking the ClusterCSIDriver configuration against actual node capabilities in the `check_nodes.go`
     implementation (see the sketch after this list).
   - The error messages will be specific about the incompatibility.
   - Documentation will clearly state that increased limits are not supported on environments containing ESXi 7.x hosts.

2. **Warning-Only Approach**:
   - Allow any user-specified limit (up to 255) regardless of the ESXi versions in the cluster.
   - Emit metrics and alerts when incompatible configurations are detected.
   - This approach would result in application pods getting stuck in the ContainerCreating state when scheduled to ESXi 7.0
     nodes that exceed the 59-attachment limit.
   - This option was rejected as it would lead to a poor user experience with difficult-to-diagnose failures.

3. **Dynamic Limit Adjustment**:
   - Have the DaemonSet controller ignore user-specified limits that exceed cluster capabilities and automatically switch to a supportable limit.
   - This option is technically complex as it would require:
     - Delaying CSI driver startup until all version checks complete
     - Implementing a DaemonSet hook to perform full cluster scans for ESXi versions (an expensive operation)
     - Duplicating node checks already being performed elsewhere
   - This approach was rejected due to implementation complexity.

4. **Driver-Level Detection**:
   - Add code to the DaemonSet pod that would detect limits from the BIOS or OS and consider that when reporting attachment capabilities.
   - This would require modifications to the driver code itself, which would be better implemented by VMware.
   - This approach was rejected as it would depend on upstream changes that historically have been slow to land.

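As a rough illustration of the selected approach, the sketch below shows how a precondition check (such as the one in `check_nodes.go`) might compare the configured limit against the ESXi versions reported for the cluster's hosts. The function and type names are hypothetical; only the 59-volume threshold and the ESXi 8 requirement come from this proposal.

```go
package checks

import "fmt"

// vSphere7VolumeLimit is the safe per-node limit when any host runs ESXi 7.x.
const vSphere7VolumeLimit = 59

// esxiMajorVersions maps host names to the major ESXi version they report
// (collecting this data is out of scope for this sketch).
type esxiMajorVersions map[string]int

// validateVolumeLimit returns an error describing the incompatibility when the
// configured maxAllowedBlockVolumesPerNode exceeds 59 but at least one host in
// the cluster is older than ESXi 8. The operator would surface this error as a
// Degraded condition.
func validateVolumeLimit(limit int32, hosts esxiMajorVersions) error {
	if limit <= vSphere7VolumeLimit {
		return nil // within the limit supported by every ESXi version
	}
	for host, major := range hosts {
		if major < 8 {
			return fmt.Errorf(
				"maxAllowedBlockVolumesPerNode=%d requires ESXi 8 on all hosts, but host %q runs ESXi %d",
				limit, host, major)
		}
	}
	return nil
}
```
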
## Infrastructure Needed [optional]

- The infrastructure needed to support this enhancement is currently available for testing with vSphere version 8.
- To test the feature we need a nested vSphere environment and must set `pvscsiCtrlr256DiskSupportEnabled` in the
  vCenter config to allow the higher volume attachment limit.
