Commit 78b4a2a: add enhancement for vSphere configurable max number of volumes per node
(1 file changed, +200 -0)

---
title: vsphere-configurable-maximum-allowed-number-of-block-volumes-per-node
authors:
- "@rbednar"
reviewers:
- "@jsafrane"
- "@gnufied"
- "@deads2k"
approvers:
- "@jsafrane"
- "@gnufied"
- "@deads2k"
api-approvers:
- "@deads2k"
creation-date: 2025-01-31
last-updated: 2025-01-31
tracking-link:
- https://issues.redhat.com/browse/OCPSTRAT-1829
see-also:
- "None"
replaces:
- "None"
superseded-by:
- "None"
---

# vSphere configurable maximum allowed number of block volumes per node

This document proposes an enhancement to the vSphere CSI driver to allow administrators to configure the maximum number
of block volumes that can be attached to a single vSphere node. This enhancement addresses the limitations of the current driver,
which relies on a static limit based on the number of SCSI controllers available on the vSphere node.

## Summary

The vSphere CSI driver for vSphere version 7 uses a constant to determine the maximum number of block volumes that can
be attached to a single node. This limit is influenced by the number of SCSI controllers available on the node.
By default, a node can have up to four SCSI controllers, each supporting up to 15 devices, allowing for a maximum of 60
volumes per node (59 + root volume).

However, vSphere version 8 increased the maximum number of volumes per node to 256 (255 + root volume). This enhancement
aims to leverage this increased limit and provide administrators with finer-grained control over volume allocation,
allowing them to configure the maximum number of block volumes that can be attached to a single node.

Details about configuration maximums: https://configmax.broadcom.com/guest?vmwareproduct=vSphere&release=vSphere%208.0&categories=3-0
Volume limit configuration for vSphere storage plug-in: https://techdocs.broadcom.com/us/en/vmware-cis/vsphere/container-storage-plugin/3-0/getting-started-with-vmware-vsphere-container-storage-plug-in-3-0/vsphere-container-storage-plug-in-concepts/configuration-maximums-for-vsphere-container-storage-plug-in.html

## Motivation

### User Stories

- As a vSphere administrator, I want to configure the maximum number of volumes that can be attached to a node, so that
I can optimize resource utilization and prevent oversubscription.
- As a cluster administrator, I want to ensure that the vSphere CSI driver operates within the limits imposed by the
underlying vSphere infrastructure.

### Goals

- Provide administrators with granular control over volume allocation on vSphere nodes.
- Improve resource utilization and prevent oversubscription.
- Ensure compatibility with existing vSphere infrastructure limitations.
- Maintain backward compatibility with existing deployments.

### Non-Goals

- Dynamically adjust the limit based on real-time resource usage.
- Implement per-namespace or per-workload volume limits.
- Modify the underlying vSphere VM configuration.

## Proposal

1. Enable Feature State Switch (FSS):

- Use the FSS of the vSphere driver to control the activation of the maximum volume limit functionality.
- The operator will check for vSphere version 8 (`VCenterChecker`) and conditionally set a higher volume limit if version 8 or higher is detected.

2. API for Maximum Volume Limit:

- Introduce a new field `spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode` in the ClusterCSIDriver API to allow administrators to configure the desired maximum number of volumes per node.
- The vSphere CSI operator will read the configured value from the API.

3. Update CSI Node Pods:

- If the new `maxAllowedBlockVolumesPerNode` API field is set in ClusterCSIDriver, the operator will inject the `MAX_VOLUMES_PER_NODE` environment variable into node pods using a DaemonSet hook (see the sketch after this list).

4. Driver Behavior:

- The vSphere CSI driver will continue to perform basic validation on the user-defined limit, allowing the new limit of 255 volumes per node only on vSphere version 8 or higher.
- The driver will respect the configured limit when provisioning volumes.

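For illustration, a minimal sketch of the administrator-facing configuration and the environment variable the operator would inject is shown below. The field path and the `MAX_VOLUMES_PER_NODE` variable come from this proposal; the object name, the chosen limit of 255, and the container fragment are illustrative assumptions, not the final implementation.

```yaml
# Sketch only: the field path and MAX_VOLUMES_PER_NODE come from this proposal;
# the concrete values are illustrative assumptions.
apiVersion: operator.openshift.io/v1
kind: ClusterCSIDriver
metadata:
  name: csi.vsphere.vmware.com
spec:
  managementState: Managed
  driverConfig:
    driverType: vSphere
    vSphere:
      maxAllowedBlockVolumesPerNode: 255
---
# Fragment of the driver node DaemonSet container after the operator's
# DaemonSet hook injects the configured limit.
env:
- name: MAX_VOLUMES_PER_NODE
  value: "255"
```
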
### Workflow Description

1. Administrator Configures Limit:
- The administrator creates or updates a ClusterCSIDriver object to specify the desired maximum number of volumes per node.
2. Operator Reads Configuration:
- The vSphere CSI Operator monitors the configuration object for changes.
- Upon detecting a change, the operator reads the configured limit value.
3. Operator Sets the New Volume Limit on the DaemonSet:
- The operator updates the DaemonSet for the vSphere CSI driver, injecting the `MAX_VOLUMES_PER_NODE` environment variable with the configured limit value into the driver node pods.
4. Driver Enforces Limit:
- The vSphere CSI driver reads the `MAX_VOLUMES_PER_NODE` environment variable and uses the configured limit during volume provisioning requests.

### API Extensions

- New field in the ClusterCSIDriver CRD:
- A new field will be added to the ClusterCSIDriver CRD to represent the maximum volume limit configuration.
- The configuration consists of a single new field (e.g., `spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode`) that defines the desired limit.
- The field should be defined with appropriate validation rules to ensure only valid values are accepted (see the sketch below).

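As a sketch of what "appropriate validation rules" could look like, the generated CRD schema for the new field might carry explicit bounds. The bounds of 1 and 255 and the default behavior described below are assumptions for illustration, not settled API decisions:

```yaml
# Hypothetical OpenAPI v3 schema fragment for the new field in the
# ClusterCSIDriver CRD; the bounds shown here are assumptions.
maxAllowedBlockVolumesPerNode:
  type: integer
  format: int32
  minimum: 1
  maximum: 255
  description: >-
    Maximum number of block volumes that can be attached to a single node.
    When unset, the operator keeps the current default limit (59).
```
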
### Topology Considerations

#### Hypershift / Hosted Control Planes

No unique considerations for Hypershift. The configuration and behavior of the vSphere CSI driver with respect to the
maximum volume limit will remain consistent across standalone and managed clusters.

#### Standalone Clusters

This enhancement is fully applicable to standalone OpenShift clusters.

#### Single-node Deployments or MicroShift

No unique considerations for MicroShift. The configuration and behavior of the vSphere CSI driver with respect to the
maximum volume limit will remain consistent across standalone and SNO/MicroShift clusters.

### Implementation Details/Notes/Constraints

A possible future constraint is that newer vSphere versions may raise the maximum again. However, we expect the
limit to increase rather than decrease, and relaxing the API validation later to allow higher values is possible.

### Risks and Mitigations

- Possible risk of disabling the CSI controller volume publish capability: the new field in ClusterCSIDriver for setting
the limit should default to a value higher than 0 (59 is reasonable, matching the vSphere version 7 limit).
- Impact on existing deployments: the default limit remains unchanged, minimizing disruption for existing deployments.

### Drawbacks

- Increased Complexity: Introducing a new API field and the associated operator logic adds complexity to the vSphere CSI driver ecosystem.
- Potential for Configuration Errors: Incorrectly configuring the maximum volume limit can lead to unexpected behavior or resource limitations.
- Limited Granularity: The current proposal provides a node-level limit. More fine-grained control (e.g., per-namespace or per-workload limits) would require further investigation and development.

## Open Questions [optional]

None.

## Test Plan

- E2E tests will be implemented to verify the correct propagation of the configured limit to the driver pods. These tests will only run on vSphere 8.

## Graduation Criteria

- GA in 4.19.
- E2E tests are implemented and passing.
- Documentation is updated.

### Dev Preview -> Tech Preview

- Ability to utilize the enhancement end to end.

### Tech Preview -> GA

- E2E test coverage demonstrating stability.
- Available by default.
- User-facing documentation created in [openshift-docs](https://github.com/openshift/openshift-docs/).

### Removing a deprecated feature

- No, this enhancement does not remove any deprecated feature.

## Upgrade / Downgrade Strategy

- **Upgrades:** During an upgrade, the operator will apply the new API field value and update the driver DaemonSet with
the new `MAX_VOLUMES_PER_NODE` value if it is configured. If the new field is not configured, the operator will
keep using its previous hardcoded value configured in the DaemonSet (59).
- **Downgrades:** Downgrading to a version without this feature will result in the API field being ignored, and the
operator will revert to its previous hardcoded value configured in the DaemonSet (59). If the count of attached
volumes on a node is higher than the limit after the downgrade, the vSphere CSI driver will not be able to attach new
volumes and users will need to manually detach the extra volumes.

## Version Skew Strategy

There are no version skew concerns for this enhancement.

## Operational Aspects of API Extensions

- The API extension does not pose any operational challenges.

## Support Procedures

* To check the status of the vSphere CSI operator, use the following command: `oc get deployments -n openshift-cluster-csi-drivers`. Ensure that the operator is running and healthy, and inspect its logs.
* To inspect the `ClusterCSIDriver` CRs, use the following command: `oc get clustercsidriver/<driver_name> -o yaml`. Examine the `spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode` field.

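To verify that a configured limit has actually propagated to a node, one option (assuming the driver reports the limit through the standard CSI `NodeGetInfo` response, as CSI drivers normally do) is to inspect the `CSINode` object for that node with `oc get csinode <node_name> -o yaml`. A fragment of the expected output is sketched below with illustrative values:

```yaml
# Illustrative CSINode fragment; the allocatable count is populated by the
# kubelet from the per-node volume limit reported by the CSI driver.
spec:
  drivers:
  - name: csi.vsphere.vmware.com
    nodeID: worker-0
    allocatable:
      count: 255
```
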
## Alternatives

- We could conditionally set the FSS with the operator based either on the presence of the new field or on feature gate enablement in OpenShift.
This should not be necessary because the FSS in the driver only allows setting a higher volume limit (255) per node.

## Infrastructure Needed [optional]

- The infrastructure needed to support this enhancement is already available for testing against vSphere version 8.
