---
title: vsphere-configurable-maximum-allowed-number-of-block-volumes-per-node
authors:
  - "@rbednar"
reviewers:
  - "@jsafrane"
  - "@gnufied"
  - "@deads2k"
approvers:
  - "@jsafrane"
  - "@gnufied"
api-approvers:
  - "@JoelSpeed"
creation-date: 2025-01-31
last-updated: 2025-01-31
tracking-link:
  - https://issues.redhat.com/browse/OCPSTRAT-1829
see-also:
  - "None"
replaces:
  - "None"
superseded-by:
  - "None"
---

# vSphere configurable maximum allowed number of block volumes per node

This document proposes an enhancement to the vSphere CSI driver that allows administrators to configure the maximum
number of block volumes that can be attached to a single vSphere node. It addresses a limitation of the current
driver, which relies on a static limit that cannot be changed by cluster administrators.

## Summary

The vSphere CSI driver for vSphere version 7 uses a constant to determine the maximum number of block volumes that can
be attached to a single node. This limit is derived from the number of SCSI controllers available on the node: by
default, a node can have up to four SCSI controllers, each supporting up to 15 devices, for a total of 60 devices
per node (59 attachable volumes plus the root volume).

However, vSphere version 8 increased the maximum number of volumes per node to 256 (255 attachable volumes plus the
root volume). This enhancement aims to leverage this increased limit and give administrators finer-grained control
over volume allocation by allowing them to configure the maximum number of block volumes that can be attached to a
single node.

- Details about configuration maximums: https://configmax.broadcom.com/guest?vmwareproduct=vSphere&release=vSphere%208.0&categories=3-0
- Volume limit configuration for the vSphere storage plug-in: https://techdocs.broadcom.com/us/en/vmware-cis/vsphere/container-storage-plugin/3-0/getting-started-with-vmware-vsphere-container-storage-plug-in-3-0/vsphere-container-storage-plug-in-concepts/configuration-maximums-for-vsphere-container-storage-plug-in.html
- Knowledge base article with node requirements: https://knowledge.broadcom.com/external/article/301189/prerequisites-and-limitations-when-using.html

## Motivation

### User Stories

- As a vSphere administrator, I want to configure the maximum number of volumes that can be attached to a node, so that
  I can optimize resource utilization and prevent oversubscription.
- As a cluster administrator, I want to ensure that the vSphere CSI driver operates within the limits imposed by the
  underlying vSphere infrastructure.

### Goals

- Provide administrators with control over the volume allocation limit on vSphere nodes.
- Improve resource utilization and prevent oversubscription.
- Ensure compatibility with existing vSphere infrastructure limitations.
- Maintain backward compatibility with existing deployments.

### Non-Goals

- Support heterogeneous environments with different ESXi versions on the nodes that form an OpenShift cluster.
- Dynamically adjust the limit based on real-time resource usage.
- Implement per-namespace or per-workload volume limits.
- Modify the underlying vSphere VM configuration.

## Proposal

1. Driver Feature State Switch (FSS):

   - Use the vSphere driver's FSS (`max-pvscsi-targets-per-vm`) to control the activation of the maximum volume limit
     functionality.
   - No changes are needed; the feature is enabled by default.

2. API for Maximum Volume Limit:

   - Introduce a new field, `spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode`, in the ClusterCSIDriver API to
     allow administrators to configure the desired maximum number of volumes per node (see the example manifest after
     this list).
   - This field will not have a default value; the operator will default to the current maximum limit of 59 volumes
     per node, which matches the limit for vSphere 7.
   - The API will not allow the value `0` to be set, nor the field to be set to an empty value, as either would
     effectively disable the limit.
   - The allowed range of values is 1 to 255; the maximum matches the vSphere 8 limit.

3. Update CSI Pods with hooks:

   - After reading the new `maxAllowedBlockVolumesPerNode` API field from ClusterCSIDriver, the operator will inject
     the `MAX_VOLUMES_PER_NODE` environment variable into all driver pods using DaemonSet and Deployment hooks.
   - Any value that is statically set for the `MAX_VOLUMES_PER_NODE` environment variable in asset files will be
     overwritten. If the variable is omitted in the asset, the hooks will add it and set its value to the one found in
     the `maxAllowedBlockVolumesPerNode` field of ClusterCSIDriver. If the field is not set, the default value of 59
     will be used to match the vSphere 7 limit.

4. Operator behavior:

   - The operator will check the ESXi version on all nodes in the cluster. Setting `maxAllowedBlockVolumesPerNode` to
     a value higher than 59 when not all nodes run ESXi version 8 or later will result in cluster degradation.

5. Driver Behavior:

   - The vSphere CSI driver needs to allow the higher limit via the Feature State Switch (FSS)
     `max-pvscsi-targets-per-vm`.
   - The switch is already enabled by default in the driver versions shipped in OpenShift 4.19.
   - The driver will report the volume limit as usual in response to `NodeGetInfo` calls.

6. Documentation:

   - Update the vSphere CSI driver documentation to include information about the new feature and how to configure it.
     However, at the time of writing there is no official vSphere documentation to refer to that explains how to
     configure vSphere to support 256 volumes per node.
   - Include a statement informing users of the current requirement of having a homogeneous cluster with all nodes
     running ESXi 8 or later. Until this requirement is met, the limit set in `maxAllowedBlockVolumesPerNode` must not
     be increased above 59. If a higher value is set regardless of this requirement, the cluster will degrade.
   - Currently, there is no Distributed Resource Scheduler (DRS) validation in vSphere to ensure that multiple VMs
     with 256 disks do not end up on the same host, so users might exceed the limit of 2,048 virtual disks per host.
     This is a known limitation of vSphere, and we need to note it in the documentation to make users aware of this
     potential risk.
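
For illustration, a minimal ClusterCSIDriver manifest with the proposed field could look as follows (the value `128`
is illustrative, not a recommendation):

```yaml
apiVersion: operator.openshift.io/v1
kind: ClusterCSIDriver
metadata:
  name: csi.vsphere.vmware.com
spec:
  driverConfig:
    driverType: vSphere
    vSphere:
      # Values above 59 require ESXi 8 or later on all nodes; on clusters
      # with older ESXi hosts the operator marks the cluster degraded.
      maxAllowedBlockVolumesPerNode: 128
```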

### Workflow Description

1. Administrator configures the limit:
   - The administrator creates or updates a ClusterCSIDriver object to specify the desired maximum number of volumes
     per node using the new `maxAllowedBlockVolumesPerNode` API field.
2. Operator reads the configuration:
   - The vSphere CSI operator monitors the ClusterCSIDriver object for changes.
   - Upon detecting a change, the operator reads the configured limit value.
3. Operator applies the new limit to the DaemonSet and Deployment:
   - The operator updates the vSphere CSI driver pods, injecting the `MAX_VOLUMES_PER_NODE` environment variable with
     the configured limit value into the driver node pods on worker nodes (a sketch of the resulting pod template
     fragment follows this list).
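
As a sketch, the node DaemonSet pod template would end up with an environment variable along these lines (the
container name and value are hypothetical):

```yaml
# Fragment of the driver node DaemonSet after the hook runs (illustrative).
spec:
  containers:
    - name: csi-driver
      env:
        - name: MAX_VOLUMES_PER_NODE
          value: "128" # mirrors spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode; 59 when unset
```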

### API Extensions

- New field in the ClusterCSIDriver CRD:
  - A new field will be introduced in the existing CRD to represent the maximum volume limit configuration.
  - The CRD will contain a single new field (`spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode`) to define the
    desired limit.
  - The API will validate that the value fits within the defined range (1-255), as sketched below.
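
A sketch of the corresponding OpenAPI v3 validation in the generated CRD, assuming standard integer bounds (the
description text is illustrative):

```yaml
# Fragment of the ClusterCSIDriver CRD schema (illustrative).
maxAllowedBlockVolumesPerNode:
  description: Maximum number of block volumes attachable to a single node.
  type: integer
  format: int32
  minimum: 1
  maximum: 255
```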

### Topology Considerations

#### Hypershift / Hosted Control Planes

No unique considerations for Hypershift. The configuration and behavior of the vSphere CSI driver with respect to the
maximum volume limit will remain consistent across standalone and managed clusters.

#### Standalone Clusters

This enhancement is fully applicable to standalone OpenShift clusters.

#### Single-node Deployments or MicroShift

No unique considerations for MicroShift. The configuration and behavior of the vSphere CSI driver with respect to the
maximum volume limit will remain consistent across standalone and SNO/MicroShift clusters.

### Implementation Details/Notes/Constraints

A possible future constraint is that newer vSphere versions may raise the limit again. However, we expect the limit
to increase rather than decrease, and relaxing the API validation later is possible.

### Risks and Mitigations

- None.

### Drawbacks

- Increased complexity: Introducing a new CRD field and operator logic adds complexity to the vSphere CSI driver
  ecosystem.
- Missing vSphere documentation: At the time of writing, there is no clear statement or documentation to refer to that
  would describe all the necessary details and limitations of this feature. See Documentation in the Proposal section
  for details.
- Limited granularity: The current proposal provides a global node-level limit. More fine-grained control
  (e.g., per-namespace or per-workload limits) would require further investigation and development.

## Open Questions [optional]

None.

## Test Plan

- E2E tests will be implemented to verify the correct propagation of the configured limit to the driver pods.
  These tests will be executed only on vSphere 8.

## Graduation Criteria

- TechPreview in 4.19.

### Dev Preview -> Tech Preview

- No Dev Preview phase.

### Tech Preview -> GA

- E2E test coverage demonstrating stability.
- Available by default.
- User-facing documentation created in [openshift-docs](https://github.com/openshift/openshift-docs/).
- We have to wait for VMware to GA this feature and document the configuration on the vCenter side.

### Removing a deprecated feature

- No.

## Upgrade / Downgrade Strategy

- **Upgrades:** During an upgrade, the operator will apply the new API field value and update the driver pods with the
  new `MAX_VOLUMES_PER_NODE` value. If the field is not set, the default value (59) is used to match the current limit
  for vSphere 7, so the limit will not change for existing deployments unless the user explicitly sets it.
- **Downgrades:** Downgrading to a version without this feature will result in the API field being ignored, and the
  operator will revert to its previous hardcoded value (59) configured in the DaemonSet. If the number of attached
  volumes exceeds the limit after a downgrade, the vSphere CSI driver will not be able to attach new volumes to the
  affected nodes, and users will need to manually detach the extra volumes.

## Version Skew Strategy

There are no version skew concerns for this enhancement.

## Operational Aspects of API Extensions

- The API extension does not pose any operational challenges.

## Support Procedures

* To check the status of the vSphere CSI operator, use the following command:
  `oc get deployments -n openshift-cluster-csi-drivers`. Ensure that the operator is running and healthy, and inspect
  its logs.
* To inspect the `ClusterCSIDriver` CR, use the following command: `oc get clustercsidriver/csi.vsphere.vmware.com -o yaml`.
  Examine the `spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode` field.
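
When the feature is configured, the CR should show the field under `spec.driverConfig.vSphere`; a minimal fragment of
the expected output (the value `128` is illustrative):

```yaml
spec:
  driverConfig:
    driverType: vSphere
    vSphere:
      maxAllowedBlockVolumesPerNode: 128
```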

## Alternatives (Not Implemented)

We considered several approaches to handle environments with mixed ESXi versions:

1. **Cluster Degradation (Selected Approach)**:
   - We will degrade the cluster if the user-specified limit exceeds what is supported by the underlying
     infrastructure.
   - This requires checking the ClusterCSIDriver configuration against actual node capabilities in the
     `check_nodes.go` implementation.
   - The error messages will be specific about the incompatibility.
   - Documentation will clearly state that increased limits are not supported in environments containing ESXi 7.x
     hosts.

2. **Warning-Only Approach**:
   - Allow any user-specified limit (up to 255) regardless of the ESXi versions in the cluster.
   - Emit metrics and alerts when incompatible configurations are detected.
   - This approach would result in application pods getting stuck in the ContainerCreating state when scheduled to
     ESXi 7.0 nodes that exceed the 59-attachment limit.
   - This option was rejected as it would lead to a poor user experience with difficult-to-diagnose failures.

3. **Dynamic Limit Adjustment**:
   - Have the DaemonSet controller ignore user-specified limits that exceed cluster capabilities and automatically
     switch to a supportable limit.
   - This option is technically complex, as it would require:
     - Delaying CSI driver startup until all version checks complete
     - Implementing a DaemonSet hook to perform full cluster scans for ESXi versions (an expensive operation)
     - Duplicating node checks already being performed elsewhere
   - This approach was rejected due to implementation complexity.

4. **Driver-Level Detection**:
   - Add code to the DaemonSet pod that would detect limits from the BIOS or OS and take them into account when
     reporting attachment capabilities.
   - This would require modifications to the driver code itself, which would be better implemented by VMware.
   - This approach was rejected as it would depend on upstream changes that have historically been slow to land.

## Infrastructure Needed [optional]

- The infrastructure needed to support the enhancement is currently available for testing vSphere version 8.
- To test the feature we need a nested vSphere environment and must set `pvscsiCtrlr256DiskSupportEnabled` in the
  vCenter configuration to allow the higher volume attachment limit.