
Conversation

@RomanBednar
Contributor

@RomanBednar RomanBednar commented Feb 4, 2025

Depends on

Manual verification

Test value limits for maxAllowedBlockVolumesPerNode field:

$ oc patch clustercsidriver/csi.vsphere.vmware.com --type=merge -p '{"spec":{"driverConfig":{"vSphere":{"maxAllowedBlockVolumesPerNode":-1}}}}'
The ClusterCSIDriver "csi.vsphere.vmware.com" is invalid: spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode: Invalid value: -1: spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode in body should be greater than or equal to 1

$ oc patch clustercsidriver/csi.vsphere.vmware.com --type=merge -p '{"spec":{"driverConfig":{"vSphere":{"maxAllowedBlockVolumesPerNode":0}}}}'
The ClusterCSIDriver "csi.vsphere.vmware.com" is invalid: spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode: Invalid value: 0: spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode in body should be greater than or equal to 1

$ oc patch clustercsidriver/csi.vsphere.vmware.com --type=merge -p '{"spec":{"driverConfig":{"vSphere":{"maxAllowedBlockVolumesPerNode":256}}}}'
The ClusterCSIDriver "csi.vsphere.vmware.com" is invalid: spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode: Invalid value: 256: spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode in body should be less than or equal to 255
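The 1–255 bounds rejected above are enforced by the CRD's OpenAPI schema validation, not by hand-written code. As an illustrative sketch only (names and structure are mine, not the operator's), the bound check behaves like:

```python
def validate_max_volumes(value: int) -> None:
    """Mimic the CRD schema bounds for maxAllowedBlockVolumesPerNode (1..255)."""
    if value < 1:
        raise ValueError(
            f"Invalid value: {value}: should be greater than or equal to 1")
    if value > 255:
        raise ValueError(
            f"Invalid value: {value}: should be less than or equal to 255")

# The three rejected patches above correspond to these inputs:
for bad in (-1, 0, 256):
    try:
        validate_max_volumes(bad)
    except ValueError as err:
        print(err)

validate_max_volumes(60)  # accepted, produces no output
```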

Validate maxAllowedBlockVolumesPerNode value propagation to driver deployment as MAX_VOLUMES_PER_NODE:

$ oc patch clustercsidriver/csi.vsphere.vmware.com --type=merge -p '{"spec":{"driverConfig":{"vSphere":{"maxAllowedBlockVolumesPerNode":60}}}}'
clustercsidriver.operator.openshift.io/csi.vsphere.vmware.com patched

$ oc -n openshift-cluster-csi-drivers get deployment.apps/vmware-vsphere-csi-driver-controller -o jsonpath='{.spec.template.spec.containers[0].env}'
[{"name":"CSI_ENDPOINT","value":"unix:///var/lib/csi/sockets/pluginproxy/csi.sock"},{"name":"X_CSI_MODE","value":"controller"},{"name":"VSPHERE_CSI_CONFIG","value":"/etc/kubernetes/vsphere-csi-config/cloud.conf"},{"name":"INCLUSTER_CLIENT_QPS","value":"100"},{"name":"INCLUSTER_CLIENT_BURST","value":"100"},{"name":"CSI_NAMESPACE","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"metadata.namespace"}}},{"name":"X_CSI_SERIAL_VOL_ACCESS_TIMEOUT","value":"3m"},{"name":"X_CSI_SPEC_DISABLE_LEN_CHECK","value":"true"},{"name":"MAX_VOLUMES_PER_NODE","value":"60"}]

$ oc -n openshift-cluster-csi-drivers get daemonset.apps/vmware-vsphere-csi-driver-node -o jsonpath='{.spec.template.spec.containers[0].env}'
[{"name":"NODE_NAME","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"spec.nodeName"}}},{"name":"CSI_ENDPOINT","value":"unix:///csi/csi.sock"},{"name":"X_CSI_MODE","value":"node"},{"name":"X_CSI_SPEC_DISABLE_LEN_CHECK","value":"true"},{"name":"CSI_NAMESPACE","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"metadata.namespace"}}},{"name":"MAX_VOLUMES_PER_NODE","value":"60"}]
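The env lists returned above are plain JSON, so the lookup the jsonpath filter performs can be sketched in Python (illustrative; the JSON here is a trimmed copy of the daemonset output, and `env_value` is a hypothetical helper, not operator code):

```python
import json

# Trimmed from the daemonset container env output above.
env_json = '[{"name":"X_CSI_MODE","value":"node"},' \
           '{"name":"MAX_VOLUMES_PER_NODE","value":"60"}]'

def env_value(env_list, name):
    """Return the value of the first env var with the given name, or None."""
    return next((e.get("value") for e in env_list if e["name"] == name), None)

env = json.loads(env_json)
print(env_value(env, "MAX_VOLUMES_PER_NODE"))  # -> 60
```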

Validate propagation to CSINode as allocatable count:

$ oc get csinode/ci-ln-k30mn5t-c1627-2tk2k-worker-0-72mfn -o jsonpath='{.spec.drivers[0].allocatable.count}'
60

If maxAllowedBlockVolumesPerNode is unset (for example, after a cluster upgrade), the operator must use the default value (never zero):

$ oc get clustercsidriver/csi.vsphere.vmware.com -o jsonpath='{.spec.driverConfig}'
{"driverType":""}
$ oc -n openshift-cluster-csi-drivers get deployment.apps/vmware-vsphere-csi-driver-controller -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="MAX_VOLUMES_PER_NODE")]}'
{"name":"MAX_VOLUMES_PER_NODE","value":"59"}
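The defaulting behavior verified here can be sketched as follows. This is an assumption-laden illustration in Python (the operator itself is Go, and the function name is hypothetical): an unset field must resolve to the pre-existing limit of 59 and never to zero.

```python
DEFAULT_MAX_VOLUMES_PER_NODE = 59  # pre-existing vSphere CSI per-node limit

def resolve_max_volumes(configured):
    """Return the value to render into MAX_VOLUMES_PER_NODE.

    An unset field arrives as None (or deserializes as 0) and must never
    be passed through: a zero limit would tell the kubelet the node has
    no attachable volume slots at all.
    """
    if configured:  # falsy covers both None and 0
        return configured
    return DEFAULT_MAX_VOLUMES_PER_NODE

print(resolve_max_volumes(None))  # -> 59
print(resolve_max_volumes(0))     # -> 59
print(resolve_max_volumes(60))    # -> 60
```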

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 4, 2025
@openshift-ci-robot
Contributor

openshift-ci-robot commented Feb 4, 2025

@RomanBednar: This pull request references STOR-2141 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.19.0" version, but no target version was set.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 4, 2025
@openshift-ci openshift-ci bot requested review from dobsonj and gnufied February 4, 2025 11:34
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 4, 2025
@RomanBednar RomanBednar changed the title WIP: STOR-2141: add daemonset hook to allow setting custom volume limit WIP: STOR-2141: add support for maxAllowedBlockVolumesPerNode Mar 17, 2025
@RomanBednar RomanBednar force-pushed the STOR-2141 branch 3 times, most recently from 623109c to ea8a387 Compare March 18, 2025 13:48
@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 18, 2025

@RomanBednar: This pull request references STOR-2141 which is a valid jira issue.

Details


@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 20, 2025

@RomanBednar: This pull request references STOR-2141 which is a valid jira issue.

Details

In response to this:

Depends on

Manual verification

Test value limits for maxAllowedBlockVolumesPerNode field:

oc patch clustercsidriver/csi.vsphere.vmware.com --type=merge -p '{"spec":{"driverConfig":{"vSphere":{"maxAllowedBlockVolumesPerNode":-1}}}}'
The ClusterCSIDriver "csi.vsphere.vmware.com" is invalid: spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode: Invalid value: -1: spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode in body should be greater than or equal to 1

$ oc patch clustercsidriver/csi.vsphere.vmware.com --type=merge -p '{"spec":{"driverConfig":{"vSphere":{"maxAllowedBlockVolumesPerNode":0}}}}'
The ClusterCSIDriver "csi.vsphere.vmware.com" is invalid: spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode: Invalid value: 0: spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode in body should be greater than or equal to 1

$ oc patch clustercsidriver/csi.vsphere.vmware.com --type=merge -p '{"spec":{"driverConfig":{"vSphere":{"maxAllowedBlockVolumesPerNode":256}}}}'
The ClusterCSIDriver "csi.vsphere.vmware.com" is invalid: spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode: Invalid value: 256: spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode in body should be less than or equal to 255

Validate maxAllowedBlockVolumesPerNode value propagation to driver deployment as MAX_VOLUMES_PER_NODE:

$ oc patch clustercsidriver/csi.vsphere.vmware.com --type=merge -p '{"spec":{"driverConfig":{"vSphere":{"maxAllowedBlockVolumesPerNode":60}}}}'
clustercsidriver.operator.openshift.io/csi.vsphere.vmware.com patched

$ oc -n openshift-cluster-csi-drivers get deployment.apps/vmware-vsphere-csi-driver-controller -o jsonpath='{.specontainers[0].env}'
[{"name":"CSI_ENDPOINT","value":"unix:///var/lib/csi/sockets/pluginproxy/csi.sock"},{"name":"X_CSI_MODE","value":"controller"},{"name":"VSPHERE_CSI_CONFIG","value":"/etc/kubernetes/vsphere-csi-config/cloud.conf"},{"name":"INCLUSTER_CLIENT_QPS","value":"100"},{"name":"INCLUSTER_CLIENT_BURST","value":"100"},{"name":"CSI_NAMESPACE","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"metadata.namespace"}}},{"name":"X_CSI_SERIAL_VOL_ACCESS_TIMEOUT","value":"3m"},{"name":"X_CSI_SPEC_DISABLE_LEN_CHECK","value":"true"},{"name":"MAX_VOLUMES_PER_NODE","value":"60"}]

oc -n openshift-cluster-csi-drivers get daemonset.apps/vmware-vsphere-csi-driver-node -o jsonpath='{.spec.template.spec.containers[0].env}'
[{"name":"NODE_NAME","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"spec.nodeName"}}},{"name":"CSI_ENDPOINT","value":"unix:///csi/csi.sock"},{"name":"X_CSI_MODE","value":"node"},{"name":"X_CSI_SPEC_DISABLE_LEN_CHECK","value":"true"},{"name":"CSI_NAMESPACE","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"metadata.namespace"}}},{"name":"MAX_VOLUMES_PER_NODE","value":"60"}]

Validate propagation to CSINode as allocatable count:

$ oc get csinode/ci-ln-k30mn5t-c1627-2tk2k-worker-0-72mfn -o jsonpath='{.spec.drivers[0].allocatable.count}'
60

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@RomanBednar RomanBednar changed the title WIP: STOR-2141: add support for maxAllowedBlockVolumesPerNode STOR-2141: add support for maxAllowedBlockVolumesPerNode Mar 21, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 21, 2025
@RomanBednar
Contributor Author

/retest-required

@RomanBednar
Contributor Author

/assign @gnufied

For review; feel free to reassign to another candidate.

@gnufied
Member

gnufied commented Mar 25, 2025

Shouldn't all this code be behind a feature gate?

@RomanBednar
Contributor Author

Shouldn't all this code be behind a feature gate?

It could be, but do we want that? I thought we didn't, because:

  • we did not feature-gate the code for the snapshot options a while ago, and this is very similar
  • the API is behind a feature gate, and we decided to let the operator set a default value; that means if the new field is unset we apply the default (59)

So I'm not sure what exactly we would gain from feature-gating this code, but I might have missed something, so please share your thoughts.

@gnufied
Member

gnufied commented Mar 27, 2025

So I'm not sure what exactly we would gain from feature-gating this code, but I might have missed something, so please share your thoughts.

But then, why did we introduce the https://github.com/openshift/api/blob/master/features/features.go#L821 feature gate in the first place? Isn't the idea that a user must first enable that feature gate before this can be used?

If the snapshot-options feature went TP without a feature-gate check in the code, that was a mistake. cc @jsafrane

There are other reasons why this should be feature-gated. Say I am a user who discovers this feature, which is available by default (and I didn't read the docs); if this feature is removed in the next release, my cluster will be broken. There is also the risk of a code change breaking stable features (no matter how small that chance is).

@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 27, 2025

@RomanBednar: This pull request references STOR-2141 which is a valid jira issue.

Details




@RomanBednar
Contributor Author

But then, why did we introduce the https://github.com/openshift/api/blob/master/features/features.go#L821 feature gate in the first place?

That feature gate now affects which CRDs are applied, so unless users enable it they will not be able to set the field (it will be rejected as invalid for ClusterCSIDriver).

There is also the risk of a code change breaking stable features (no matter how small that chance is).

This is a valid point, but if a feature is GA'd, it must have e2e tests, so in theory we should catch any breakage quickly. Historically that was not always the case, though, I think.

if this feature is removed in the next release, my cluster will be broken

What scenario are we talking about exactly? I can see removal being problematic, either because of the field's presence (the operator can remove it) or because more than 59 volumes are already attached (we won't be able to deal with that anyway). But feature removal should only happen if VMware removes it first, right? And we don't know when, or if, that happens, and I believe we can't keep feature gates forever.

So what's the suggestion here exactly?

In order to validate the maximum volume attachment limit set by the user,
NodeChecker needs access to ClusterCSIDriver, which is where users
set the value.
We need to check the versions of all ESXi hosts in the cluster, and if we
detect that a user has set a custom volume attachment limit that is
incorrect, we degrade the cluster.

An incorrect value is any value above 59 when any of the vSphere hosts in
the cluster is not on ESXi version 8 or higher.

If the maxAllowedBlockVolumesPerNode field is not set, it returns
0, which is not valid, so we need to use the default.
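The degrade condition described in this commit message can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the real NodeChecker is Go code inside the operator, the function name is hypothetical, and ESXi versions are reduced to their major number here.

```python
DEFAULT_LIMIT = 59  # safe per-node volume limit for ESXi 7 hosts

def should_degrade(configured_limit: int, esxi_major_versions: list) -> bool:
    """Degrade the cluster if the user raised the limit above 59
    while any host in the cluster is older than ESXi 8."""
    if configured_limit <= DEFAULT_LIMIT:
        return False
    return any(major < 8 for major in esxi_major_versions)

print(should_degrade(100, [8, 8]))  # -> False: all hosts on ESXi 8
print(should_degrade(100, [8, 7]))  # -> True: one host still on ESXi 7
print(should_degrade(59, [7, 7]))   # -> False: within the default limit
```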
@RomanBednar
Contributor Author

/retest-required

@radeore

radeore commented Apr 10, 2025

Performed a pre-merge test with cluster-bot build image 4.19.0-0.test-2025-04-09-032635-ci-ln-cdqc3dk-latest.
Pre-merge testing results look good; test results are commented in JIRA.

Since NodeChecker now checks the max attachment limit value, it is now
safe to add a hook for reflecting the maxAllowedBlockVolumesPerNode field
of ClusterCSIDriver into the daemonset as an env variable.
@gnufied
Member

gnufied commented Apr 11, 2025

Hmm, looks like the vSphere operator tests that check storage removal are failing. Is that a real failure? @RomanBednar, can you check?

@RomanBednar
Contributor Author

/retest-required

@openshift-ci
Contributor

openshift-ci bot commented Apr 11, 2025

@RomanBednar: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-vsphere-zones 5c04e92 link false /test e2e-vsphere-zones
ci/prow/okd-scos-e2e-aws-ovn 5c04e92 link false /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@RomanBednar
Contributor Author

@gnufied Looks like a flake; green on the next try.

@gnufied
Member

gnufied commented Apr 14, 2025

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 14, 2025
@openshift-ci
Contributor

openshift-ci bot commented Apr 14, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gnufied, RomanBednar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:
  • OWNERS [RomanBednar,gnufied]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@radeore

radeore commented Apr 14, 2025

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Apr 14, 2025
@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 14, 2025

@RomanBednar: This pull request references STOR-2141 which is a valid jira issue.

Details


@openshift-merge-bot openshift-merge-bot bot merged commit 5e1017e into openshift:main Apr 14, 2025
12 of 14 checks passed
@openshift-bot

[ART PR BUILD NOTIFIER]

Distgit: ose-vmware-vsphere-csi-driver-operator
This PR has been included in build ose-vmware-vsphere-csi-driver-operator-container-v4.20.0-202504141612.p0.g5e1017e.assembly.stream.el9.
All builds following this will include this PR.
