[TiDB] Empty objects under storageClaims bypass TiDB's validation #136
Description
Bug Report
What version of Kubernetes are you using?
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.1", GitCommit:"d224476cd0730baca2b6e357d144171ed74192d6", GitTreeState:"clean", BuildDate:"2020-01-14T21:04:32Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.9", GitCommit:"6df4433e288edc9c40c2e344eb336f63fad45cd2", GitTreeState:"clean", BuildDate:"2022-05-19T19:53:08Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

What version of TiDB Operator are you using?
TiDB Operator Version: version.Info{GitVersion:"v1.3.0-45+1470cfb46e1ffb-dirty", GitCommit:"1470cfb46e1ffb8bb86f74ba455865a95b825413", GitTreeState:"dirty", BuildDate:"2022-07-06T18:55:00Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}

What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?
$ kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
standard (default) rancher.io/local-path Delete WaitForFirstConsumer false 40m
$ kubectl get pvc -n {tidb-cluster-namespace}
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
pd-advanced-tidb-pd-0 Bound pvc-b566858b-bf4e-4e33-b31e-5d7feb7397b1 10Gi RWO standard 40m
pd-advanced-tidb-pd-1 Bound pvc-df70980f-12cf-499f-8ad7-e41cac98c5d0 10Gi RWO standard 40m
pd-advanced-tidb-pd-2 Bound pvc-d41691d8-feb5-4e21-b282-1ece1851cffa 10Gi RWO standard 40m
tikv-advanced-tidb-tikv-0 Bound pvc-42e652d6-2400-4ae8-b790-cef8466e4566 100Gi RWO standard 39m
tikv-advanced-tidb-tikv-1 Bound pvc-5af08c43-e02d-433c-896a-b85ad568d1ca 100Gi RWO standard 39m
tikv-advanced-tidb-tikv-2 Bound pvc-652761b6-9fff-4080-b13d-9e364062cddc 100Gi RWO standard 39m

What's the status of the TiDB cluster pods?
$ kubectl get po -n {tidb-cluster-namespace} -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
advanced-tidb-discovery-6998694d4c-fmpjl 1/1 Running 0 41m 10.244.2.2 test-worker3 <none> <none>
advanced-tidb-pd-0 1/1 Running 0 41m 10.244.3.4 test-worker <none> <none>
advanced-tidb-pd-1 1/1 Running 0 41m 10.244.2.4 test-worker3 <none> <none>
advanced-tidb-pd-2 1/1 Running 0 41m 10.244.1.4 test-worker2 <none> <none>
advanced-tidb-tidb-0 2/2 Running 0 39m 10.244.3.7 test-worker <none> <none>
advanced-tidb-tidb-1 2/2 Running 0 39m 10.244.2.7 test-worker3 <none> <none>
advanced-tidb-tidb-2 2/2 Running 0 39m 10.244.1.7 test-worker2 <none> <none>
advanced-tidb-tikv-0 1/1 Running 0 40m 10.244.2.6 test-worker3 <none> <none>
advanced-tidb-tikv-1 1/1 Running 0 40m 10.244.3.6 test-worker <none> <none>
advanced-tidb-tikv-2 1/1 Running 0 40m 10.244.1.6 test-worker2 <none> <none>
tidb-controller-manager-6cc68f8949-52vwt 1/1 Running 0 41m 10.244.3.2 test-worker <none> <none>
tidb-scheduler-dd569b6b4-hj294 2/2 Running 0 41m 10.244.1.2 test-worker2 <none> <none>

What did you do?
We created a TidbCluster CR with TiFlash as shown below and deployed it using kubectl apply -f.
The CR file:
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: advanced-tidb
spec:
  version: "v5.4.0"
  timezone: UTC
  configUpdateStrategy: RollingUpdate
  helper:
    image: busybox:1.34.1
  pvReclaimPolicy: Retain
  enableDynamicConfiguration: true
  tiflash:
    baseImage: pingcap/tiflash
    replicas: 2
    storageClaims:
      - {}
      - {}
      - {}
  pd:
    baseImage: pingcap/pd
    config: |
      [dashboard]
        internal-proxy = true
    replicas: 3
    maxFailoverCount: 0
    requests:
      storage: 10Gi
    mountClusterClientSecret: true
  tidb:
    baseImage: pingcap/tidb
    config: |
      [performance]
        tcp-keep-alive = true
    replicas: 3
    maxFailoverCount: 0
    service:
      type: NodePort
      externalTrafficPolicy: Local
  tikv:
    baseImage: pingcap/tikv
    config: |
      log-level = "info"
    replicas: 3
    maxFailoverCount: 0
    requests:
      storage: 100Gi
    mountClusterClientSecret: true

Note that although the field spec.tiflash.storageClaims has more than one item, the items are all empty.
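For comparison, a storageClaims item that should pass validation specifies the storage request explicitly. The following entry is illustrative only (the size and class name are placeholders, not values from our cluster):

```yaml
tiflash:
  baseImage: pingcap/tiflash
  replicas: 2
  storageClaims:
    - resources:
        requests:
          # Required: omitting this is what leads to the silent failure below.
          storage: 100Gi
      # Optional: falls back to the cluster's default StorageClass if unset.
      storageClassName: standard
```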
What did you expect to see?
We expected the input to be rejected, since the items of storageClaims do not carry the required resources.requests.storage value, and to see an error message indicating the invalid input.
What did you see instead?
The TiFlash pods were never created, because the PVCs cannot be created without a storage request. tidb-operator silently fails to create the TiFlash cluster: no error message is surfaced to kubectl or written to the operator log.
Possible reason
We believe the validation here is not comprehensive. If storageClaims is filled with items that do not specify a storage request, the input bypasses this if statement, because the length of spec.StorageClaims is 3 in our case.
From the events we found out that spec.tiflash[INDEX].resources.requests.storage is a required field. But since resources.requests is a map, it is hard to validate from the CRD side. We suggest adding extra validation in validation.go that checks for the storage field. We are happy to provide a PR for the fix.
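A length check alone cannot catch this case. A sketch of the stricter check we have in mind for validation.go, using a simplified stand-in type rather than the operator's actual API structs (StorageClaim and validateStorageClaims here are illustrative names, not the real ones):

```go
package main

import "fmt"

// StorageClaim is a simplified stand-in for the CRD's storage claim type:
// Requests maps resource names (e.g. "storage") to quantity strings.
type StorageClaim struct {
	Requests map[string]string
}

// validateStorageClaims sketches the proposed check: it is not enough for
// the list to be non-empty; every item must also carry a
// resources.requests.storage value.
func validateStorageClaims(claims []StorageClaim) error {
	if len(claims) < 1 {
		return fmt.Errorf("storageClaims must contain at least one item")
	}
	for i, c := range claims {
		if _, ok := c.Requests["storage"]; !ok {
			return fmt.Errorf("storageClaims[%d]: resources.requests.storage is required", i)
		}
	}
	return nil
}

func main() {
	// Three empty items pass a pure length check but fail the field check.
	err := validateStorageClaims([]StorageClaim{{}, {}, {}})
	fmt.Println(err)
}
```

With a check like this, a CR whose storageClaims items exist but are empty would be rejected with an explicit error instead of failing silently.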