
[TiDB] Empty objects under storageClaims bypass TiDB's validation #136

@unw9527

Description

Bug Report

What version of Kubernetes are you using?

Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.1", GitCommit:"d224476cd0730baca2b6e357d144171ed74192d6", GitTreeState:"clean", BuildDate:"2020-01-14T21:04:32Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.9", GitCommit:"6df4433e288edc9c40c2e344eb336f63fad45cd2", GitTreeState:"clean", BuildDate:"2022-05-19T19:53:08Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

What version of TiDB Operator are you using?

TiDB Operator Version: version.Info{GitVersion:"v1.3.0-45+1470cfb46e1ffb-dirty", GitCommit:"1470cfb46e1ffb8bb86f74ba455865a95b825413", GitTreeState:"dirty", BuildDate:"2022-07-06T18:55:00Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}

What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?

$ kubectl get sc
NAME                 PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
standard (default)   rancher.io/local-path   Delete          WaitForFirstConsumer   false                  40m

$ kubectl get pvc -n {tidb-cluster-namespace}
NAME                        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pd-advanced-tidb-pd-0       Bound    pvc-b566858b-bf4e-4e33-b31e-5d7feb7397b1   10Gi       RWO            standard       40m
pd-advanced-tidb-pd-1       Bound    pvc-df70980f-12cf-499f-8ad7-e41cac98c5d0   10Gi       RWO            standard       40m
pd-advanced-tidb-pd-2       Bound    pvc-d41691d8-feb5-4e21-b282-1ece1851cffa   10Gi       RWO            standard       40m
tikv-advanced-tidb-tikv-0   Bound    pvc-42e652d6-2400-4ae8-b790-cef8466e4566   100Gi      RWO            standard       39m
tikv-advanced-tidb-tikv-1   Bound    pvc-5af08c43-e02d-433c-896a-b85ad568d1ca   100Gi      RWO            standard       39m
tikv-advanced-tidb-tikv-2   Bound    pvc-652761b6-9fff-4080-b13d-9e364062cddc   100Gi      RWO            standard       39m

What's the status of the TiDB cluster pods?

$ kubectl get po -n {tidb-cluster-namespace} -o wide
NAME                                       READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
advanced-tidb-discovery-6998694d4c-fmpjl   1/1     Running   0          41m   10.244.2.2   test-worker3   <none>           <none>
advanced-tidb-pd-0                         1/1     Running   0          41m   10.244.3.4   test-worker    <none>           <none>
advanced-tidb-pd-1                         1/1     Running   0          41m   10.244.2.4   test-worker3   <none>           <none>
advanced-tidb-pd-2                         1/1     Running   0          41m   10.244.1.4   test-worker2   <none>           <none>
advanced-tidb-tidb-0                       2/2     Running   0          39m   10.244.3.7   test-worker    <none>           <none>
advanced-tidb-tidb-1                       2/2     Running   0          39m   10.244.2.7   test-worker3   <none>           <none>
advanced-tidb-tidb-2                       2/2     Running   0          39m   10.244.1.7   test-worker2   <none>           <none>
advanced-tidb-tikv-0                       1/1     Running   0          40m   10.244.2.6   test-worker3   <none>           <none>
advanced-tidb-tikv-1                       1/1     Running   0          40m   10.244.3.6   test-worker    <none>           <none>
advanced-tidb-tikv-2                       1/1     Running   0          40m   10.244.1.6   test-worker2   <none>           <none>
tidb-controller-manager-6cc68f8949-52vwt   1/1     Running   0          41m   10.244.3.2   test-worker    <none>           <none>
tidb-scheduler-dd569b6b4-hj294             2/2     Running   0          41m   10.244.1.2   test-worker2   <none>           <none>

What did you do?

We created a CR with TiFlash as follows and deployed it using kubectl apply -f.

The CR file
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: advanced-tidb
spec:
  version: "v5.4.0"
  timezone: UTC
  configUpdateStrategy: RollingUpdate
  helper:
    image: busybox:1.34.1
  pvReclaimPolicy: Retain
  enableDynamicConfiguration: true

  tiflash:
    baseImage: pingcap/tiflash
    replicas: 2
    storageClaims:
    - {}
    - {}
    - {}
  pd:
    baseImage: pingcap/pd
    config: |
      [dashboard]
        internal-proxy = true
    replicas: 3
    maxFailoverCount: 0
    requests:
      storage: 10Gi
    mountClusterClientSecret: true
  tidb:
    baseImage: pingcap/tidb
    config: |
      [performance]
        tcp-keep-alive = true
    replicas: 3
    maxFailoverCount: 0
    service:
      type: NodePort
      externalTrafficPolicy: Local
  tikv:
    baseImage: pingcap/tikv
    config: |
      log-level = "info"
    replicas: 3
    maxFailoverCount: 0
    requests:
      storage: 100Gi
    mountClusterClientSecret: true

Note that although the field spec.tiflash.storageClaims has more than one item, all of the items are empty.

What did you expect to see?
We expected the input to be rejected, since the items of storageClaims lack the required value for requests.storage, and an error message indicating the invalid input.

What did you see instead?
The TiFlash pods were never created, because the PVCs cannot be created without a storage request. tidb-operator silently fails to create the TiFlash cluster, with no error message surfaced to kubectl or written to the operator log.

Possible reason
We believe the validation here is not comprehensive. If we populate storageClaims with items that do not specify the storage requirement, they bypass this if statement, since the length of spec.StorageClaims is 3 in our case.

From the events we found that spec.tiflash[INDEX].resources.requests.storage is a required field. But since resources.requests is a map, it is hard to enforce this from the CRD side. We suggest adding extra validation in validation.go to check for the storage field. We are happy to provide a PR to fix this.
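To illustrate the gap, here is a minimal Go sketch of the proposed per-item check. The type and function names are illustrative, not the operator's actual types in validation.go: a length-only check accepts three empty claims, while iterating the items and requiring a storage request rejects them.

```go
package main

import "fmt"

// StorageClaim is a simplified stand-in for a tiflash storage claim;
// an empty StorageRequest models a claim missing resources.requests.storage.
type StorageClaim struct {
	StorageRequest string
}

// validateStorageClaims sketches the suggested validation: keep the
// existing length check, then also require requests.storage on each item.
func validateStorageClaims(claims []StorageClaim) error {
	if len(claims) < 1 {
		return fmt.Errorf("storageClaims should be configured with at least one item")
	}
	for i, c := range claims {
		if c.StorageRequest == "" {
			return fmt.Errorf("storageClaims[%d]: resources.requests.storage is required", i)
		}
	}
	return nil
}

func main() {
	// Three empty claims pass a pure length check but fail the per-item check.
	empty := []StorageClaim{{}, {}, {}}
	fmt.Println(validateStorageClaims(empty))

	valid := []StorageClaim{{StorageRequest: "100Gi"}, {StorageRequest: "100Gi"}}
	fmt.Println(validateStorageClaims(valid))
}
```

With this check in place, a CR like the one above would be rejected at validation time with an explicit error instead of failing silently.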
