[TiDB] podManagementPolicy silently changes to Parallel if the value is invalid #150

@unw9527

Bug Report

What version of Kubernetes are you using?

Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.1", GitCommit:"d224476cd0730baca2b6e357d144171ed74192d6", GitTreeState:"clean", BuildDate:"2020-01-14T21:04:32Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.9", GitCommit:"6df4433e288edc9c40c2e344eb336f63fad45cd2", GitTreeState:"clean", BuildDate:"2022-05-19T19:53:08Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

What version of TiDB Operator are you using?

TiDB Operator Version: version.Info{GitVersion:"v1.3.0-45+1470cfb46e1ffb-dirty", GitCommit:"1470cfb46e1ffb8bb86f74ba455865a95b825413", GitTreeState:"dirty", BuildDate:"2022-07-07T21:33:51Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}

What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?

$ kubectl get sc
NAME                 PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
standard (default)   rancher.io/local-path   Delete          WaitForFirstConsumer   false                  9m4s

$ kubectl get pvc -n {tidb-cluster-namespace}
NAME                        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pd-advanced-tidb-pd-0       Bound    pvc-5d85aab1-3a5b-4bd2-a096-83be8a6e4c63   10Gi       RWO            standard       8m37s
pd-advanced-tidb-pd-1       Bound    pvc-22015e6c-601d-409e-b480-07458ce53711   10Gi       RWO            standard       8m37s
pd-advanced-tidb-pd-2       Bound    pvc-aa1bfdda-b2be-44ed-a721-a17ef5d6140c   10Gi       RWO            standard       8m37s
tikv-advanced-tidb-tikv-0   Bound    pvc-a702225b-1b7f-4363-8975-04f9170f5853   100Gi      RWO            standard       7m11s
tikv-advanced-tidb-tikv-1   Bound    pvc-fc633404-97d4-4359-9921-61faaf26b4c8   100Gi      RWO            standard       7m11s
tikv-advanced-tidb-tikv-2   Bound    pvc-69b6c9eb-6dcf-4ae6-91b8-912551c3c1d4   100Gi      RWO            standard       7m11s

What's the status of the TiDB cluster pods?

NAME                                       READY   STATUS    RESTARTS   AGE     IP           NODE           NOMINATED NODE   READINESS GATES
advanced-tidb-discovery-6998694d4c-zmg5d   1/1     Running   0          9m21s   10.244.3.2   test-worker2   <none>           <none>
advanced-tidb-pd-0                         1/1     Running   0          9m20s   10.244.2.3   test-worker    <none>           <none>
advanced-tidb-pd-1                         1/1     Running   0          9m20s   10.244.3.4   test-worker2   <none>           <none>
advanced-tidb-pd-2                         1/1     Running   0          9m20s   10.244.1.5   test-worker3   <none>           <none>
advanced-tidb-tidb-0                       2/2     Running   0          6m48s   10.244.2.6   test-worker    <none>           <none>
advanced-tidb-tidb-1                       2/2     Running   0          6m48s   10.244.3.7   test-worker2   <none>           <none>
advanced-tidb-tidb-2                       2/2     Running   0          6m48s   10.244.1.8   test-worker3   <none>           <none>
advanced-tidb-tikv-0                       1/1     Running   0          7m54s   10.244.2.5   test-worker    <none>           <none>
advanced-tidb-tikv-1                       1/1     Running   0          7m54s   10.244.3.6   test-worker2   <none>           <none>
advanced-tidb-tikv-2                       1/1     Running   0          7m54s   10.244.1.7   test-worker3   <none>           <none>
tidb-controller-manager-5c775c65f5-24sw4   1/1     Running   0          9m36s   10.244.1.3   test-worker3   <none>           <none>
tidb-scheduler-5665b7f8fb-j4864            2/2     Running   0          9m36s   10.244.1.2   test-worker3   <none>           <none>

What did you do?

The field spec.pd.podManagementPolicy silently falls back to Parallel if the user supplies any value other than OrderedReady.
The CR file we used is shown below. Note that the user made a typo in spec.pd.podManagementPolicy, writing OrderredReady instead of OrderedReady.

CR file
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: advanced-tidb
spec:
  version: "v5.4.0"
  timezone: UTC
  helper:
    image: busybox:1.34.1
  pvReclaimPolicy: Retain
  enableDynamicConfiguration: true
  pd:
    baseImage: pingcap/pd
    config: |
      [dashboard]
        internal-proxy = true
    replicas: 3
    maxFailoverCount: 0
    requests:
      storage: 10Gi
    mountClusterClientSecret: true
    podManagementPolicy: OrderredReady
  tidb:
    baseImage: pingcap/tidb
    config: |
      [performance]
        tcp-keep-alive = true
    replicas: 3
    maxFailoverCount: 0
    service:
      type: NodePort
      externalTrafficPolicy: Local
  tikv:
    baseImage: pingcap/tikv
    config: |
      log-level = "info"
    replicas: 3
    maxFailoverCount: 0
    requests:
      storage: 100Gi
    mountClusterClientSecret: true
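To make the observed behaviour concrete, here is a minimal Go sketch of the kind of mapping we believe is involved. It is not the operator's actual code: `PodManagementPolicyType`, its two constants (which mirror the two values Kubernetes accepts for a StatefulSet's `podManagementPolicy`) and `resolvePolicyAsReported` are hypothetical names, redeclared locally so the sketch has no dependencies.

```go
package main

import "fmt"

// PodManagementPolicyType is a hypothetical local stand-in for the
// Kubernetes apps/v1 type of the same name.
type PodManagementPolicyType string

const (
	OrderedReadyPodManagement PodManagementPolicyType = "OrderedReady"
	ParallelPodManagement     PodManagementPolicyType = "Parallel"
)

// resolvePolicyAsReported is a hypothetical reconstruction of the reported
// behaviour: anything other than the exact string "OrderedReady" becomes
// Parallel, and no error or log message is produced.
func resolvePolicyAsReported(value string) PodManagementPolicyType {
	if PodManagementPolicyType(value) == OrderedReadyPodManagement {
		return OrderedReadyPodManagement
	}
	return ParallelPodManagement
}

func main() {
	// The typo from the CR file silently resolves to Parallel.
	fmt.Println(resolvePolicyAsReported("OrderredReady"))
}
```

With the typo from the CR above, the sketch prints Parallel and reports nothing, which matches what we observed.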

What did you expect to see?
We expected the input to be rejected with a clear error message, or to fall back to a safe value with an error message explaining why.
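One way to get the first of these behaviours is to validate the field up front and return an error that names the field and lists the supported values. The sketch below uses hypothetical names (`validatePodManagementPolicy` and locally declared policy constants) and treats an empty string as "unset, use the default"; that treatment of the empty value is our assumption, not documented operator behaviour.

```go
package main

import "fmt"

// PodManagementPolicyType is a hypothetical local stand-in for the
// Kubernetes apps/v1 type of the same name.
type PodManagementPolicyType string

const (
	OrderedReadyPodManagement PodManagementPolicyType = "OrderedReady"
	ParallelPodManagement     PodManagementPolicyType = "Parallel"
)

// validatePodManagementPolicy is a hypothetical validation helper. It accepts
// an empty value (assumed to mean "unset") and the two supported values, and
// rejects everything else with an error naming the field and valid options.
func validatePodManagementPolicy(field, value string) error {
	switch PodManagementPolicyType(value) {
	case "", OrderedReadyPodManagement, ParallelPodManagement:
		return nil
	default:
		return fmt.Errorf("%s: unsupported value %q: supported values are %q and %q",
			field, value, OrderedReadyPodManagement, ParallelPodManagement)
	}
}

func main() {
	// The typo from the CR file is rejected instead of being silently rewritten.
	if err := validatePodManagementPolicy("spec.pd.podManagementPolicy", "OrderredReady"); err != nil {
		fmt.Println(err)
	}
}
```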

What did you see instead?
We see that spec.pd.podManagementPolicy is changed to Parallel without any error message.

Additional comment
We inspected the source code and found that podManagementPolicy silently falls back to Parallel whenever it is any string other than OrderedReady (see this function). We believe that if a user provides an invalid podManagementPolicy, the operator should at least log that the value is invalid before adopting one of its default values.
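A minimal sketch of the second option (keep falling back, but say so) follows. `resolvePolicyWithWarning` and the local constants are hypothetical, and the standard library `log` package stands in for whatever logger the operator actually uses. The fallback value is passed in explicitly because choosing the "safe" default is a separate decision; the example passes Parallel only to match the behaviour we observed.

```go
package main

import "log"

// PodManagementPolicyType is a hypothetical local stand-in for the
// Kubernetes apps/v1 type of the same name.
type PodManagementPolicyType string

const (
	OrderedReadyPodManagement PodManagementPolicyType = "OrderedReady"
	ParallelPodManagement     PodManagementPolicyType = "Parallel"
)

// resolvePolicyWithWarning is a hypothetical alternative to the current
// mapping: valid values pass through unchanged, an empty value quietly uses
// the fallback, and any other value is logged as invalid before the fallback
// is applied.
func resolvePolicyWithWarning(component, value string, fallback PodManagementPolicyType) PodManagementPolicyType {
	switch p := PodManagementPolicyType(value); p {
	case OrderedReadyPodManagement, ParallelPodManagement:
		return p
	case "":
		// Unset: use the default without a warning.
		return fallback
	default:
		log.Printf("invalid podManagementPolicy %q for %s; falling back to %q (valid values: %q, %q)",
			value, component, fallback, OrderedReadyPodManagement, ParallelPodManagement)
		return fallback
	}
}

func main() {
	policy := resolvePolicyWithWarning("pd", "OrderredReady", ParallelPodManagement)
	log.Printf("using podManagementPolicy %s", policy)
}
```

The typo still ends up as Parallel here, but the log line now tells the user which field was wrong and what the accepted values are.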

Metadata

Labels

bug (Something isn't working)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests