[TiDB] podManagementPolicy silently changes to Parallel if the value is invalid #150
Description
Bug Report
What version of Kubernetes are you using?
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.1", GitCommit:"d224476cd0730baca2b6e357d144171ed74192d6", GitTreeState:"clean", BuildDate:"2020-01-14T21:04:32Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.9", GitCommit:"6df4433e288edc9c40c2e344eb336f63fad45cd2", GitTreeState:"clean", BuildDate:"2022-05-19T19:53:08Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
What version of TiDB Operator are you using?
TiDB Operator Version: version.Info{GitVersion:"v1.3.0-45+1470cfb46e1ffb-dirty", GitCommit:"1470cfb46e1ffb8bb86f74ba455865a95b825413", GitTreeState:"dirty", BuildDate:"2022-07-07T21:33:51Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}
What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?
$ kubectl get sc
NAME                 PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
standard (default)   rancher.io/local-path   Delete          WaitForFirstConsumer   false                  9m4s
$ kubectl get pvc -n {tidb-cluster-namespace}
NAME                        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pd-advanced-tidb-pd-0       Bound    pvc-5d85aab1-3a5b-4bd2-a096-83be8a6e4c63   10Gi       RWO            standard       8m37s
pd-advanced-tidb-pd-1       Bound    pvc-22015e6c-601d-409e-b480-07458ce53711   10Gi       RWO            standard       8m37s
pd-advanced-tidb-pd-2       Bound    pvc-aa1bfdda-b2be-44ed-a721-a17ef5d6140c   10Gi       RWO            standard       8m37s
tikv-advanced-tidb-tikv-0   Bound    pvc-a702225b-1b7f-4363-8975-04f9170f5853   100Gi      RWO            standard       7m11s
tikv-advanced-tidb-tikv-1   Bound    pvc-fc633404-97d4-4359-9921-61faaf26b4c8   100Gi      RWO            standard       7m11s
tikv-advanced-tidb-tikv-2   Bound    pvc-69b6c9eb-6dcf-4ae6-91b8-912551c3c1d4   100Gi      RWO            standard       7m11s
What's the status of the TiDB cluster pods?
NAME                                       READY   STATUS    RESTARTS   AGE     IP           NODE           NOMINATED NODE   READINESS GATES
advanced-tidb-discovery-6998694d4c-zmg5d   1/1     Running   0          9m21s   10.244.3.2   test-worker2   <none>           <none>
advanced-tidb-pd-0                         1/1     Running   0          9m20s   10.244.2.3   test-worker    <none>           <none>
advanced-tidb-pd-1                         1/1     Running   0          9m20s   10.244.3.4   test-worker2   <none>           <none>
advanced-tidb-pd-2                         1/1     Running   0          9m20s   10.244.1.5   test-worker3   <none>           <none>
advanced-tidb-tidb-0                       2/2     Running   0          6m48s   10.244.2.6   test-worker    <none>           <none>
advanced-tidb-tidb-1                       2/2     Running   0          6m48s   10.244.3.7   test-worker2   <none>           <none>
advanced-tidb-tidb-2                       2/2     Running   0          6m48s   10.244.1.8   test-worker3   <none>           <none>
advanced-tidb-tikv-0                       1/1     Running   0          7m54s   10.244.2.5   test-worker    <none>           <none>
advanced-tidb-tikv-1                       1/1     Running   0          7m54s   10.244.3.6   test-worker2   <none>           <none>
advanced-tidb-tikv-2                       1/1     Running   0          7m54s   10.244.1.7   test-worker3   <none>           <none>
tidb-controller-manager-5c775c65f5-24sw4   1/1     Running   0          9m36s   10.244.1.3   test-worker3   <none>           <none>
tidb-scheduler-5665b7f8fb-j4864            2/2     Running   0          9m36s   10.244.1.2   test-worker3   <none>           <none>
What did you do?
The field spec.pd.podManagementPolicy silently falls back to Parallel if the user supplies any value other than OrderedReady.
The CR file we used is shown below. Note that it contains a typo: spec.pd.podManagementPolicy is set to OrderredReady instead of OrderedReady.
CR file
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: advanced-tidb
spec:
  version: "v5.4.0"
  timezone: UTC
  helper:
    image: busybox:1.34.1
  pvReclaimPolicy: Retain
  enableDynamicConfiguration: true
  pd:
    baseImage: pingcap/pd
    config: |
      [dashboard]
        internal-proxy = true
    replicas: 3
    maxFailoverCount: 0
    requests:
      storage: 10Gi
    mountClusterClientSecret: true
    podManagementPolicy: OrderredReady
  tidb:
    baseImage: pingcap/tidb
    config: |
      [performance]
        tcp-keep-alive = true
    replicas: 3
    maxFailoverCount: 0
    service:
      type: NodePort
      externalTrafficPolicy: Local
  tikv:
    baseImage: pingcap/tikv
    config: |
      log-level = "info"
    replicas: 3
    maxFailoverCount: 0
    requests:
      storage: 100Gi
    mountClusterClientSecret: true
What did you expect to see?
We expected the input to be rejected with a clear error message, or the operator to fall back to a safe value and report the invalid input in an error message.
What did you see instead?
We saw that spec.pd.podManagementPolicy was changed to Parallel without any error message.
Additional comment
We inspected the source code and found that podManagementPolicy silently falls back to Parallel if it is any string other than OrderedReady (see this function). We believe that if a user provides an invalid podManagementPolicy, the operator should log that the value is invalid before it adopts one of its default values.
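To illustrate the behavior we are asking for, here is a minimal sketch in Go. It is not the operator's actual code: the type, the constants, and the helper resolvePodManagementPolicy are hypothetical names introduced for this example, and using Parallel as the default for an empty value is an assumption. The sketch accepts the two policy values a StatefulSet supports and returns an error for anything else instead of silently substituting a value.

```go
package main

import (
	"fmt"
	"log"
)

// PodManagementPolicy is a stand-in type for this sketch; it mirrors the two
// values that a Kubernetes StatefulSet accepts.
type PodManagementPolicy string

const (
	OrderedReady PodManagementPolicy = "OrderedReady"
	Parallel     PodManagementPolicy = "Parallel"
)

// resolvePodManagementPolicy is a hypothetical helper, not a function from
// TiDB Operator. An empty value selects defaultPolicy, a recognised value is
// returned unchanged, and any other string is reported as an error.
func resolvePodManagementPolicy(raw string, defaultPolicy PodManagementPolicy) (PodManagementPolicy, error) {
	switch PodManagementPolicy(raw) {
	case "":
		return defaultPolicy, nil
	case OrderedReady, Parallel:
		return PodManagementPolicy(raw), nil
	default:
		return "", fmt.Errorf("invalid podManagementPolicy %q: must be %q or %q",
			raw, OrderedReady, Parallel)
	}
}

func main() {
	// The last input reproduces the typo from the CR file in this report.
	for _, raw := range []string{"", "OrderedReady", "OrderredReady"} {
		policy, err := resolvePodManagementPolicy(raw, Parallel)
		if err != nil {
			// Surface the problem instead of silently picking a value.
			log.Printf("rejecting spec: %v", err)
			continue
		}
		fmt.Printf("input %q resolved to %s\n", raw, policy)
	}
}
```

If rejecting the CR is considered too strict, the same check could instead log the error and then continue with the default, which would still make the typo visible to the user.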