log-cache NODE_INDEX is incorrect (does not use the value from bpm.yml) when multi_az is enabled #1106
Description
Describe the bug
When kubecf is deployed with multi_az=true, all log-cache jobs start with NODE_INDEX=0, so the log-cache cluster does not work properly.
cf logs does not work, and cf push xxx and cf app xxx fail with a client timeout.
To Reproduce
cf-operator: 5.2.0
kubecf: v2.2.3
Deploy kubecf with multi_az=true.
Configure log-cache with more than one instance.
For example:
$ k get pod -n kubecf |grep log-cache
log-cache-z0-0 10/10 Running 0 18m
log-cache-z0-1 10/10 Running 0 11m
log-cache-z1-0 10/10 Running 0 18m
log-cache-z1-1 10/10 Running 0 11m
Log in to the log-cache container (log-cache-z1-0, for example) and check the NODE_INDEX environment variable:
sh-4.4# printenv |grep NODE
NODE_INDEX=0
NODE_ADDRS=log-cache-z0-0:8080,log-cache-z0-1:8080,log-cache-z1-0:8080,log-cache-z1-1:8080
Check the file /var/vcap/jobs/log-cache/config/bpm.yml:
sh-4.4# cat bpm.yml |grep NODE_INDEX
NODE_INDEX: "2"
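To compare the two values inside a single container, a minimal sketch follows. It assumes bpm.yml contains a line of the form NODE_INDEX: "2", as in the grep output above, and it uses sed rather than a real YAML parser. The function names bpm_node_index and check_node_index are hypothetical helpers, not part of kubecf or log-cache.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the NODE_INDEX value found in a bpm.yml file.
# Assumes a line like:   NODE_INDEX: "2"   (quotes optional); not a YAML parser.
bpm_node_index() {
  local file="$1"
  # Strip leading spaces, the key, and optional quotes; keep the first match only.
  sed -n 's/^[[:space:]]*NODE_INDEX:[[:space:]]*"\{0,1\}\([0-9]*\)"\{0,1\}[[:space:]]*$/\1/p' "$file" | head -n 1
}

# Hypothetical helper: compare the process environment with bpm.yml.
# Returns 1 and prints both values when they differ.
check_node_index() {
  local file="${1:-/var/vcap/jobs/log-cache/config/bpm.yml}"
  local from_bpm
  from_bpm="$(bpm_node_index "$file")"
  if [ "${NODE_INDEX:-}" != "$from_bpm" ]; then
    echo "mismatch: env NODE_INDEX=${NODE_INDEX:-<unset>}, bpm.yml NODE_INDEX=${from_bpm}"
    return 1
  fi
  echo "ok: NODE_INDEX=${from_bpm}"
}
```

Pasted into the sh-4.4# session shown above, check_node_index would return 1 on log-cache-z1-0 given the values reported here (environment 0, bpm.yml 2).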
Check the log-cache logs; the startup configuration dump shows Config.NodeIndex as 0:
$ k logs log-cache-z1-0 -c log-cache-log-cache -n kubecf
2020/08/31 02:16:50 WARNING: proto: file "egress.proto" is already registered
A future release will panic on registration conflicts. See:
https://developers.google.com/protocol-buffers/docs/reference/go/faq#namespace-conflict
2020/08/31 02:16:50 WARNING: proto: file "ingress.proto" is already registered
A future release will panic on registration conflicts. See:
https://developers.google.com/protocol-buffers/docs/reference/go/faq#namespace-conflict
2020/08/31 02:16:50.027381 Starting Log Cache...
FIELD NAME: TYPE: ENV: REQUIRED: VALUE:
Config.Addr string ADDR true :8080
Config.QueryTimeout time.Duration QUERY_TIMEOUT false 10s
Config.MemoryLimit uint MEMORY_LIMIT_PERCENT false 50
Config.MaxPerSource int MAX_PER_SOURCE false 100000
Config.NodeIndex int NODE_INDEX false 0
Config.NodeAddrs []string NODE_ADDRS false [log-cache-z0-0:8080 log-cache-z0-1:8080 log-cache-z1-0:8080 log-cache-z1-1:8080]
TLS.CAPath string CA_PATH true /var/vcap/jobs/log-cache/config/certs/ca.crt
TLS.CertPath string CERT_PATH true /var/vcap/jobs/log-cache/config/certs/log_cache.crt
TLS.KeyPath string KEY_PATH true /var/vcap/jobs/log-cache/config/certs/log_cache.key
MetricsServer.Port uint16 METRICS_PORT false 6060
MetricsServer.CAFile string METRICS_CA_FILE_PATH false /var/vcap/jobs/log-cache/config/certs/metrics_ca.crt
MetricsServer.CertFile string METRICS_CERT_FILE_PATH false /var/vcap/jobs/log-cache/config/certs/metrics.crt
MetricsServer.KeyFile string METRICS_KEY_FILE_PATH false /var/vcap/jobs/log-cache/config/certs/metrics.key
2020/08/31 02:16:50 Metrics endpoint is listening on [::]:6060
Check the log-cache StatefulSet in each zone:
$ k describe statefulset log-cache-z0 -n kubecf |grep NODE_INDEX
NODE_INDEX: 0
$ k describe statefulset log-cache-z1 -n kubecf |grep NODE_INDEX
NODE_INDEX: 0
NODE_INDEX is set in the environment of the log-cache-log-cache container:
Environment:
ADDR: :8080
CA_PATH: /var/vcap/jobs/log-cache/config/certs/ca.crt
CERT_PATH: /var/vcap/jobs/log-cache/config/certs/log_cache.crt
KEY_PATH: /var/vcap/jobs/log-cache/config/certs/log_cache.key
MAX_PER_SOURCE: 100000
MEMORY_LIMIT_PERCENT: 50
METRICS_CA_FILE_PATH: /var/vcap/jobs/log-cache/config/certs/metrics_ca.crt
METRICS_CERT_FILE_PATH: /var/vcap/jobs/log-cache/config/certs/metrics.crt
METRICS_KEY_FILE_PATH: /var/vcap/jobs/log-cache/config/certs/metrics.key
METRICS_PORT: 6060
NODE_ADDRS: log-cache-z0-0:8080,log-cache-z0-1:8080,log-cache-z1-0:8080,log-cache-z1-1:8080
NODE_INDEX: 0
QUERY_TIMEOUT: 10s
KUBE_AZ: us-south-1
BOSH_AZ: us-south-1
CF_OPERATOR_AZ: us-south-1
AZ_INDEX: 1
REPLICAS: 2
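Every replica gets the same NODE_INDEX: 0 from the StatefulSet, while NODE_ADDRS lists all four pods. The sketch below derives the index a given pod would be expected to use. It assumes a pod's NODE_INDEX should equal its zero-based position in NODE_ADDRS, which is consistent with log-cache-z1-0 being the third entry and having NODE_INDEX: "2" in its bpm.yml. The function name expected_node_index is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the zero-based position of a host in NODE_ADDRS.
# Usage: expected_node_index <hostname> <comma-separated host:port list>
# Assumption: NODE_INDEX should equal this position
# (log-cache-z1-0 -> 2, matching NODE_INDEX: "2" in its bpm.yml).
expected_node_index() {
  local host="$1" addrs="$2" i=0 entry
  local IFS=','            # split the list on commas within this function only
  for entry in $addrs; do
    # Compare only the host part, dropping the ":port" suffix.
    if [ "${entry%%:*}" = "$host" ]; then
      echo "$i"
      return 0
    fi
    i=$((i + 1))
  done
  echo "host $host not found in NODE_ADDRS" >&2
  return 1
}
```

Inside a pod, expected_node_index "$HOSTNAME" "$NODE_ADDRS" can be compared with printenv NODE_INDEX; this relies on the StatefulSet pod hostname matching the pod name, as the NODE_ADDRS entries suggest.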
Issue:
The log-cache job starts with the container environment value NODE_INDEX=0 instead of the value in bpm.yml, so all log-cache jobs run with NODE_INDEX=0 and fail to work as a cluster.
Expected behavior
Each log-cache job joins the cluster with the correct NODE_INDEX, i.e. the value in bpm.yml.
Environment
cf-operator: 5.2.0
kubecf: v2.2.3