This repository was archived by the owner on Oct 22, 2021. It is now read-only.

log-cache NODE_INDEX is incorrect and does not match bpm.yml when multi_az is enabled. #1106

Open
@ShuangMen


Describe the bug
When kubecf is deployed with multi_az=true, all log-cache jobs start with NODE_INDEX=0, so the log-cache cluster does not work properly.

cf logs does not work.
cf push xxx and cf app xxx do not work either; both fail with a client timeout.

To Reproduce
cf-operator: 5.2.0
kubecf: v2.2.3
Deploy kubecf with multi_az=true.
Configure log-cache with more than 1 instance.

For example:

$ k get pod -n kubecf |grep log-cache
log-cache-z0-0                           10/10   Running     0          18m
log-cache-z0-1                           10/10   Running     0          11m
log-cache-z1-0                           10/10   Running     0          18m
log-cache-z1-1                           10/10   Running     0          11m

Log in to a log-cache container (log-cache-z1-0, for example) and check the NODE_INDEX environment variable:

sh-4.4# printenv |grep NODE
NODE_INDEX=0
NODE_ADDRS=log-cache-z0-0:8080,log-cache-z0-1:8080,log-cache-z1-0:8080,log-cache-z1-1:8080

Check the file /var/vcap/jobs/log-cache/config/bpm.yml:

sh-4.4# cat bpm.yml |grep NODE_INDEX
    NODE_INDEX: "2"
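
The value "2" in bpm.yml is consistent with log-cache-z1-0 being the third entry (zero-based position 2) in NODE_ADDRS. A minimal sketch of that cross-check follows. It assumes NODE_INDEX is meant to be the pod's position in NODE_ADDRS, which the output above suggests but does not prove; `expected_node_index` is a hypothetical helper, not part of kubecf or cf-operator.

```python
def expected_node_index(node_addrs: str, pod_name: str) -> int:
    """Return the zero-based position of pod_name in a comma-separated NODE_ADDRS string.

    Assumption: NODE_INDEX should equal this position. Raises ValueError
    if the pod is not listed.
    """
    # Strip the ":port" suffix from each "host:port" entry.
    hosts = [addr.rsplit(":", 1)[0] for addr in node_addrs.split(",")]
    return hosts.index(pod_name)


# NODE_ADDRS as printed inside the log-cache-z1-0 container.
NODE_ADDRS = (
    "log-cache-z0-0:8080,log-cache-z0-1:8080,"
    "log-cache-z1-0:8080,log-cache-z1-1:8080"
)

print(expected_node_index(NODE_ADDRS, "log-cache-z1-0"))  # prints 2
```

Under that assumption the bpm.yml value is the correct one, and the NODE_INDEX=0 seen in the container environment is wrong for this pod.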

Check the log-cache logs:

$ k logs log-cache-z1-0 -c log-cache-log-cache -n kubecf
2020/08/31 02:16:50 WARNING: proto: file "egress.proto" is already registered
A future release will panic on registration conflicts. See:
https://developers.google.com/protocol-buffers/docs/reference/go/faq#namespace-conflict

2020/08/31 02:16:50 WARNING: proto: file "ingress.proto" is already registered
A future release will panic on registration conflicts. See:
https://developers.google.com/protocol-buffers/docs/reference/go/faq#namespace-conflict

2020/08/31 02:16:50.027381 Starting Log Cache...
FIELD NAME:             TYPE:          ENV:                    REQUIRED:  VALUE:
Config.Addr             string         ADDR                    true       :8080
Config.QueryTimeout     time.Duration  QUERY_TIMEOUT           false      10s
Config.MemoryLimit      uint           MEMORY_LIMIT_PERCENT    false      50
Config.MaxPerSource     int            MAX_PER_SOURCE          false      100000
Config.NodeIndex        int            NODE_INDEX              false      0
Config.NodeAddrs        []string       NODE_ADDRS              false      [log-cache-z0-0:8080 log-cache-z0-1:8080 log-cache-z1-0:8080 log-cache-z1-1:8080]
TLS.CAPath              string         CA_PATH                 true       /var/vcap/jobs/log-cache/config/certs/ca.crt
TLS.CertPath            string         CERT_PATH               true       /var/vcap/jobs/log-cache/config/certs/log_cache.crt
TLS.KeyPath             string         KEY_PATH                true       /var/vcap/jobs/log-cache/config/certs/log_cache.key
MetricsServer.Port      uint16         METRICS_PORT            false      6060
MetricsServer.CAFile    string         METRICS_CA_FILE_PATH    false      /var/vcap/jobs/log-cache/config/certs/metrics_ca.crt
MetricsServer.CertFile  string         METRICS_CERT_FILE_PATH  false      /var/vcap/jobs/log-cache/config/certs/metrics.crt
MetricsServer.KeyFile   string         METRICS_KEY_FILE_PATH   false      /var/vcap/jobs/log-cache/config/certs/metrics.key
2020/08/31 02:16:50 Metrics endpoint is listening on [::]:6060

Check the statefulset log-cache in each zone:

$ k describe statefulset log-cache-z0 -n kubecf |grep NODE_INDEX
      NODE_INDEX:              0
$ k describe statefulset log-cache-z1 -n kubecf |grep NODE_INDEX
      NODE_INDEX:              0

NODE_INDEX is set in the environment of the log-cache-log-cache container:

 Environment:
      ADDR:                    :8080
      CA_PATH:                 /var/vcap/jobs/log-cache/config/certs/ca.crt
      CERT_PATH:               /var/vcap/jobs/log-cache/config/certs/log_cache.crt
      KEY_PATH:                /var/vcap/jobs/log-cache/config/certs/log_cache.key
      MAX_PER_SOURCE:          100000
      MEMORY_LIMIT_PERCENT:    50
      METRICS_CA_FILE_PATH:    /var/vcap/jobs/log-cache/config/certs/metrics_ca.crt
      METRICS_CERT_FILE_PATH:  /var/vcap/jobs/log-cache/config/certs/metrics.crt
      METRICS_KEY_FILE_PATH:   /var/vcap/jobs/log-cache/config/certs/metrics.key
      METRICS_PORT:            6060
      NODE_ADDRS:              log-cache-z0-0:8080,log-cache-z0-1:8080,log-cache-z1-0:8080,log-cache-z1-1:8080
      NODE_INDEX:              0          
      QUERY_TIMEOUT:           10s
      KUBE_AZ:                 us-south-1
      BOSH_AZ:                 us-south-1
      CF_OPERATOR_AZ:          us-south-1
      AZ_INDEX:                1
      REPLICAS:                2

Issue:
The log-cache job starts with the container environment value NODE_INDEX=0 instead of the value in bpm.yml. As a result, all log-cache jobs run with NODE_INDEX=0 and fail to work as a cluster.
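
The symptom can be expressed as a small check: every pod in the cluster should report a distinct NODE_INDEX. The sketch below is only an illustration. `duplicate_node_indexes` is a hypothetical helper, and the `observed` mapping is filled in by hand from the statefulset output above (both statefulsets set NODE_INDEX to 0, so all four pods inherit it).

```python
from collections import defaultdict


def duplicate_node_indexes(pod_indexes):
    """Return {node_index: [pods]} for every index claimed by more than one pod."""
    by_index = defaultdict(list)
    for pod, index in pod_indexes.items():
        by_index[index].append(pod)
    return {i: sorted(pods) for i, pods in by_index.items() if len(pods) > 1}


# Values taken from the report: every log-cache pod starts with NODE_INDEX=0.
observed = {
    "log-cache-z0-0": 0,
    "log-cache-z0-1": 0,
    "log-cache-z1-0": 0,
    "log-cache-z1-1": 0,
}

for index, pods in duplicate_node_indexes(observed).items():
    print(f"NODE_INDEX={index} is shared by: {', '.join(pods)}")
```

An empty result would mean each pod has its own index; here all four pods collide on index 0.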

Expected behavior
The log-cache job joins the cluster with the correct NODE_INDEX, i.e. the value in bpm.yml.

Environment
cf-operator: 5.2.0
kubecf: v2.2.3
