Problem description:
Our service is attached to an HPA for automatic elastic scaling. When we use Rollouts to do a batched canary release, the first batch is configured with a 1% canary step, but after the autoscaler scaled the workload out to 28 replicas, the canary did not stay at the expected single pod: 5 canary pods were running in total.
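For reference, assuming the Rollout controller resolves a percentage step as ceil(percentage × current replicas), the expectation at 28 replicas works out as:

ceil(28 × 1%) = ceil(0.28) = 1 canary pod

whereas 5 canary pods were actually running.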
Deployment manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: ${APP_NAME}
  name: ${APP_NAME}
  namespace: pro-k8s
spec:
  progressDeadlineSeconds: 600
  replicas: ${POD_NUM}
  strategy:
    rollingUpdate:
      maxSurge: 6
      maxUnavailable: 0
    type: RollingUpdate
  selector:
    matchLabels:
      app: ${APP_NAME}
  template:
    metadata:
      annotations:
        alibabacloud.com/burst-resource: eci_only
        k8s.aliyun.com/eci-reschedule-enable: "true"
        k8s.aliyun.com/eci-extra-ephemeral-storage: "200Gi"
        k8s.aliyun.com/eci-use-specs: ecs.c6.4xlarge,ecs.ic5.6xlarge,ecs.hfc6.6xlarge
      labels:
        app: ${APP_NAME}
        armsPilotAutoEnable: "$ARMS_SWITCH"
        armsPilotCreateAppName: "$APP_NAME"
    spec:
      containers:
      - env:
        - name: API_ENV
          value: pro
        - name: AppName
          value: ${APP_NAME}
        image: server:$BUILD_TAG
        readinessProbe:
          httpGet:
            path: /health
            port: ${PORT}
          initialDelaySeconds: 180
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 6
        livenessProbe:
          httpGet:
            path: /health
            port: ${PORT}
          initialDelaySeconds: 240
          periodSeconds: 10
          timeoutSeconds: 3
          failureThreshold: 18
        imagePullPolicy: Always
        name: ${APP_NAME}
        ports:
        - containerPort: ${PORT}
          protocol: TCP
        resources:
          limits:
            cpu: 16
            memory: 32Gi
          requests:
            cpu: 16
            memory: 32Gi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsConfig:
        options:
        - name: ndots
          value: "3"
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      terminationGracePeriodSeconds: 600
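For context on the update strategy above: with maxSurge: 6 and maxUnavailable: 0, the Deployment controller may run up to 6 new-revision pods above the desired replica count while an update is in progress; we have not confirmed whether this interacts with the canary step.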
Rollout (canary) manifest:
apiVersion: rollouts.kruise.io/v1beta1
kind: Rollout
metadata:
  name: canary-${APP_NAME}
  namespace: pro-k8s
spec:
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${APP_NAME}
  strategy:
    canary:
      enableExtraWorkloadForCanary: false
      steps:
      - replicas: 1%
      - replicas: 50%
      - replicas: 100%
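Since the HPA keeps changing the replica count underneath the Rollout, a percentage-based first step resolves to a different absolute number at every scale. If the intent is exactly one canary pod regardless of the current scale, a minimal sketch of an alternative first step, assuming steps[].replicas also accepts an absolute integer (it is an IntOrString field in v1beta1):

strategy:
  canary:
    enableExtraWorkloadForCanary: false
    steps:
    - replicas: 1      # absolute pod count instead of a percentage
    - replicas: 50%
    - replicas: 100%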
HPA manifest:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-${APP_NAME}
  namespace: pro-k8s
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${APP_NAME}
  minReplicas: 8
  maxReplicas: 28
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 30
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Pods
        value: 20
        periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 600
      policies:
      - type: Pods
        value: 4
        periodSeconds: 600
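For reference, the scale-out to 28 follows the standard HPA algorithm:

desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)

capped by maxReplicas: 28 and by the scaleUp policy above (at most 20 pods added per 15-second period, with no stabilization window). Starting from minReplicas: 8, a single CPU burst above the 30% target can therefore reach 8 + 20 = 28 pods within one period.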
Expected result:
With the canary step set to 1%, only one of the 28 pods that were brought up should be a canary pod.
How to reproduce this issue (as minimally and precisely as possible):
- We have not yet pinpointed the exact trigger. In earlier tests where the autoscaler topped out at 20 pods the problem did not appear; today it appeared when scaling to 28.
Environment:
Rollout version: v0.6.1
Kubernetes cluster version: 1.22.15-aliyun.1
Install details: installed via Helm with defaults