Batch canary with HPA: the number of canary pods does not match the configured percentage #321

@xiaojifan

Description

Problem description:

Our service uses an HPA for automatic scaling. When we run a batched canary release with Rollouts, the first canary step is configured at 1%. But when the HPA scaled the workload up to 28 replicas, the canary count did not match expectations: instead of a single canary pod, 5 canary pods were started.

Deployment manifest:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: ${APP_NAME}
  name: ${APP_NAME}
  namespace: pro-k8s
spec:
  progressDeadlineSeconds: 600
  replicas: ${POD_NUM}
  strategy:
    rollingUpdate:
      maxSurge: 6
      maxUnavailable: 0
    type: RollingUpdate
  selector:
    matchLabels:
      app: ${APP_NAME}
  template:
    metadata:
      annotations:
        alibabacloud.com/burst-resource: eci_only
        k8s.aliyun.com/eci-reschedule-enable: "true"
        k8s.aliyun.com/eci-extra-ephemeral-storage: "200Gi"
        k8s.aliyun.com/eci-use-specs: ecs.c6.4xlarge,ecs.ic5.6xlarge,ecs.hfc6.6xlarge
      labels:
        app: ${APP_NAME}
        armsPilotAutoEnable: "$ARMS_SWITCH"
        armsPilotCreateAppName: "$APP_NAME"
    spec:
      containers:
        - env:
            - name: API_ENV
              value: pro
            - name: AppName
              value: ${APP_NAME}
          image: server:$BUILD_TAG
          readinessProbe:
            httpGet:
              path: /health
              port: ${PORT}
            initialDelaySeconds: 180
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 6
          livenessProbe:
            httpGet:
              path: /health
              port: ${PORT}
            initialDelaySeconds: 240
            periodSeconds: 10
            timeoutSeconds: 3
            failureThreshold: 18
          imagePullPolicy: Always
          name: ${APP_NAME}
          ports:
            - containerPort: ${PORT}
              protocol: TCP
          resources:
            limits:
              cpu: 16
              memory: 32Gi
            requests:
              cpu: 16
              memory: 32Gi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      dnsConfig:
        options:
        - name: ndots
          value: "3"
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      terminationGracePeriodSeconds: 600
```

Rollout manifest:

```yaml
apiVersion: rollouts.kruise.io/v1beta1
kind: Rollout
metadata:
  name: canary-${APP_NAME}
  namespace: pro-k8s
spec:
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${APP_NAME}
  strategy:
    canary:
      enableExtraWorkloadForCanary: false
      steps:
      - replicas: 1%
      - replicas: 50%
      - replicas: 100%
```

HPA manifest:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-${APP_NAME}
  namespace: pro-k8s
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${APP_NAME}
  minReplicas: 8
  maxReplicas: 28
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 30
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Pods
        value: 20
        periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 600
      policies:
      - type: Pods
        value: 4
        periodSeconds: 600
```
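For context, this is how the HPA arrives at 28 replicas: the documented scaling formula is `desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue)`, capped at `maxReplicas`. A minimal sketch in Python; the utilization numbers below are illustrative only, not measured values from our cluster:

```python
import math

def hpa_desired(current_replicas: int, current_util: float, target_util: float) -> int:
    # Standard HPA formula: desired = ceil(current * currentMetric / targetMetric)
    return math.ceil(current_replicas * current_util / target_util)

# Hypothetical example: 8 pods averaging 120% CPU against a 30% target
# request 32 replicas, which maxReplicas then caps at 28.
desired = min(hpa_desired(8, 120, 30), 28)
print(desired)  # 28
```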

Expected result:
With the canary step configured at 1%, only 1 of the 28 running pods should be a canary pod.
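A minimal sketch of the expected arithmetic, assuming Kubernetes-style round-up (ceil) semantics for percentage values (the exact rounding Rollouts applies is not confirmed in this report):

```python
import math

def canary_replicas(total: int, percent: int) -> int:
    # Assumes percentage steps are resolved against the current replica
    # count and rounded up, as Kubernetes does for intstr percentages.
    return math.ceil(total * percent / 100)

print(canary_replicas(28, 1))   # 1 canary pod expected at the 1% step
print(canary_replicas(28, 50))  # 14 at the 50% step
```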

How to reproduce (as concisely and precisely as possible):

  • We have not yet pinned down the exact trigger. In earlier tests where the HPA scaled up to at most 20 pods, the issue did not appear; today it occurred when scaling up to 28.

Environment:
Rollouts version: v0.6.1
Kubernetes cluster version: 1.22.15-aliyun.1
Install details: default Helm install
