[BUG] milvus topology cluster-with-dep stop start data loss #9767

@JashBook

Describe the bug
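After a Milvus cluster created with the cluster-with-dep topology is stopped and started again, data inserted before the stop can no longer be queried.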

kbcli version    
Kubernetes: v1.30.4-vke.4
KubeBlocks: 0.9.6-beta.1
kbcli: 0.9.5

To Reproduce
Steps to reproduce the behavior:

  1. create cluster
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: milvus-jrwvla-s3-credential
  namespace: default
stringData:
  accessKey: "kbclitest"
  bucketnames: "kb-milvus-jrwvla"
  endpoint: "http://kbcli-test-minio.kb-system.svc.cluster.local:9000"
  host: "kbcli-test-minio.kb-system.svc.cluster.local"
  port: "9000"
  region: ""
  rulerBucketnames: "kb-milvus-jrwvla"
  secretKey: "kbclitest"
  storageType: "s3"
---
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: milvus-jrwvla
  namespace: default
spec:
  clusterDefinitionRef: milvus
  topology: cluster-with-dep
  terminationPolicy: Halt
  componentSpecs:
    - name: proxy
      serviceVersion: 2.5.13
      replicas: 1
      disableExporter: true
      resources:
        limits:
          cpu: 500m
          memory: 0.5Gi
        requests:
          cpu: 500m
          memory: 0.5Gi
      env:
        - name: MINIO_HOST
          valueFrom:
            secretKeyRef:
              key: host
              name: milvus-jrwvla-s3-credential
        - name: MINIO_PORT
          valueFrom:
            secretKeyRef:
              key: port
              name: milvus-jrwvla-s3-credential
        - name: MINIO_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: accessKey
              name: milvus-jrwvla-s3-credential
        - name: MINIO_SECRET_KEY
          valueFrom:
            secretKeyRef:
              key: secretKey
              name: milvus-jrwvla-s3-credential
        - name: MINIO_BUCKETNAME
          valueFrom:
            secretKeyRef:
              key: bucketnames
              name: milvus-jrwvla-s3-credential
    - name: datanode
      serviceVersion: 2.5.13
      replicas: 1
      disableExporter: true
      resources:
        limits:
          cpu: 500m
          memory: 0.5Gi
        requests:
          cpu: 500m
          memory: 0.5Gi
      env:
        - name: MINIO_HOST
          valueFrom:
            secretKeyRef:
              key: host
              name: milvus-jrwvla-s3-credential
        - name: MINIO_PORT
          valueFrom:
            secretKeyRef:
              key: port
              name: milvus-jrwvla-s3-credential
        - name: MINIO_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: accessKey
              name: milvus-jrwvla-s3-credential
        - name: MINIO_SECRET_KEY
          valueFrom:
            secretKeyRef:
              key: secretKey
              name: milvus-jrwvla-s3-credential
        - name: MINIO_BUCKETNAME
          valueFrom:
            secretKeyRef:
              key: bucketnames
              name: milvus-jrwvla-s3-credential
    - name: indexnode
      serviceVersion: 2.5.13
      replicas: 1
      disableExporter: true
      resources:
        limits:
          cpu: 500m
          memory: 0.5Gi
        requests:
          cpu: 500m
          memory: 0.5Gi
      env:
        - name: MINIO_HOST
          valueFrom:
            secretKeyRef:
              key: host
              name: milvus-jrwvla-s3-credential
        - name: MINIO_PORT
          valueFrom:
            secretKeyRef:
              key: port
              name: milvus-jrwvla-s3-credential
        - name: MINIO_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: accessKey
              name: milvus-jrwvla-s3-credential
        - name: MINIO_SECRET_KEY
          valueFrom:
            secretKeyRef:
              key: secretKey
              name: milvus-jrwvla-s3-credential
        - name: MINIO_BUCKETNAME
          valueFrom:
            secretKeyRef:
              key: bucketnames
              name: milvus-jrwvla-s3-credential
    - name: querynode
      serviceVersion: 2.5.13
      replicas: 1
      disableExporter: true
      resources:
        limits:
          cpu: 500m
          memory: 0.5Gi
        requests:
          cpu: 500m
          memory: 0.5Gi
      env:
        - name: MINIO_HOST
          valueFrom:
            secretKeyRef:
              key: host
              name: milvus-jrwvla-s3-credential
        - name: MINIO_PORT
          valueFrom:
            secretKeyRef:
              key: port
              name: milvus-jrwvla-s3-credential
        - name: MINIO_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: accessKey
              name: milvus-jrwvla-s3-credential
        - name: MINIO_SECRET_KEY
          valueFrom:
            secretKeyRef:
              key: secretKey
              name: milvus-jrwvla-s3-credential
        - name: MINIO_BUCKETNAME
          valueFrom:
            secretKeyRef:
              key: bucketnames
              name: milvus-jrwvla-s3-credential
    - name: etcd
      serviceVersion: 3.6.1
      replicas: 1
      disableExporter: true
      resources:
        limits:
          cpu: 500m
          memory: 0.5Gi
        requests:
          cpu: 500m
          memory: 0.5Gi
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName: 
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
    - name: kafka
      serviceVersion: 3.3.2
      replicas: 1
      disableExporter: true
      resources:
        limits:
          cpu: 500m
          memory: 0.5Gi
        requests:
          cpu: 500m
          memory: 0.5Gi
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName: 
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
    - name: mixcoord
      serviceVersion: 2.5.13
      replicas: 1
      disableExporter: true
      resources:
        limits:
          cpu: 500m
          memory: 0.5Gi
        requests:
          cpu: 500m
          memory: 0.5Gi
      env:
        - name: MINIO_HOST
          valueFrom:
            secretKeyRef:
              key: host
              name: milvus-jrwvla-s3-credential
        - name: MINIO_PORT
          valueFrom:
            secretKeyRef:
              key: port
              name: milvus-jrwvla-s3-credential
        - name: MINIO_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: accessKey
              name: milvus-jrwvla-s3-credential
        - name: MINIO_SECRET_KEY
          valueFrom:
            secretKeyRef:
              key: secretKey
              name: milvus-jrwvla-s3-credential
        - name: MINIO_BUCKETNAME
          valueFrom:
            secretKeyRef:
              key: bucketnames
              name: milvus-jrwvla-s3-credential
  2. insert data
kubectl create -f -<<EOF
apiVersion: v1
kind: Pod
metadata:
  name: test-db-client-executionloop-milvus-jrwvla
  namespace: default
spec:
  containers:
    - name: test-dbclient
      imagePullPolicy: IfNotPresent
      image: apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/dbclient:test
      args:
        - "--host"
        - "milvus-jrwvla-proxy.default.svc.cluster.local"
        - "--user"
        - ""
        - "--password"
        - ""
        - "--port"
        - "19530"
        - "--dbtype"
        - "milvus"
        - "--test"
        - "executionloop"
        - "--duration"
        - "60"
        - "--interval"
        - "1"
  restartPolicy: Never
EOF
kubectl logs -f test-db-client-executionloop-milvus-jrwvla
--host milvus-jrwvla-proxy.default.svc.cluster.local --user  --password  --port 19530 --dbtype milvus --test executionloop --duration 60 --interval 1
Collection executions_loop_collection already exists.
Delete collection executions_loop_collection
Collection executions_loop_collection deleted successfully.
Create collection executions_loop_collection
Collection executions_loop_collection created successfully.
Execution loop start: insert:executions_loop_collection:10::1:executions_loop_1
[ 1s ] executions total: 1 successful: 1 failed: 0 disconnect: 0
[ 2s ] executions total: 173 successful: 173 failed: 0 disconnect: 0
[ 3s ] executions total: 400 successful: 400 failed: 0 disconnect: 0
...
[ 58s ] executions total: 16211 successful: 16211 failed: 0 disconnect: 0
[ 59s ] executions total: 16509 successful: 16509 failed: 0 disconnect: 0
[ 60s ] executions total: 16571 successful: 16571 failed: 0 disconnect: 0
Test Result:
Total Executions: 16571
Successful Executions: 16571
Failed Executions: 0
Disconnection Counts: 0
echo "curl -s -H 'Content-Type: application/json' -X POST  http://milvus-jrwvla-proxy.default.svc.cluster.local:19530/v1/vector/query  -d '{\"collectionName\":\"executions_loop_collection\",\"filter\":\"id == 16571\",\"limit\":0,\"outputFields\":[\"id\"]}' " | kubectl exec -it milvus-jrwvla-proxy-0 --namespace default -- bash
{"code":200,"data":[{"id":16571}]}
  3. stop and start
kbcli cluster stop milvus-jrwvla --auto-approve 
OpsRequest milvus-jrwvla-stop-c9kjx created successfully, you can view the progress:
	kbcli cluster describe-ops milvus-jrwvla-stop-c9kjx -n default

 kubectl get cluster 
NAME            CLUSTER-DEFINITION   VERSION   TERMINATION-POLICY   STATUS    AGE
milvus-jrwvla   milvus                         WipeOut              Stopped   45m

kbcli cluster start milvus-jrwvla              
OpsRequest milvus-jrwvla-start-6h82g created successfully, you can view the progress:
	kbcli cluster describe-ops milvus-jrwvla-start-6h82g -n default

kubectl get cluster milvus-jrwvla 
NAME            CLUSTER-DEFINITION   VERSION   TERMINATION-POLICY   STATUS    AGE
milvus-jrwvla   milvus                         WipeOut              Running   49m
  4. See error
echo "curl -s -H 'Content-Type: application/json' -X POST  http://milvus-jrwvla-proxy.default.svc.cluster.local:19530/v1/vector/query  -d '{\"collectionName\":\"executions_loop_collection\",\"filter\":\"id == 16571\",\"limit\":0,\"outputFields\":[\"id\"]}' " | kubectl exec -it milvus-jrwvla-proxy-0 --namespace default -- bash
{"code":200,"data":[]}
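
The check in the last reproduction step can be automated by comparing the query response captured before the stop with the one captured after the start. The following is a minimal sketch, not part of the original report: `has_rows` is a hypothetical helper, and it assumes the response keeps the `{"code":...,"data":[...]}` shape shown above.

```shell
#!/usr/bin/env bash

# has_rows RESPONSE
# Hypothetical helper: returns 0 if the Milvus REST response contains at
# least one row in "data", 1 if "data" is empty or missing.
has_rows() {
  local response="$1"
  local compact
  # Strip whitespace so '"data": []' and '"data":[]' are treated the same.
  compact="$(printf '%s' "$response" | tr -d '[:space:]')"
  case "$compact" in
    *'"data":[]'*) return 1 ;;  # empty result set
    *'"data":['*)  return 0 ;;  # at least one element follows the bracket
    *)             return 1 ;;  # no "data" array at all
  esac
}

# Responses copied from the report: before the stop and after the start.
before='{"code":200,"data":[{"id":16571}]}'
after='{"code":200,"data":[]}'

if has_rows "$before" && ! has_rows "$after"; then
  echo "data loss: row present before stop/start, missing after"
fi
```

In a real run, `before` and `after` would hold the output of the `curl` query above, captured before `kbcli cluster stop` and after `kbcli cluster start` respectively.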

Expected behavior
After stop and start, the same query should still return the row inserted before the stop: {"code":200,"data":[{"id":16571}]}

Labels: kind/bug
