[Bug]: streamingcoord cannot assign pchannel to streamingnode #50423

@ye-ling-ye

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:2.6.18
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka):  woodpecker  
- SDK version(e.g. pymilvus v2.0.0rc2):--
- OS(Ubuntu or CentOS): Ubuntu
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

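I deployed Milvus with Docker Compose, using Alibaba Cloud OSS as object storage. After startup, every container is in the Up state, but in actual use data is never persisted to OSS. The logs report that no assignment is found for pchannel=by-dev-rootcoord-dml_0: streamingcoord never assigns the pchannel to a streamingnode, even though both streamingnodes have registered and report that the component is ready.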

Expected Behavior

All nodes in the cluster cooperate and run normally, and data is persisted to OSS.

Steps To Reproduce

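On Ubuntu 26.04, I used Docker to set up a Milvus cluster with Milvus 2.6.18 and etcd 3.5.25. The cluster consists of 2 datanodes, 2 querynodes, 2 proxies, 2 streamingnodes, 3 etcd nodes, and 1 mixcoord, with Alibaba Cloud OSS as the storage backend. After startup the connection to OSS works, but creating a collection fails.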

Contents of user.yaml:
mq:
  type: woodpecker

woodpecker:
  meta:
    type: etcd
    prefix: woodpecker
  storage:
    type: remote
    rootPath: woodpecker2
  logstore:
    fencePolicy:
      conditionWrite: disable
    segmentSyncPolicy:
      maxInterval: 200ms
      maxFlushSize: 2M
      maxFlushThreads: 16
   

etcd:
  endpoints:
    - etcd1:2379
    - etcd2:2379
    - etcd3:2379

minio:
  address: xxx
  port: 80
  accessKeyID: xxx
  secretAccessKey: xxxx
  bucketName: xxx
  rootPath: milvus
  useSSL: false
  cloudProvider: aliyun
  tlsSkipVerify: true

common:
  storageType: remote
  security:
    authorizationEnabled: false
    defaultRootPassword: "xxx"

proxy:
  maxUserCount: 100
  maxRoleNum: 10
  # Tune the timeout according to the workload
  # healthCheckTimeout: 3000
queryNode:
  gracefulTime: 5000
  mmap:
    mmapEnabled: true          
dataNode:
  flush:
    insertBufSize: 16777216    
log:
  level: info
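
Since every component in this setup depends on the etcd endpoints listed in user.yaml, a basic sanity check is to confirm that each endpoint accepts TCP connections from a container on the same Docker network. The sketch below is only an assumption about how one might do that with the Python standard library; the helper names `parse_endpoints` and `is_reachable` are hypothetical, and an open port does not prove that etcd itself is healthy or that streamingnode registration is correct.

```python
import socket


def parse_endpoints(value):
    """Split a comma-separated "host:port" list into (host, port) tuples."""
    endpoints = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        # rpartition keeps everything before the last ":" as the host.
        host, _, port = item.rpartition(":")
        endpoints.append((host, int(port)))
    return endpoints


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, calling `is_reachable(host, port)` for each pair returned by `parse_endpoints("etcd1:2379,etcd2:2379,etcd3:2379")` from inside the `milvus` network shows whether the names resolve and the ports are open.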


Contents of docker-compose.yaml:
version: "3.9"

x-milvus-env: &milvus-env
  ETCD_ENDPOINTS: etcd1:2379,etcd2:2379,etcd3:2379

x-milvus-volumes: &milvus-volumes
  - ./configs/user.yaml:/milvus/configs/user.yaml:ro
  - /data/milvus-woodpecker:/milvus/woodpecker2

x-milvus-common: &milvus-common
  image: milvusdb/milvus:v2.6.18
  security_opt: [seccomp:unconfined]
  environment: *milvus-env
  volumes: *milvus-volumes
  networks: [milvus]
  extra_hosts:
    - "xxx"
    - "xxx"
  ulimits:
    memlock: { soft: -1, hard: -1 }
    nofile: { soft: 65536, hard: 65536 }
  logging:
    driver: json-file
    options:
      max-size: "200m"
      max-file: "5"

services:
  # ---------- etcd 3-node cluster; one node is shown below, the other two use the same configuration ----------
  etcdx:
    image: quay.io/coreos/etcd:v3.5.25
    container_name: milvus-etcd1
    networks: [milvus]
    restart: unless-stopped
    volumes: [./volumes/etcd1:/etcd]
    environment:
      ETCD_NAME: etcd1
      ETCD_INITIAL_CLUSTER: etcd1=http://etcd1:2380,etcd2=http://etcd2:2380,etcd3=http://etcd3:2380
      ETCD_INITIAL_CLUSTER_STATE: new
      ETCD_INITIAL_CLUSTER_TOKEN: milvus-etcd
      ETCD_LISTEN_PEER_URLS: http://0.0.0.0:2380
      ETCD_ADVERTISE_PEER_URLS: http://etcd1:2380
      ETCD_LISTEN_CLIENT_URLS: http://0.0.0.0:2379
      ETCD_ADVERTISE_CLIENT_URLS: http://etcd1:2379
      ETCD_AUTO_COMPACTION_MODE: revision
      ETCD_AUTO_COMPACTION_RETENTION: "1000"
      ETCD_QUOTA_BACKEND_BYTES: "4294967296"
      ETCD_HEARTBEAT_INTERVAL: 500
      ETCD_ELECTION_TIMEOUT: 5000
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 10s
      retries: 5
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G
     
  # ---------- Milvus control plane ----------
  mixcoord:
    <<: *milvus-common
    container_name: milvus-mixcoord
    command: ["milvus", "run", "mixcoord"]
    depends_on:
      etcd1: { condition: service_healthy }
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 8G
        reservations:
          cpus: '2'
          memory: 4G

  streamingnode-x:
    <<: *milvus-common
    container_name: milvus-streamingnode-x
    command: ["milvus", "run", "streamingnode"]
    depends_on: [mixcoord]
    deploy:
      resources:
        limits: { cpus: '2', memory: 8G }
        reservations: { cpus: '1', memory: 4G }

  datanode-x:
    <<: *milvus-common
    container_name: milvus-datanode-x
    command: ["milvus", "run", "datanode"]
    depends_on: [mixcoord, streamingnode-x]
    deploy:
      resources:
        limits: { cpus: '4', memory: 16G }
        reservations: { cpus: '2', memory: 8G }

  querynode-x:
    <<: *milvus-common
    container_name: milvus-querynode-x
    command: ["milvus", "run", "querynode"]
    depends_on: [mixcoord, streamingnode-x]
    deploy:
      resources:
        limits: { cpus: '4', memory: 16G }
        reservations: { cpus: '4', memory: 16G }

  proxy-x:
    <<: *milvus-common
    container_name: milvus-proxy-x
    command: ["milvus", "run", "proxy"]
    depends_on: [mixcoord, streamingnode-1, streamingnode-2]
    ports: ["19530:19530", "9091:9091"]
    deploy:
      resources:
        limits: { cpus: '2', memory: 4G }
        reservations: { cpus: '1', memory: 2G }

  attu:
    image: zilliz/attu:v2.6.0
    container_name: attu
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      MILVUS_URL: "xxx"       
      MILVUS_TOKEN: "xxx" 
      # MILVUS_USERNAME: "xxx"     
      # MILVUS_PASSWORD: "xxx"          
    extra_hosts:
      - "xxx"
    networks:
      - milvus                       
networks:
  milvus:
    driver: bridge
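
The excerpt above abbreviates service names (`etcdx`, `streamingnode-x`), while some `depends_on` entries refer to concrete names such as `etcd1`, `streamingnode-1`, and `streamingnode-2`. That is most likely just shorthand in the report, but in a real file every `depends_on` target has to be a defined service. A minimal sketch of such a check is shown below, assuming the compose file has already been loaded into a plain dictionary by some YAML parser (not shown); the function names are hypothetical.

```python
def depends_on_targets(service):
    """Return the names listed under depends_on.

    Compose accepts both a list of names and a mapping of name -> options;
    list() yields the names in either case.
    """
    return list(service.get("depends_on", []))


def missing_dependencies(services):
    """Map each service to the depends_on targets that are not defined services."""
    missing = {}
    for name, service in services.items():
        unknown = [dep for dep in depends_on_targets(service) if dep not in services]
        if unknown:
            missing[name] = unknown
    return missing
```

Applied to the service names exactly as written in the excerpt, this would flag `etcd1` under `mixcoord` and `streamingnode-1`, `streamingnode-2` under `proxy-x`; with the full, unabbreviated file the result should be empty.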

Milvus Log

Mixcoord log (loops continuously and never succeeds):
[WARN] [handler/handler_client_impl.go:301] ["assignment not found"] [pchannel=by-dev-rootcoord-dml_0] [handler=producer]
[INFO] [handler/handler_client_impl.go:313] ["wait for next backoff done"] [pchannel=by-dev-rootcoord-dml_0] [handler=producer] [isAssignmentChange=false] [cost=~10-14s]
[INFO] [rootcoord/root_coord.go:917] ["failed to create collection"] [collectionName=test_oss] [error="context canceled"]
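
To see which pchannels are affected and how often the retry loop fires, the log lines above can be summarized with a small script. This is only a sketch based on the bracketed `[key=value]` layout visible in the excerpt; the function names are hypothetical and the regular expressions are an assumption about the format, not something taken from Milvus itself.

```python
import re
from collections import Counter

# Bracketed fields of the form [key=value], as they appear in the excerpt.
FIELD_RE = re.compile(r"\[(\w+)=([^\]]*)\]")
# The quoted log message, e.g. ["assignment not found"].
MESSAGE_RE = re.compile(r'\["([^"]*)"\]')


def parse_line(line):
    """Return (message, fields) for one log line, or None if no message is found."""
    match = MESSAGE_RE.search(line)
    if match is None:
        return None
    return match.group(1), dict(FIELD_RE.findall(line))


def count_unassigned(lines):
    """Count "assignment not found" warnings per pchannel."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        message, fields = parsed
        if message == "assignment not found" and "pchannel" in fields:
            counts[fields["pchannel"]] += 1
    return counts
```

Feeding the full mixcoord log into `count_unassigned` line by line gives a per-pchannel count, which helps tell whether only `by-dev-rootcoord-dml_0` is stuck or every pchannel is unassigned.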

Streamingnode-1 and streamingnode-2 startup log (OSS connection is normal):

[INFO] [CGO] [storage/MinioChunkManager.cpp:244] ["[SERVER][PreCheck] Start pre-checking the chunk manager with configs: [address=xxxx:80, bucket_name=xxx, cloud_provider=xxx, useVirtualHost=false]"]
[INFO] [CGO] [storage/ChunkManager.cpp:199] ["Initialize AliyunChunkManager with parameters [endpoint=xxx:80][bucket_name=xxx]"]
[INFO] [roles/roles.go:256] ["component is ready"] [role=streamingnode]

Anything else?

No response

Labels

kind/bug, needs-triage
