Is there an existing issue for this?
Environment
- Milvus version:2.6.18
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka): woodpecker
- SDK version(e.g. pymilvus v2.0.0rc2):--
- OS(Ubuntu or CentOS): Ubuntu
- CPU/Memory:
- GPU:
- Others:
Current Behavior
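I deployed Milvus with Docker Compose, using Alibaba Cloud OSS as object storage. After startup every container shows status Up, but in actual use data is never persisted to OSS. The logs report "assignment not found" for pchannel=by-dev-rootcoord-dml_0: streamingcoord never manages to assign the pchannel to a streamingnode, even though both streamingnodes have registered and report that the component is ready.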
Expected Behavior
All cluster nodes cooperate and run normally, and data is persisted to OSS.
Steps To Reproduce
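I built a Milvus cluster with Docker on Ubuntu 26.04, using Milvus 2.6.18 and etcd 3.5.25. The cluster consists of 2 datanodes, 2 querynodes, 2 proxies, 2 streamingnodes, 3 etcd nodes, and 1 mixcoord, with Alibaba Cloud OSS as the storage backend. After startup the OSS connection works, but creating a collection fails.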
Contents of user.yaml:
mq:
  type: woodpecker
woodpecker:
  meta:
    type: etcd
    prefix: woodpecker
  storage:
    type: remote
    rootPath: woodpecker2
  logstore:
    fencePolicy:
      conditionWrite: disable
    segmentSyncPolicy:
      maxInterval: 200ms
      maxFlushSize: 2M
      maxFlushThreads: 16
etcd:
  endpoints:
    - etcd1:2379
    - etcd2:2379
    - etcd3:2379
minio:
  address: xxx
  port: 80
  accessKeyID: xxx
  secretAccessKey: xxxx
  bucketName: xxx
  rootPath: milvus
  useSSL: false
  cloudProvider: aliyun
  tlsSkipVerify: true
common:
  storageType: remote
  security:
    authorizationEnabled: false
    defaultRootPassword: "xxx"
proxy:
  maxUserCount: 100
  maxRoleNum: 10
  # Tune the timeout to fit the workload
  # healthCheckTimeout: 3000
queryNode:
  gracefulTime: 5000
  mmap:
    mmapEnabled: true
dataNode:
  flush:
    insertBufSize: 16777216
log:
  level: info
Contents of docker-compose.yaml:
version: "3.9"
x-milvus-env: &milvus-env
  ETCD_ENDPOINTS: etcd1:2379,etcd2:2379,etcd3:2379
x-milvus-volumes: &milvus-volumes
  - ./configs/user.yaml:/milvus/configs/user.yaml:ro
  - /data/milvus-woodpecker:/milvus/woodpecker2
x-milvus-common: &milvus-common
  image: milvusdb/milvus:v2.6.18
  security_opt: [seccomp:unconfined]
  environment: *milvus-env
  volumes: *milvus-volumes
  networks: [milvus]
  extra_hosts:
    - "xxx"
    - "xxx"
  ulimits:
    memlock: { soft: -1, hard: -1 }
    nofile: { soft: 65536, hard: 65536 }
  logging:
    driver: json-file
    options:
      max-size: "200m"
      max-file: "5"
services:
  # ---------- 3-node etcd cluster; one node is shown below, the other two use the same configuration ----------
  etcdx:
    image: quay.io/coreos/etcd:v3.5.25
    container_name: milvus-etcd1
    networks: [milvus]
    restart: unless-stopped
    volumes: [./volumes/etcd1:/etcd]
    environment:
      ETCD_NAME: etcd1
      ETCD_INITIAL_CLUSTER: etcd1=http://etcd1:2380,etcd2=http://etcd2:2380,etcd3=http://etcd3:2380
      ETCD_INITIAL_CLUSTER_STATE: new
      ETCD_INITIAL_CLUSTER_TOKEN: milvus-etcd
      ETCD_LISTEN_PEER_URLS: http://0.0.0.0:2380
      ETCD_ADVERTISE_PEER_URLS: http://etcd1:2380
      ETCD_LISTEN_CLIENT_URLS: http://0.0.0.0:2379
      ETCD_ADVERTISE_CLIENT_URLS: http://etcd1:2379
      ETCD_AUTO_COMPACTION_MODE: revision
      ETCD_AUTO_COMPACTION_RETENTION: "1000"
      ETCD_QUOTA_BACKEND_BYTES: "4294967296"
      ETCD_HEARTBEAT_INTERVAL: 500
      ETCD_ELECTION_TIMEOUT: 5000
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 10s
      retries: 5
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G
  # ---------- Milvus control plane ----------
  mixcoord:
    <<: *milvus-common
    container_name: milvus-mixcoord
    command: ["milvus", "run", "mixcoord"]
    depends_on:
      etcd1: { condition: service_healthy }
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 8G
        reservations:
          cpus: '2'
          memory: 4G
  streamingnode-x:
    <<: *milvus-common
    container_name: milvus-streamingnode-x
    command: ["milvus", "run", "streamingnode"]
    depends_on: [mixcoord]
    deploy:
      resources:
        limits: { cpus: '2', memory: 8G }
        reservations: { cpus: '1', memory: 4G }
  datanode-x:
    <<: *milvus-common
    container_name: milvus-datanode-x
    command: ["milvus", "run", "datanode"]
    depends_on: [mixcoord, streamingnode-x]
    deploy:
      resources:
        limits: { cpus: '4', memory: 16G }
        reservations: { cpus: '2', memory: 8G }
  querynode-x:
    <<: *milvus-common
    container_name: milvus-querynode-x
    command: ["milvus", "run", "querynode"]
    depends_on: [mixcoord, streamingnode-x]
    deploy:
      resources:
        limits: { cpus: '4', memory: 16G }
        reservations: { cpus: '4', memory: 16G }
  proxy-x:
    <<: *milvus-common
    container_name: milvus-proxy-x
    command: ["milvus", "run", "proxy"]
    depends_on: [mixcoord, streamingnode-1, streamingnode-2]
    ports: ["19530:19530", "9091:9091"]
    deploy:
      resources:
        limits: { cpus: '2', memory: 4G }
        reservations: { cpus: '1', memory: 2G }
  attu:
    image: zilliz/attu:v2.6.0
    container_name: attu
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      MILVUS_URL: "xxx"
      MILVUS_TOKEN: "xxx"
      # MILVUS_USERNAME: "xxx"
      # MILVUS_PASSWORD: "xxx"
    extra_hosts:
      - "xxx"
    networks:
      - milvus
networks:
  milvus:
    driver: bridge
Milvus Log
Mixcoord log (loops continuously and never succeeds):
[WARN] [handler/handler_client_impl.go:301] ["assignment not found"] [pchannel=by-dev-rootcoord-dml_0] [handler=producer]
[INFO] [handler/handler_client_impl.go:313] ["wait for next backoff done"] [pchannel=by-dev-rootcoord-dml_0] [handler=producer] [isAssignmentChange=false] [cost=~10-14s]
[INFO] [rootcoord/root_coord.go:917] ["failed to create collection"] [collectionName=test_oss] [error="context canceled"]
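To gauge how often the loop above repeats, and whether other pchannels are affected as well, the mixcoord log can be summarized with a small script. The following is only a minimal sketch: it assumes the log lines keep the bracketed layout shown above, and the function name `count_unassigned_pchannels` and the two regular expressions are illustrative helpers, not part of Milvus.

```python
import re
from collections import Counter

# Matches bracketed key=value fields such as [pchannel=by-dev-rootcoord-dml_0].
# Assumption: fields keep the [key=value] layout seen in the excerpt above.
FIELD_RE = re.compile(r"\[(\w+)=([^\]]*)\]")
# Matches the quoted log message such as ["assignment not found"].
MESSAGE_RE = re.compile(r'\["([^"]*)"\]')


def count_unassigned_pchannels(lines):
    """Count "assignment not found" messages per pchannel (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        message = MESSAGE_RE.search(line)
        if message is None or message.group(1) != "assignment not found":
            continue
        fields = dict(FIELD_RE.findall(line))
        counts[fields.get("pchannel", "<unknown>")] += 1
    return counts


# The three mixcoord log lines quoted in this report.
sample = [
    '[WARN] [handler/handler_client_impl.go:301] ["assignment not found"] '
    "[pchannel=by-dev-rootcoord-dml_0] [handler=producer]",
    '[INFO] [handler/handler_client_impl.go:313] ["wait for next backoff done"] '
    "[pchannel=by-dev-rootcoord-dml_0] [handler=producer] "
    "[isAssignmentChange=false] [cost=~10-14s]",
    '[INFO] [rootcoord/root_coord.go:917] ["failed to create collection"] '
    '[collectionName=test_oss] [error="context canceled"]',
]

for pchannel, hits in count_unassigned_pchannels(sample).items():
    print(f"{pchannel}: {hits} unassigned warning(s)")
```

In practice the `sample` list would be replaced by the lines of the real log, for example the saved output of `docker logs milvus-mixcoord`.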
Streamingnode-1 and streamingnode-2 startup log (the OSS connection is fine):
[INFO] [CGO] [storage/MinioChunkManager.cpp:244] ["[SERVER][PreCheck] start pre-checking chunk manager with configs: [address=xxxx:80, bucket_name=xxx, cloud_provider=xxx, useVirtualHost=false]"]
[INFO] [CGO] [storage/ChunkManager.cpp:199] ["initialize AliyunChunkManager with parameters [endpoint=xxx:80][bucket_name=xxx]"]
[INFO] [roles/roles.go:256] ["component is ready"] [role=streamingnode]
Anything else?
No response