
[Bug]: In HA mode, mooncake master misbehaves after an etcd leader failover triggered by kill -19 on an etcd instance #2253

@ScutYangLin


Bug Report

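This was observed on Mooncake 0.3.7. I am not sure whether later versions still have this problem.
1. Start etcd in high-availability mode with 3 instances, say etcd1, etcd2, and etcd3.
2. Start the mooncake master instance.
3. etcdctl get "mooncake-store/master_view" --prefix --endpoints=xxxx:port1,xxxx:port2,xxxx:port3
Normal output:
mooncake-store/master_view
${mooncake_master_localhost}:56050
4. kill -19 etcd1, where etcd1 is the etcd leader. etcd fails over to a new leader normally.
5. etcdctl get "mooncake-store/master_view" --prefix --endpoints=xxxx:port1,xxxx:port2,xxxx:port3
The result is empty, which is abnormal.
6. kill -18 etcd1 to bring etcd back to 3 instances. Access is normal again.

If kill -9 etcd1 is used instead of kill -19 etcd1 in the steps above, the mooncake master behaves normally. This suggests that in the kill -19 case the process is only suspended while its TCP connections stay open, so the mooncake master cannot detect the failure and therefore fails to reconnect and re-elect a leader.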
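
To illustrate the difference between the two signals at the TCP level, here is a minimal local sketch in Python. It does not involve etcd or Mooncake at all: a forked child process stands in for the etcd instance, and `probe` is a hypothetical helper name chosen for this example. It assumes Linux, where `kill -19` is SIGSTOP and `kill -9` is SIGKILL.

```python
import os
import signal
import socket
import time


def probe(sig: int, timeout: float = 1.0) -> str:
    """Send `sig` to a forked TCP server and report what its client sees.

    Returns "timeout" if the connection still looks open but stays silent,
    or "closed" if the kernel tore the connection down.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    pid = os.fork()
    if pid == 0:
        # Child: stand-in for the etcd instance. Accept, then idle forever.
        try:
            conn, _ = listener.accept()
            while True:
                time.sleep(60)
        finally:
            os._exit(0)  # never fall back into the parent's code path

    listener.close()  # the parent only needs the client side
    client = socket.create_connection(("127.0.0.1", port), timeout=timeout)
    try:
        time.sleep(0.2)      # give the child time to accept
        os.kill(pid, sig)
        time.sleep(0.2)      # give the signal time to take effect
        client.sendall(b"ping")
        data = client.recv(1)
        return "closed" if data == b"" else "data"
    except socket.timeout:
        return "timeout"     # peer is silent, yet the socket is still open
    except OSError:
        return "closed"      # e.g. connection reset by the dead peer
    finally:
        client.close()
        os.kill(pid, signal.SIGKILL)  # also terminates a stopped child
        os.waitpid(pid, 0)


if __name__ == "__main__":
    print("SIGSTOP:", probe(signal.SIGSTOP))
    print("SIGKILL:", probe(signal.SIGKILL))
```

On Linux I would expect this to report `timeout` for SIGSTOP and `closed` for SIGKILL. In the suspended case the send succeeds and nothing ever signals an error on the socket, so the only symptom is the client's own timeout. A client without a deadline or keepalive of its own has nothing to react to.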

Req=Success/PartialSuccess/Total, Item=Success/Total): PutStart:(Req=0/0/0, Item=0/0), PutEnd:(Req=0/0/0, Item=0/0), PutRevoke:(Req=0/0/0, Item=0/0), Get:(Req=0/0/0, Item=0/0), ExistKey:(Req=0/0/0, Item=0/0), | Eviction: Success/Attempts=0/0, keys=0, size=0 B
E20260528 17:58:58.306107 2881 etcd_helper.cpp:135] lease_id=5262630354244327913, error=keep alive channel closed
I20260528 17:58:58.306192 2881 ha_helper.cpp:141] Trying to stop server...
E20260528 17:58:58.306311 2521 ha_helper.cpp:176] Master service stopped: 1
E20260528 17:58:58.306394 2521 etcd_helper.cpp:149] Failed to cancel keep lease: no keep alive context found for the given lease ID
I20260528 17:58:58.306432 2521 ha_helper.cpp:182] Cancel keep leader alive: ETCD_OPERATION_ERROR
I20260528 17:58:59.202612 2521 ha_helper.cpp:111] Init master service...
I20260528 17:58:59.202684 2521 ha_helper.cpp:120] Reset master metric manager...
I20260528 17:58:59.202704 2521 ha_helper.cpp:123] Init leader election helper...
I20260528 17:58:59.202710 2521 ha_helper.cpp:130] Trying to elect self as leader...
E20260528 17:58:59.203328 2521 etcd_helper.cpp:52] key=mooncake-store/master_view, error=key not found in etcd
I20260528 17:58:59.203394 2521 ha_helper.cpp:46] No leader found, trying to elect self as leader
E20260528 17:59:04.205384 2521 etcd_helper.cpp:83] key=mooncake-store/master_view, lease_id=5262630354244327916, error=context deadline exceeded
E20260528 17:59:04.205427 2521 ha_helper.cpp:68] Failed to create key with lease: ETCD_OPERATION_ERROR
E20260528 17:59:05.206068 2521 etcd_helper.cpp:52] key=mooncake-store/master_view, error=key not found in etcd

Before submitting...

  • Ensure you searched for relevant issues and read the [documentation]
