v25.3.1, alpha errors "Please retry again, server is not ready to accept requests" #9695

@hankhsutc

Describe the bug

We deployed three learner alpha pods.
Sometimes one of the learner alphas is unable to accept query requests and returns the following error: "Please retry again, server is not ready to accept requests".
At the same time, /health?live=1 returns 200, but /health returns 503.

http://localhost:8080/health, 503 Service Unavailable, "Please retry again, server is not ready to accept requests"
http://localhost:8080/health?live=1, 200

[
    {
        "instance": "alpha",
        "address": "dgraph-learner-alpha-0.dgraph-learner-alpha-headless.svc.cluster.local:7080",
        "status": "healthy",
        "group": "1",
        "version": "v25.3.1",
        "uptime": 285560,
        "lastEcho": 1777346016,
        "ee_features": [
            "backup_restore",
            "cdc"
        ],
        "max_assigned": 2206421499
    }
]

http://localhost:8080/probe/graphql, 200

{
    "status": "up",
    "schemaUpdateCounter": 0
}
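
For anyone trying to reproduce this, here is a minimal sketch that checks the two health endpoints from the report and names the resulting state. The base URL and the two paths come from the report above; the function names and the state labels ("ready", "live-but-not-ready", "down") are my own illustrative choices, not Dgraph terminology.

```python
import urllib.error
import urllib.request

# Address used in the report; adjust for your own pod or port-forward.
BASE = "http://localhost:8080"


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses (such as the 503 in the report) arrive as HTTPError.
        return err.code
    except urllib.error.URLError:
        # Connection refused, DNS failure, timeout during connect, etc.
        return None


def classify(live_status, health_status):
    """Map the two status codes to a label (labels are hypothetical, for illustration)."""
    if live_status != 200:
        return "down"
    if health_status == 200:
        return "ready"
    return "live-but-not-ready"


def probe(base=BASE):
    """Query /health?live=1 and /health on one alpha and classify the result."""
    live = fetch_status(base + "/health?live=1")
    health = fetch_status(base + "/health")
    return classify(live, health)


if __name__ == "__main__":
    print(probe())
```

With the status codes reported above (200 for /health?live=1 and 503 for /health), `classify` returns "live-but-not-ready", which is the state the affected learner is stuck in.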

alpha learner logs

E0428 03:09:49.515381       1 draft.go:1458] While retrieving snapshot, error: cannot retrieve snapshot from peer: rpc error: code = Unknown desc = operation opSnapshot is already running. Retrying...
I0428 03:09:49.515447       1 draft.go:124] Operation completed with id: opSnapshot
I0428 03:09:50.505477       1 draft.go:198] Operation started with id: opRollup
I0428 03:09:50.515906       1 draft.go:198] Operation started with id: opSnapshot
I0428 03:09:50.515973       1 draft.go:124] Operation completed with id: opRollup
E0428 03:09:50.516338       1 draft.go:1458] While retrieving snapshot, error: cannot retrieve snapshot from peer: rpc error: code = Unknown desc = operation opSnapshot is already running. Retrying...
I0428 03:09:50.516390       1 draft.go:124] Operation completed with id: opSnapshot
I0428 03:09:51.506878       1 draft.go:198] Operation started with id: opRollup
I0428 03:09:51.517209       1 draft.go:198] Operation started with id: opSnapshot
I0428 03:09:51.517313       1 draft.go:124] Operation completed with id: opRollup
E0428 03:09:51.517723       1 draft.go:1458] While retrieving snapshot, error: cannot retrieve snapshot from peer: rpc error: code = Unknown desc = operation opSnapshot is already running. Retrying...
I0428 03:09:51.517794       1 draft.go:124] Operation completed with id: opSnapshot
I0428 03:09:52.508172       1 draft.go:198] Operation started with id: opRollup
I0428 03:09:52.518652       1 draft.go:198] Operation started with id: opSnapshot
I0428 03:09:52.518756       1 draft.go:124] Operation completed with id: opRollup
E0428 03:09:52.519080       1 draft.go:1458] While retrieving snapshot, error: cannot retrieve snapshot from peer: rpc error: code = Unknown desc = operation opSnapshot is already running. Retrying...
I0428 03:09:52.519102       1 draft.go:124] Operation completed with id: opSnapshot
I0428 03:09:53.509970       1 draft.go:198] Operation started with id: opRollup
I0428 03:09:53.519332       1 draft.go:198] Operation started with id: opSnapshot
I0428 03:09:53.519421       1 draft.go:124] Operation completed with id: opRollup
E0428 03:09:53.519768       1 draft.go:1458] While retrieving snapshot, error: cannot retrieve snapshot from peer: rpc error: code = Unknown desc = operation opSnapshot is already running. Retrying...
I0428 03:09:53.519795       1 draft.go:124] Operation completed with id: opSnapshot
I0428 03:09:54.510889       1 draft.go:198] Operation started with id: opRollup
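
The excerpt repeats the same opSnapshot failure roughly once per second. To quantify that over a longer log, a small parser along these lines may help. It assumes the line layout seen in the excerpt (level letter, MMDD, time, thread id, source location, message); the function names and the regular expression are my own and are not part of Dgraph.

```python
import re
from datetime import datetime

# Matches lines shaped like the excerpt above, e.g.
# "E0428 03:09:49.515381       1 draft.go:1458] While retrieving snapshot, ..."
LOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ "
    r"(?P<src>[^\]]+)\] (?P<msg>.*)$"
)

# Error text copied from the log excerpt in this report.
RETRY_MARKER = "operation opSnapshot is already running"


def snapshot_retry_times(lines):
    """Return the timestamps of error lines that carry the opSnapshot retry message."""
    times = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "E" and RETRY_MARKER in match.group("msg"):
            times.append(datetime.strptime(match.group("time"), "%H:%M:%S.%f"))
    return times


def retry_summary(lines):
    """Return (retry count, mean gap in seconds between consecutive retries).

    The mean gap is None when fewer than two retries are found. Only the time
    of day is compared, so a log that crosses midnight would need extra handling.
    """
    times = snapshot_retry_times(lines)
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else None
    return len(times), mean_gap


if __name__ == "__main__":
    import sys

    # Usage: python retry_summary.py < alpha.log
    count, gap = retry_summary(sys.stdin)
    print(f"snapshot retries: {count}, mean gap (s): {gap}")
```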

Expected behavior

The learner alpha should work normally and accept query requests.

Environment

Dgraph v25.3.1 with learner alphas
