Describe the bug
We deployed 3 learner alpha pods.
Sometimes one of the learner alphas is unable to accept query requests and returns the following error: "Please retry again, server is not ready to accept requests".
On the affected pod, /health?live=1 returns 200, but /health returns 503.
http://localhost:8080/health, 503 Service Unavailable, "Please retry again, server is not ready to accept requests"
http://localhost:8080/health?live=1, 200
[
  {
    "instance": "alpha",
    "address": "dgraph-learner-alpha-0.dgraph-learner-alpha-headless.svc.cluster.local:7080",
    "status": "healthy",
    "group": "1",
    "version": "v25.3.1",
    "uptime": 285560,
    "lastEcho": 1777346016,
    "ee_features": [
      "backup_restore",
      "cdc"
    ],
    "max_assigned": 2206421499
  }
]
http://localhost:8080/probe/graphql, 200
{
  "status": "up",
  "schemaUpdateCounter": 0
}
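For anyone trying to reproduce this, the probe results above can be collected with a small script. This is only a sketch: the base URL matches the localhost:8080 address used above, and the helper names (`fetch_status`, `classify`, `check_alpha`) and the state labels are made up for illustration, not part of Dgraph.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is still what we want.
        return err.code
    except urllib.error.URLError:
        # Connection refused, DNS failure, timeout, etc.
        return None


def classify(live_status, ready_status):
    """Map the two probe results to a label (labels are illustrative only)."""
    if live_status != 200:
        return "not live"
    if ready_status != 200:
        return "live but not ready"
    return "ready"


def check_alpha(base_url):
    """Probe /health?live=1 and /health on one alpha and classify the result."""
    live = fetch_status(base_url + "/health?live=1")
    ready = fetch_status(base_url + "/health")
    return classify(live, ready)


if __name__ == "__main__":
    # Assumes the alpha HTTP port is forwarded to localhost:8080.
    print(check_alpha("http://localhost:8080"))
```

The state reported in this issue (200 from /health?live=1, 503 from /health) corresponds to `classify(200, 503)`, which the sketch labels "live but not ready".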
alpha learner logs
E0428 03:09:49.515381 1 draft.go:1458] While retrieving snapshot, error: cannot retrieve snapshot from peer: rpc error: code = Unknown desc = operation opSnapshot is already running. Retrying...
I0428 03:09:49.515447 1 draft.go:124] Operation completed with id: opSnapshot
I0428 03:09:50.505477 1 draft.go:198] Operation started with id: opRollup
I0428 03:09:50.515906 1 draft.go:198] Operation started with id: opSnapshot
I0428 03:09:50.515973 1 draft.go:124] Operation completed with id: opRollup
E0428 03:09:50.516338 1 draft.go:1458] While retrieving snapshot, error: cannot retrieve snapshot from peer: rpc error: code = Unknown desc = operation opSnapshot is already running. Retrying...
I0428 03:09:50.516390 1 draft.go:124] Operation completed with id: opSnapshot
I0428 03:09:51.506878 1 draft.go:198] Operation started with id: opRollup
I0428 03:09:51.517209 1 draft.go:198] Operation started with id: opSnapshot
I0428 03:09:51.517313 1 draft.go:124] Operation completed with id: opRollup
E0428 03:09:51.517723 1 draft.go:1458] While retrieving snapshot, error: cannot retrieve snapshot from peer: rpc error: code = Unknown desc = operation opSnapshot is already running. Retrying...
I0428 03:09:51.517794 1 draft.go:124] Operation completed with id: opSnapshot
I0428 03:09:52.508172 1 draft.go:198] Operation started with id: opRollup
I0428 03:09:52.518652 1 draft.go:198] Operation started with id: opSnapshot
I0428 03:09:52.518756 1 draft.go:124] Operation completed with id: opRollup
E0428 03:09:52.519080 1 draft.go:1458] While retrieving snapshot, error: cannot retrieve snapshot from peer: rpc error: code = Unknown desc = operation opSnapshot is already running. Retrying...
I0428 03:09:52.519102 1 draft.go:124] Operation completed with id: opSnapshot
I0428 03:09:53.509970 1 draft.go:198] Operation started with id: opRollup
I0428 03:09:53.519332 1 draft.go:198] Operation started with id: opSnapshot
I0428 03:09:53.519421 1 draft.go:124] Operation completed with id: opRollup
E0428 03:09:53.519768 1 draft.go:1458] While retrieving snapshot, error: cannot retrieve snapshot from peer: rpc error: code = Unknown desc = operation opSnapshot is already running. Retrying...
I0428 03:09:53.519795 1 draft.go:124] Operation completed with id: opSnapshot
I0428 03:09:54.510889 1 draft.go:198] Operation started with id: opRollup
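To show how often the loop above repeats in a longer log, the lines can be tallied with a short script. This is a rough sketch that assumes the glog-style line layout seen in the excerpt above; the function name `summarize` and the counter keys are hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
# E0428 03:09:49.515381 1 draft.go:1458] While retrieving snapshot, ...
LINE_RE = re.compile(
    r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2})\.\d+\s+\d+\s+(\S+?):(\d+)\]\s+(.*)$"
)

STARTED = "Operation started with id: "
COMPLETED = "Operation completed with id: "


def summarize(lines):
    """Count operation starts, completions, and opSnapshot retry errors."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a log line in this layout
        message = match.group(6)
        if message.startswith(STARTED):
            counts["started:" + message[len(STARTED):]] += 1
        elif message.startswith(COMPLETED):
            counts["completed:" + message[len(COMPLETED):]] += 1
        elif "operation opSnapshot is already running" in message:
            counts["snapshot_retry_errors"] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (assumed): kubectl logs <pod> | python summarize_logs.py
    for key, value in sorted(summarize(sys.stdin).items()):
        print(key, value)
```

In the excerpt above the pattern repeats about once per second: opRollup starts, opSnapshot starts, the snapshot retrieval fails because opSnapshot is already running, and both operations are reported as completed.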
Expected behavior
The learner alpha becomes ready (/health returns 200) and accepts query requests normally.
Environment
dgraph v25.3.1 with learner alpha