Description
zot version
v2.1.13-0-g4ad3fad
Describe the bug
We are running zot 2.1.13 and periodically see pulls hang, usually a bunch at the same time. At worst, no data is downloaded and the open TCP connections eventually time out. All images on this server are pushed by an automated process; none are pulled by zot from other registries. The latency reported in the log for completed transfers generally gets quite long (e.g., >15 min for a manifest). The connections that time out are not logged by zot, as far as I can find, even at trace level (am I missing something here?).
We do have Bearer auth configured. In the worst state, the expected initial requests without an Authorization header are answered with a 401 very quickly, but the follow-up request that includes the token is either slow or never appears because it times out.
I need some suggestions on how to track down what zot is doing internally that might get it into this 'stuck' state. When it happened again this morning, outbound bandwidth was pegged at maximum for a while after we restarted the zot service, but it handled the pent-up demand and eventually settled back down to normal levels. Memory usage of the zot process does climb gradually: this morning, as things were falling over, it climbed from 1.5GB to 2.2GB, and after the restart it settled at 0.3GB.
We have looked for things on the server that may be blocking zot but do not see any issues with CPU, RAM, disk I/O, etc. My gut says something internal to the service is happening here but I am not sure where to start ruling out possibilities.
To reproduce
- Configuration
{
  "distSpecVersion": "1.1.1",
  "storage": {
    "rootDirectory": "/storage/zot",
    "dedupe": "true",
    "gc": "true",
    "gcDelay": "1h",
    "gcInterval": "24h"
  },
  "http": {
    "address": "0.0.0.0",
    "port": "8443",
    "realm": "****",
    "tls": {
      "cert": "/etc/opt/zot/fullchain.pem",
      "key": "/etc/opt/zot/privkey.pem"
    },
    "Ratelimit": {
      "Rate": 1000
    },
    "auth": {
      "failDelay": 5,
      "bearer": {
        "realm": "****",
        "service": "****",
        "cert": "/etc/zot/auth.crt"
      }
    }
  },
  "log": {
    "level": "trace",
    "output": "/var/log/zot/zot.log",
    "audit": "/var/log/zot/zot-audit.log"
  },
  "extensions": {
    "metrics": {
      "enable": true,
      "prometheus": {
        "path": "/metrics"
      }
    },
    "lint": {
      "enable": false
    },
    "scrub": {
      "enable": false
    },
    "search": {
      "enable": false
    },
    "sync": {
      "enable": false
    },
    "trust": {
      "enable": false
    },
    "ui": {
      "enable": false
    }
  }
}
- Client tool used
Our internal client built using oras-go for pulling images
- Seen error
No actual errors, but downloads eventually stop completely and TCP sessions time out.
Expected behavior
No response
Screenshots
No response
Additional context
- zot v2.1.13-0-g4ad3fad
{"time":"2026-02-04T23:05:42.445833401Z","level":"info","message":"version","distribution-spec":"1.1.1","commit":"v2.1.13-0-g4ad3fad","binary-type":"-events-imagetrust-lint-metrics-mgmt-profile-scrub-search-sync-ui-userprefs","go version":"go1.25.5","caller":"zotregistry.dev/zot/v2/pkg/cli/server/root.go:220","func":"zotregistry.dev/zot/v2/pkg/cli/server.NewServerRootCmd.func1","goroutine":1}
- 128GB RAM
- 2 x 10Gb bonded NICs
- 66TB used on a local 556TB RAID (md) array
- bearer auth enabled
- prometheus metrics is the only enabled extension