
feat(storage): enable parallel writes by using per-repo and per-digest locking #2968

Open. Wants to merge 4 commits into base branch main.

@andaaron andaaron commented Feb 15, 2025

Should fix issues such as #2964

  • lock per repo on pushes, pulls, and retention (in short, on index operations)
  • lock per digest when multiple operations affect both the cachedb and storage
    (blob writes, deletes, moves, and links in storage, which need to stay consistent with the cachedb content)

Do not lock multiple repos at the same time in the same goroutine! It will cause deadlocks.
The same applies to digests.
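
To illustrate the per-key locking idea described above, here is a minimal sketch. It assumes a hypothetical `keyedLocks` helper that hands out one mutex per key (a repo name or a blob digest); the type and function names are made up for illustration and are not the code in this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is one lock plus a count of goroutines holding or waiting for it.
type entry struct {
	mu   sync.Mutex
	refs int
}

// keyedLocks is a hypothetical helper (not the PR's type): one mutex per key,
// where a key is a repo name or a blob digest.
type keyedLocks struct {
	mu    sync.Mutex // guards the map only, never held while waiting on a key
	locks map[string]*entry
}

func newKeyedLocks() *keyedLocks {
	return &keyedLocks{locks: make(map[string]*entry)}
}

// Lock blocks until the caller holds the lock for key.
func (k *keyedLocks) Lock(key string) {
	k.mu.Lock()
	e, ok := k.locks[key]
	if !ok {
		e = &entry{}
		k.locks[key] = e
	}
	e.refs++
	k.mu.Unlock()
	e.mu.Lock()
}

// Unlock releases the lock for key and drops the entry once nobody uses it.
func (k *keyedLocks) Unlock(key string) {
	k.mu.Lock()
	e := k.locks[key]
	e.refs--
	if e.refs == 0 {
		delete(k.locks, key)
	}
	k.mu.Unlock()
	e.mu.Unlock()
}

func main() {
	locks := newKeyedLocks()
	counts := map[string]*int{"repo-a": new(int), "repo-b": new(int)}

	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		for repo, n := range counts {
			wg.Add(1)
			go func(repo string, n *int) {
				defer wg.Done()
				// Each goroutine holds exactly one key at a time.
				locks.Lock(repo)
				defer locks.Unlock(repo)
				*n++
			}(repo, n)
		}
	}
	wg.Wait()
	fmt.Println(*counts["repo-a"], *counts["repo-b"])
}
```

Writers to different repos never wait on each other here, while writers to the same repo are serialized. Every goroutine in the sketch takes a single key, in line with the warning above about not holding several repo or digest locks at once.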

In separate commits:

  • show more error information in the zb output
  • make the gc stress tests save logs as artifacts

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


codecov bot commented Feb 15, 2025

Codecov Report

Attention: Patch coverage is 92.50720% with 52 lines in your changes missing coverage. Please review.

Project coverage is 90.95%. Comparing base (cb9b828) to head (ab9b00f).

Files with missing lines               Patch %   Lines
pkg/storage/imagestore/imagestore.go   91.13%    31 Missing and 8 partials ⚠️
pkg/test/oci-utils/oci_layout.go       74.00%    10 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2968      +/-   ##
==========================================
- Coverage   90.99%   90.95%   -0.05%     
==========================================
  Files         177      178       +1     
  Lines       32940    32998      +58     
==========================================
+ Hits        29975    30014      +39     
- Misses       2238     2253      +15     
- Partials      727      731       +4     


@andaaron andaaron changed the title from "eat(storage): enable parallel writes by using per-repo locking" to "feat(storage): enable parallel writes by using per-repo locking" Feb 15, 2025
@andaaron andaaron force-pushed the storage2 branch 7 times, most recently from 21c56e6 to a83648b Compare February 22, 2025 13:12
@andaaron andaaron force-pushed the storage2 branch 4 times, most recently from b1540db to d5b8180 Compare February 23, 2025 08:51
@andaaron andaaron changed the title from "feat(storage): enable parallel writes by using per-repo locking" to "feat(storage): enable parallel writes by using per-repo and per-digest locking" Feb 23, 2025
@andaaron andaaron marked this pull request as ready for review February 23, 2025 09:36
@rchincha

Looks reasonable.

Are there actual benchmark numbers that show improvement under contention?

@andaaron andaaron force-pushed the storage2 branch 4 times, most recently from 6f48c6d to cd30d2d Compare February 27, 2025 13:38
@andaaron

I have updated the tests to use the benchmark GH action to post the comparison in the job summary:
https://github.com/project-zot/zot/actions/runs/13567724400?pr=2968

And:
https://github.com/project-zot/zot/actions/runs/13567724423?pr=2968

The results for minio are almost unchanged.
For local storage we see somewhat improved performance for pushes and some decrease for pulls (with some variation between runs).
I think we should test this in a more realistic environment.

@andaaron

I ran zb locally with this config (note that GC and dedupe are enabled):

{
    "distSpecVersion": "1.1.0-dev",
    "storage": {
        "rootDirectory": "/data/hdd/zot",
        "dedupe": true,
        "gc": true
    },
    "http": {
        "address": "0.0.0.0",
        "port": "8080",
        "realm": "zot",
        "tls": {
          "cert": "/data/ssd/cert/server.cert",
          "key": "/data/ssd/cert/server.key"
        }
    },
    "log": {
        "level": "debug",
        "output": "/data/hdd/zot.log"
    },
    "extensions": {
        "search": {
            "enable": true,
            "cve": {
                "updateInterval": "2h"
            }
        },
        "metrics": {
            "enable": true
        },
        "ui": {
            "enable": true
        },
        "mgmt": {
            "enable": true
        },
        "trust": {
            "enable": true,
            "cosign": true,
            "notation": true
        }
    }
}

And the zb command: bin/zb-linux-amd64 -c 100 -n 1000 -o stdout https://localhost:8080

Result with the code in #2996 (just some CI improvements): zb_diff.md
Result with the code in this PR (includes the lock changes): zb_locks.md
Diff of the two files above: zb_ref.md

@andaaron

andaaron commented Mar 11, 2025

@shcherbak, can you maybe also use the -B option for grep? We're interested in potential errors showing before the 500 code is returned.

Note I fixed some of the issues with visualizing the docker images in the UI, and merged that on main before rebasing this PR.

@shcherbak
shcherbak commented Mar 11, 2025

{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","module":"http","username":GITLABUSER,"component":"session","clientIP":"10.12.18.51:33340","method":"PATCH","path":"/v2/REPO1/blobs/uploads/97b3504a-f280-422f-9664-3dad1667f028","statusCode":202,"latency":"4s","bodySize":0,"headers":{"Accept-Encoding":["gzip"],"Authorization":["******"],"Content-Length":["17344948"],"User-Agent":["docker/24.0.5 go/go1.20.6 git-commit/a61e2b4 kernel/5.8.0-0.bpo.2-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/27.5.0 \\(linux\\))"],"X-Forwarded-For":["RENNER1"],"X-Forwarded-Proto":["https"],"X-Real-Ip":["RENNER1"],"X-Zot-Cluster-Hop-Count":["1"]},"goroutine":7163125,"caller":"zotregistry.dev/zot/pkg/api/session.go:137","time":"2025-03-06T12:02:45.188078158Z","message":"HTTP API"}
{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","module":"http","username":GITLABUSER,"component":"session","clientIP":"10.12.18.51:33356","method":"PATCH","path":"/v2/REPO1/blobs/uploads/fe944474-9225-491b-bd1b-f6e06c90c5b9","statusCode":202,"latency":"5s","bodySize":0,"headers":{"Accept-Encoding":["gzip"],"Authorization":["******"],"Content-Length":["20750132"],"User-Agent":["docker/24.0.5 go/go1.20.6 git-commit/a61e2b4 kernel/5.8.0-0.bpo.2-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/27.5.0 \\(linux\\))"],"X-Forwarded-For":["RENNER1"],"X-Forwarded-Proto":["https"],"X-Real-Ip":["RENNER1"],"X-Zot-Cluster-Hop-Count":["1"]},"goroutine":7163177,"caller":"zotregistry.dev/zot/pkg/api/session.go:137","time":"2025-03-06T12:02:47.197397561Z","message":"HTTP API"}
{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","module":"http","username":GITLABUSER,"component":"session","clientIP":"10.12.18.51:20012","method":"PATCH","path":"/v2/REPO1/blobs/uploads/a5dd54ba-ec48-478b-96d2-7826231043a7","statusCode":202,"latency":"0s","bodySize":0,"headers":{"Accept-Encoding":["gzip"],"Authorization":["******"],"Content-Length":["22329"],"User-Agent":["docker/24.0.5 go/go1.20.6 git-commit/a61e2b4 kernel/5.8.0-0.bpo.2-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/27.5.0 \\(linux\\))"],"X-Forwarded-For":["RENNER1"],"X-Forwarded-Proto":["https"],"X-Real-Ip":["RENNER1"],"X-Zot-Cluster-Hop-Count":["1"]},"goroutine":7166094,"caller":"zotregistry.dev/zot/pkg/api/session.go:137","time":"2025-03-06T12:03:01.629020563Z","message":"HTTP API"}
{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","module":"http","username":GITLABUSER,"component":"session","clientIP":"10.12.18.53:38234","method":"PATCH","path":"/v2/REPO2/blobs/uploads/be723d7d-d6f3-466e-9f25-0e07ec499eff","statusCode":202,"latency":"2s","bodySize":0,"headers":{"Accept-Encoding":["gzip"],"Authorization":["******"],"Content-Length":["15777213"],"User-Agent":["docker/24.0.5 go/go1.20.6 git-commit/a61e2b4 kernel/5.8.0-0.bpo.2-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/27.5.0 \\(linux\\))"],"X-Forwarded-For":["RENNER2"],"X-Forwarded-Proto":["https"],"X-Real-Ip":["RENNER2"],"X-Zot-Cluster-Hop-Count":["1"]},"goroutine":7170018,"caller":"zotregistry.dev/zot/pkg/api/session.go:137","time":"2025-03-06T12:04:12.453675456Z","message":"HTTP API"}
{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","module":"http","username":GITLABUSER,"component":"session","clientIP":"10.12.18.53:23072","method":"PATCH","path":"/v2/REPO2/blobs/uploads/4db044db-5f06-4321-8510-6437cbe42446","statusCode":202,"latency":"0s","bodySize":0,"headers":{"Accept-Encoding":["gzip"],"Authorization":["******"],"Content-Length":["9987"],"User-Agent":["docker/24.0.5 go/go1.20.6 git-commit/a61e2b4 kernel/5.8.0-0.bpo.2-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/27.5.0 \\(linux\\))"],"X-Forwarded-For":["RENNER2"],"X-Forwarded-Proto":["https"],"X-Real-Ip":["RENNER2"],"X-Zot-Cluster-Hop-Count":["1"]},"goroutine":7170182,"caller":"zotregistry.dev/zot/pkg/api/session.go:137","time":"2025-03-06T12:04:15.91019342Z","message":"HTTP API"}
{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","module":"http","username":GITLABUSER,"component":"session","clientIP":"10.12.18.53:37148","method":"PATCH","path":"/v2/REPO2/blobs/uploads/8dc56fe9-0ff0-47eb-8de5-d08b3e11fdf5","statusCode":202,"latency":"10s","bodySize":0,"headers":{"Accept-Encoding":["gzip"],"Authorization":["******"],"Content-Length":["15780594"],"User-Agent":["docker/24.0.5 go/go1.20.6 git-commit/a61e2b4 kernel/5.8.0-0.bpo.2-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/27.5.0 \\(linux\\))"],"X-Forwarded-For":["RENNER2"],"X-Forwarded-Proto":["https"],"X-Real-Ip":["RENNER2"],"X-Zot-Cluster-Hop-Count":["1"]},"goroutine":51764,"caller":"zotregistry.dev/zot/pkg/api/session.go:137","time":"2025-03-06T12:21:14.192904896Z","message":"HTTP API"}
{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","module":"http","username":GITLABUSER,"component":"session","clientIP":"10.12.18.53:56870","method":"PATCH","path":"/v2/REPO2/blobs/uploads/1b501c67-4f7c-49d8-8739-1fa7da3a26a9","statusCode":202,"latency":"23s","bodySize":0,"headers":{"Accept-Encoding":["gzip"],"Authorization":["******"],"Content-Length":["9987"],"User-Agent":["docker/24.0.5 go/go1.20.6 git-commit/a61e2b4 kernel/5.8.0-0.bpo.2-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/27.5.0 \\(linux\\))"],"X-Forwarded-For":["RENNER2"],"X-Forwarded-Proto":["https"],"X-Real-Ip":["RENNER2"],"X-Zot-Cluster-Hop-Count":["1"]},"goroutine":55413,"caller":"zotregistry.dev/zot/pkg/api/session.go:137","time":"2025-03-06T12:22:25.178651536Z","message":"HTTP API"}
{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","module":"http","username":GITLABUSER,"component":"session","clientIP":"10.12.18.51:28692","method":"PATCH","path":"/v2/REPO3/blobs/uploads/d8fee716-39a0-40ea-b5a8-de5f439bb3f8","statusCode":202,"latency":"3m55s","bodySize":0,"headers":{"Accept-Encoding":["gzip"],"Authorization":["******"],"Content-Length":["529667453"],"User-Agent":["docker/24.0.5 go/go1.20.6 git-commit/a61e2b4 kernel/5.8.0-0.bpo.2-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/27.5.0 \\(linux\\))"],"X-Forwarded-For":["RENNER2"],"X-Forwarded-Proto":["https"],"X-Real-Ip":["RENNER2"]},"goroutine":64789,"caller":"zotregistry.dev/zot/pkg/api/session.go:137","time":"2025-03-06T12:28:24.953249238Z","message":"HTTP API"}
{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","module":"http","username":GITLABUSER,"component":"session","clientIP":"10.12.18.53:30316","method":"PATCH","path":"/v2/REPO2/blobs/uploads/2dc4bffb-30ed-4675-bb82-0499adce7caf","statusCode":202,"latency":"1m2s","bodySize":0,"headers":{"Accept-Encoding":["gzip"],"Authorization":["******"],"Content-Length":["15777017"],"User-Agent":["docker/24.0.5 go/go1.20.6 git-commit/a61e2b4 kernel/5.8.0-0.bpo.2-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/27.5.0 \\(linux\\))"],"X-Forwarded-For":["RENNER2"],"X-Forwarded-Proto":["https"],"X-Real-Ip":["RENNER2"],"X-Zot-Cluster-Hop-Count":["1"]},"goroutine":81827,"caller":"zotregistry.dev/zot/pkg/api/session.go:137","time":"2025-03-06T12:30:11.855047442Z","message":"HTTP API"}
{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","module":"http","username":GITLABUSER,"component":"session","clientIP":"10.12.18.53:60356","method":"PATCH","path":"/v2/REPO2/blobs/uploads/952e5f6e-7332-428b-a373-b7299ff72d57","statusCode":202,"latency":"1m26s","bodySize":0,"headers":{"Accept-Encoding":["gzip"],"Authorization":["******"],"Content-Length":["9982"],"User-Agent":["docker/24.0.5 go/go1.20.6 git-commit/a61e2b4 kernel/5.8.0-0.bpo.2-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/27.5.0 \\(linux\\))"],"X-Forwarded-For":["RENNER2"],"X-Forwarded-Proto":["https"],"X-Real-Ip":["RENNER2"],"X-Zot-Cluster-Hop-Count":["1"]},"goroutine":94682,"caller":"zotregistry.dev/zot/pkg/api/session.go:137","time":"2025-03-06T12:34:09.787062847Z","message":"HTTP API"}
{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","module":"http","username":GITLABUSER,"component":"session","clientIP":"10.12.18.51:48488","method":"PATCH","path":"/v2/REPO3/blobs/uploads/3f13870c-9c23-400b-acf4-a2fd29976d00","statusCode":500,"latency":"5m1s","bodySize":178,"headers":{"Accept-Encoding":["gzip"],"Authorization":["******"],"Content-Length":["531260452"],"User-Agent":["docker/24.0.5 go/go1.20.6 git-commit/a61e2b4 kernel/5.8.0-0.bpo.2-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/27.5.0 \\(linux\\))"],"X-Forwarded-For":["RENNER2"],"X-Forwarded-Proto":["https"],"X-Real-Ip":["RENNER2"]},"goroutine":96520,"caller":"zotregistry.dev/zot/pkg/api/session.go:137","time":"2025-03-06T12:38:13.894181382Z","message":"HTTP API"}

Note: this is the log from cluster member 10.12.18.52:8080, using just -B 10.
Note: there are two more cluster members, 10.12.18.51 and 10.12.18.53.

@shcherbak

We are now on the v2.1.3-rc3 tag; it is much better.

@andaaron
andaaron commented Mar 11, 2025

When I mentioned -B, I meant all log messages, not just the HTTP PATCH log messages.
Maybe filter by goroutine 96520, which produced the error, or take all messages starting 10 seconds before it (in case of concurrency issues).
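
The filtering suggested above could be sketched roughly as follows. This assumes one JSON log entry per line with `goroutine` and `time` fields, as in the log lines posted in this thread. The `filterLog` and `demo` helpers are hypothetical, and the sample lines are trimmed copies of entries from this thread. Fields are extracted with regular expressions rather than a JSON parser so that redacted values do not break parsing.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"regexp"
	"strings"
	"time"
)

var (
	goroutineRe = regexp.MustCompile(`"goroutine":(\d+)`)
	timeRe      = regexp.MustCompile(`"time":"([^"]+)"`)
)

// filterLog keeps every line logged by goroutine id, plus every line whose
// timestamp lies in the interval [errTime-window, errTime].
func filterLog(r io.Reader, id string, errTime time.Time, window time.Duration) ([]string, error) {
	var out []string
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // log lines can be long
	for sc.Scan() {
		line := sc.Text()
		if m := goroutineRe.FindStringSubmatch(line); m != nil && m[1] == id {
			out = append(out, line)
			continue
		}
		if m := timeRe.FindStringSubmatch(line); m != nil {
			ts, err := time.Parse(time.RFC3339Nano, m[1])
			if err == nil && !ts.Before(errTime.Add(-window)) && !ts.After(errTime) {
				out = append(out, line)
			}
		}
	}
	return out, sc.Err()
}

// sampleLog holds trimmed copies of entries from this thread (most fields omitted).
var sampleLog = []string{
	`{"level":"info","goroutine":96520,"caller":"zotregistry.dev/zot/pkg/api/authn.go:143","time":"2025-03-06T12:33:12.645163763Z","message":"user profile successfully set"}`,
	`{"level":"info","goroutine":94682,"method":"PATCH","statusCode":202,"time":"2025-03-06T12:34:09.787062847Z","message":"HTTP API"}`,
	`{"level":"info","goroutine":96520,"method":"PATCH","statusCode":500,"time":"2025-03-06T12:38:13.894181382Z","message":"HTTP API"}`,
}

// demo filters the sample for goroutine 96520 and the 10 seconds before the 500.
func demo() []string {
	errTime, err := time.Parse(time.RFC3339Nano, "2025-03-06T12:38:13.894181382Z")
	if err != nil {
		panic(err)
	}
	input := strings.NewReader(strings.Join(sampleLog, "\n"))
	lines, err := filterLog(input, "96520", errTime, 10*time.Second)
	if err != nil {
		panic(err)
	}
	return lines
}

func main() {
	for _, line := range demo() {
		fmt.Println(line)
	}
}
```

On the sample, the first and third entries are kept because they come from goroutine 96520. The second is dropped because it comes from another goroutine and is more than 10 seconds older than the 500 response. For a real log file, an `*os.File` can be passed as the reader.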

@rchincha

rchincha commented Mar 12, 2025

@andaaron can you please rebase this PR?
Will run some tests independently.

@andaaron

It is rebased.

@shcherbak
Copy link

@shcherbak, can you maybe also use the -B option for grep? We're interested in potential errors showing before the 500 code is returned.
Note I fixed some of the issues with visualizing the docker images in the UI, and merged that on main before rebasing this PR.

{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","module":"http","username":GITLABUSER,"component":"session","clientIP":"10.12.18.51:33340","method":"PATCH","path":"/v2/REPO1/blobs/uploads/97b3504a-f280-422f-9664-3dad1667f028","statusCode":202,"latency":"4s","bodySize":0,"headers":{"Accept-Encoding":["gzip"],"Authorization":["******"],"Content-Length":["17344948"],"User-Agent":["docker/24.0.5 go/go1.20.6 git-commit/a61e2b4 kernel/5.8.0-0.bpo.2-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/27.5.0 \\(linux\\))"],"X-Forwarded-For":["RENNER1"],"X-Forwarded-Proto":["https"],"X-Real-Ip":["RENNER1"],"X-Zot-Cluster-Hop-Count":["1"]},"goroutine":7163125,"caller":"zotregistry.dev/zot/pkg/api/session.go:137","time":"2025-03-06T12:02:45.188078158Z","message":"HTTP API"}
{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","module":"http","username":GITLABUSER,"component":"session","clientIP":"10.12.18.51:33356","method":"PATCH","path":"/v2/REPO1/blobs/uploads/fe944474-9225-491b-bd1b-f6e06c90c5b9","statusCode":202,"latency":"5s","bodySize":0,"headers":{"Accept-Encoding":["gzip"],"Authorization":["******"],"Content-Length":["20750132"],"User-Agent":["docker/24.0.5 go/go1.20.6 git-commit/a61e2b4 kernel/5.8.0-0.bpo.2-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/27.5.0 \\(linux\\))"],"X-Forwarded-For":["RENNER1"],"X-Forwarded-Proto":["https"],"X-Real-Ip":["RENNER1"],"X-Zot-Cluster-Hop-Count":["1"]},"goroutine":7163177,"caller":"zotregistry.dev/zot/pkg/api/session.go:137","time":"2025-03-06T12:02:47.197397561Z","message":"HTTP API"}
{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","module":"http","username":GITLABUSER,"component":"session","clientIP":"10.12.18.51:20012","method":"PATCH","path":"/v2/REPO1/blobs/uploads/a5dd54ba-ec48-478b-96d2-7826231043a7","statusCode":202,"latency":"0s","bodySize":0,"headers":{"Accept-Encoding":["gzip"],"Authorization":["******"],"Content-Length":["22329"],"User-Agent":["docker/24.0.5 go/go1.20.6 git-commit/a61e2b4 kernel/5.8.0-0.bpo.2-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/27.5.0 \\(linux\\))"],"X-Forwarded-For":["RENNER1"],"X-Forwarded-Proto":["https"],"X-Real-Ip":["RENNER1"],"X-Zot-Cluster-Hop-Count":["1"]},"goroutine":7166094,"caller":"zotregistry.dev/zot/pkg/api/session.go:137","time":"2025-03-06T12:03:01.629020563Z","message":"HTTP API"}
{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","module":"http","username":GITLABUSER,"component":"session","clientIP":"10.12.18.53:38234","method":"PATCH","path":"/v2/REPO2/blobs/uploads/be723d7d-d6f3-466e-9f25-0e07ec499eff","statusCode":202,"latency":"2s","bodySize":0,"headers":{"Accept-Encoding":["gzip"],"Authorization":["******"],"Content-Length":["15777213"],"User-Agent":["docker/24.0.5 go/go1.20.6 git-commit/a61e2b4 kernel/5.8.0-0.bpo.2-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/27.5.0 \\(linux\\))"],"X-Forwarded-For":["RENNER2"],"X-Forwarded-Proto":["https"],"X-Real-Ip":["RENNER2"],"X-Zot-Cluster-Hop-Count":["1"]},"goroutine":7170018,"caller":"zotregistry.dev/zot/pkg/api/session.go:137","time":"2025-03-06T12:04:12.453675456Z","message":"HTTP API"}
{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","module":"http","username":GITLABUSER,"component":"session","clientIP":"10.12.18.53:23072","method":"PATCH","path":"/v2/REPO2/blobs/uploads/4db044db-5f06-4321-8510-6437cbe42446","statusCode":202,"latency":"0s","bodySize":0,"headers":{"Accept-Encoding":["gzip"],"Authorization":["******"],"Content-Length":["9987"],"User-Agent":["docker/24.0.5 go/go1.20.6 git-commit/a61e2b4 kernel/5.8.0-0.bpo.2-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/27.5.0 \\(linux\\))"],"X-Forwarded-For":["RENNER2"],"X-Forwarded-Proto":["https"],"X-Real-Ip":["RENNER2"],"X-Zot-Cluster-Hop-Count":["1"]},"goroutine":7170182,"caller":"zotregistry.dev/zot/pkg/api/session.go:137","time":"2025-03-06T12:04:15.91019342Z","message":"HTTP API"}
{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","module":"http","username":GITLABUSER,"component":"session","clientIP":"10.12.18.53:37148","method":"PATCH","path":"/v2/REPO2/blobs/uploads/8dc56fe9-0ff0-47eb-8de5-d08b3e11fdf5","statusCode":202,"latency":"10s","bodySize":0,"headers":{"Accept-Encoding":["gzip"],"Authorization":["******"],"Content-Length":["15780594"],"User-Agent":["docker/24.0.5 go/go1.20.6 git-commit/a61e2b4 kernel/5.8.0-0.bpo.2-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/27.5.0 \\(linux\\))"],"X-Forwarded-For":["RENNER2"],"X-Forwarded-Proto":["https"],"X-Real-Ip":["RENNER2"],"X-Zot-Cluster-Hop-Count":["1"]},"goroutine":51764,"caller":"zotregistry.dev/zot/pkg/api/session.go:137","time":"2025-03-06T12:21:14.192904896Z","message":"HTTP API"}
{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","module":"http","username":GITLABUSER,"component":"session","clientIP":"10.12.18.53:56870","method":"PATCH","path":"/v2/REPO2/blobs/uploads/1b501c67-4f7c-49d8-8739-1fa7da3a26a9","statusCode":202,"latency":"23s","bodySize":0,"headers":{"Accept-Encoding":["gzip"],"Authorization":["******"],"Content-Length":["9987"],"User-Agent":["docker/24.0.5 go/go1.20.6 git-commit/a61e2b4 kernel/5.8.0-0.bpo.2-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/27.5.0 \\(linux\\))"],"X-Forwarded-For":["RENNER2"],"X-Forwarded-Proto":["https"],"X-Real-Ip":["RENNER2"],"X-Zot-Cluster-Hop-Count":["1"]},"goroutine":55413,"caller":"zotregistry.dev/zot/pkg/api/session.go:137","time":"2025-03-06T12:22:25.178651536Z","message":"HTTP API"}
{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","module":"http","username":GITLABUSER,"component":"session","clientIP":"10.12.18.51:28692","method":"PATCH","path":"/v2/REPO3/blobs/uploads/d8fee716-39a0-40ea-b5a8-de5f439bb3f8","statusCode":202,"latency":"3m55s","bodySize":0,"headers":{"Accept-Encoding":["gzip"],"Authorization":["******"],"Content-Length":["529667453"],"User-Agent":["docker/24.0.5 go/go1.20.6 git-commit/a61e2b4 kernel/5.8.0-0.bpo.2-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/27.5.0 \\(linux\\))"],"X-Forwarded-For":["RENNER2"],"X-Forwarded-Proto":["https"],"X-Real-Ip":["RENNER2"]},"goroutine":64789,"caller":"zotregistry.dev/zot/pkg/api/session.go:137","time":"2025-03-06T12:28:24.953249238Z","message":"HTTP API"}
{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","module":"http","username":GITLABUSER,"component":"session","clientIP":"10.12.18.53:30316","method":"PATCH","path":"/v2/REPO2/blobs/uploads/2dc4bffb-30ed-4675-bb82-0499adce7caf","statusCode":202,"latency":"1m2s","bodySize":0,"headers":{"Accept-Encoding":["gzip"],"Authorization":["******"],"Content-Length":["15777017"],"User-Agent":["docker/24.0.5 go/go1.20.6 git-commit/a61e2b4 kernel/5.8.0-0.bpo.2-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/27.5.0 \\(linux\\))"],"X-Forwarded-For":["RENNER2"],"X-Forwarded-Proto":["https"],"X-Real-Ip":["RENNER2"],"X-Zot-Cluster-Hop-Count":["1"]},"goroutine":81827,"caller":"zotregistry.dev/zot/pkg/api/session.go:137","time":"2025-03-06T12:30:11.855047442Z","message":"HTTP API"}
{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","module":"http","username":GITLABUSER,"component":"session","clientIP":"10.12.18.53:60356","method":"PATCH","path":"/v2/REPO2/blobs/uploads/952e5f6e-7332-428b-a373-b7299ff72d57","statusCode":202,"latency":"1m26s","bodySize":0,"headers":{"Accept-Encoding":["gzip"],"Authorization":["******"],"Content-Length":["9982"],"User-Agent":["docker/24.0.5 go/go1.20.6 git-commit/a61e2b4 kernel/5.8.0-0.bpo.2-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/27.5.0 \\(linux\\))"],"X-Forwarded-For":["RENNER2"],"X-Forwarded-Proto":["https"],"X-Real-Ip":["RENNER2"],"X-Zot-Cluster-Hop-Count":["1"]},"goroutine":94682,"caller":"zotregistry.dev/zot/pkg/api/session.go:137","time":"2025-03-06T12:34:09.787062847Z","message":"HTTP API"}
{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","module":"http","username":GITLABUSER,"component":"session","clientIP":"10.12.18.51:48488","method":"PATCH","path":"/v2/REPO3/blobs/uploads/3f13870c-9c23-400b-acf4-a2fd29976d00","statusCode":500,"latency":"5m1s","bodySize":178,"headers":{"Accept-Encoding":["gzip"],"Authorization":["******"],"Content-Length":["531260452"],"User-Agent":["docker/24.0.5 go/go1.20.6 git-commit/a61e2b4 kernel/5.8.0-0.bpo.2-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/27.5.0 \\(linux\\))"],"X-Forwarded-For":["RENNER2"],"X-Forwarded-Proto":["https"],"X-Real-Ip":["RENNER2"]},"goroutine":96520,"caller":"zotregistry.dev/zot/pkg/api/session.go:137","time":"2025-03-06T12:38:13.894181382Z","message":"HTTP API"}

Note: this is the log from cluster member 10.12.18.52:8080, taken with just -B 10. There are two more cluster members, 10.12.18.51 and 10.12.18.53.

When I mentioned -B I meant all log messages not just HTTP PATCH log messages. Maybe filter by goroutine 96520, which produced the error. Or all messages starting 10 seconds before it (in case of concurrency issues).

/var/log/zot/zot.log.2:{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","identity":"GITLABUSER","goroutine":96520,"caller":"zotregistry.dev/zot/pkg/api/authn.go:143","time":"2025-03-06T12:33:12.645163763Z","message":"user profile successfully set"}
/var/log/zot/zot.log.2:{"level":"debug","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","repository":"REPO1","goroutine":96520,"caller":"zotregistry.dev/zot/pkg/api/proxy.go:50","time":"2025-03-06T12:33:12.645227683Z","message":"target member socket: 10.12.18.51:8080 index: 0"}
/var/log/zot/zot.log.2:{"level":"debug","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","repository":"REPO1","goroutine":96520,"caller":"zotregistry.dev/zot/pkg/api/proxy.go:69","time":"2025-03-06T12:33:12.645255791Z","message":"proxying the request"}
/var/log/zot/zot.log.2:{"level":"error","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","error":"Patch \"http://10.12.18.51:8080/v2/REPO1/blobs/uploads/3f13870c-9c23-400b-acf4-a2fd29976d00\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)","repository":"REPO1","goroutine":96520,"caller":"zotregistry.dev/zot/pkg/api/proxy.go:73","time":"2025-03-06T12:38:13.894055962Z","message":"failed to proxy the request"}
/var/log/zot/zot.log.2:{"level":"info","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","module":"http","username":"GITLABUSER","component":"session","clientIP":"10.12.18.51:48488","method":"PATCH","path":"/v2/REPO1/blobs/uploads/3f13870c-9c23-400b-acf4-a2fd29976d00","statusCode":500,"latency":"5m1s","bodySize":178,"headers":{"Accept-Encoding":["gzip"],"Authorization":["******"],"Content-Length":["531260452"],"User-Agent":["docker/24.0.5 go/go1.20.6 git-commit/a61e2b4 kernel/5.8.0-0.bpo.2-amd64 os/linux arch/amd64 UpstreamClient(Docker-Client/27.5.0 \\(linux\\))"],"X-Forwarded-For":["REDACTED"],"X-Forwarded-Proto":["https"],"X-Real-Ip":["REDACTED"]},"goroutine":96520,"caller":"zotregistry.dev/zot/pkg/api/session.go:137","time":"2025-03-06T12:38:13.894181382Z","message":"HTTP API"}

Complete log of goroutine 96520.

@andaaron
Contributor Author

andaaron commented Mar 12, 2025

This is interesting
/var/log/zot/zot.log.2:{"level":"error","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","error":"Patch \"http://10.12.18.51:8080/v2/REPO1/blobs/uploads/3f13870c-9c23-400b-acf4-a2fd29976d00\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)","repository":"REPO1","goroutine":96520,"caller":"zotregistry.dev/zot/pkg/api/proxy.go:73","time":"2025-03-06T12:38:13.894055962Z","message":"failed to proxy the request"}

What was happening at 10.12.18.51:8080 around this time? Why was it not responding?

@rchincha
Contributor

In zb,

cmd/zb/helper.go
67:func deleteTestRepo(repos []string, url string, client *resty.Client) error {

^ deletes do happen, we just don't track/report them

@shcherbak

2025-03-06T12:38:

The node was online, but I don't have logs for this period because of rotation.

@andaaron
Contributor Author

andaaron commented Mar 16, 2025

This is interesting /var/log/zot/zot.log.2:{"level":"error","clusterMember":"10.12.18.52:8080","clusterMemberIndex":"1","error":"Patch \"http://10.12.18.51:8080/v2/REPO1/blobs/uploads/3f13870c-9c23-400b-acf4-a2fd29976d00\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)","repository":"REPO1","goroutine":96520,"caller":"zotregistry.dev/zot/pkg/api/proxy.go:73","time":"2025-03-06T12:38:13.894055962Z","message":"failed to proxy the request"}

What was happening at 10.12.18.51:8080 around this time? Why was it not responding?

I managed to reproduce this error using https://github.com/project-zot/zot/pull/3028/files#diff-76ed074a9305c04054cdebb9e9aad2d818052b07091de1f20cad0bbac34ffb52R510
I reproduced it without these storage changes.

We still need to figure out the root cause...

@andaaron
Contributor Author

@shcherbak do you think it is feasible to remove the cluster configuration from your environment and try this PR again?
Theoretically, we don't need cluster if Redis and S3 are used together.

@shcherbak

@shcherbak do you think it is feasible to remove the cluster configuration from your environment and try this PR again? Theoretically, we don't need cluster if Redis and S3 are used together.

And load balancing is on the nginx side? OK, I will try it, but I need time for preparation because I rolled back from zot to distribution due to this issue. I have a cold zot setup and will be able to test in a few days.

@andaaron
Contributor Author

yes, nginx-side load balancing.

@rchincha
Contributor

rchincha commented Mar 21, 2025

@andaaron Maybe we do the following ...

  1. If dedupe is disabled in configuration, then we use your method, else
  2. Stick to the global lock

Revert the dedupe side digest-based locks
And in the WithLock() methods, just check whether config.dedupe==true

Folks who want higher performance have to give up dedupe (which requires locking multiple repos potentially until we figure out a solution), others who want dedupe, give up perf.

Thoughts?

@shcherbak

@shcherbak do you think it is feasible to remove the cluster configuration from your environment and try this PR again? Theoretically, we don't need cluster if Redis and S3 are used together.

And load balancing is on the nginx side? OK, I will try it, but I need time for preparation because I rolled back from zot to distribution due to this issue. I have a cold zot setup and will be able to test in a few days.

My apologies for the delay, I am preparing the tests right now.

@andaaron
Contributor Author

@andaaron Maybe we do the following ...

  1. If dedupe is disabled in configuration, then we use your method, else
  2. Stick to the global lock

Revert the dedupe side digest-based locks And in the WithLock() methods, just check whether config.dedupe==true

Folks who want higher performance have to give up dedupe (which requires locking multiple repos potentially until we figure out a solution), others who want dedupe, give up perf.

Thoughts?

I think in most cases people want dedupe. Storage usage would grow out of control without it (specifically in cases where there are multiple images inheriting layers from one another).

@shcherbak

shcherbak commented Mar 22, 2025

@shcherbak do you think it is feasible to remove the cluster configuration from your environment and try this PR again? Theoretically, we don't need cluster if Redis and S3 are used together.

And load balancing is on the nginx side? OK, I will try it, but I need time for preparation because I rolled back from zot to distribution due to this issue. I have a cold zot setup and will be able to test in a few days.

My apologies for the delay, I am preparing the tests right now.

I've prepared a separate environment and configs to reproduce the load, so I can repeat the test at any time.
But the first try had no luck.

The build command is:

$ docker buildx use gitlab-build-toolkit-instance
$ docker buildx build --provenance false \
  --cache-to=type=registry,image-manifest=true,oci-mediatypes=true,ref=REGISTRY-REDACTED/IMAGE-REDACTED:buildcache \
  --cache-from=type=registry,ref=REGISTRY-REDACTED/IMAGE-REDACTED:buildcache \
  --label=org.opencontainers.image.title="$CI_PROJECT_TITLE" \
  --label=org.opencontainers.image.description="$CI_PROJECT_DESCRIPTION" \
  --label=org.opencontainers.image.source="$CI_PROJECT_URL"  \
  --push --target prod \
  -t REGISTRY-REDACTED/IMAGE-REDACTED:$CI_COMMIT_SHA -f docker/php/Dockerfile .

[five images]

No 500 responses, but I cannot push even one image to reproduce the production load:
[image]

Why do those dedupe tasks appear even when dedupe is disabled in the config file?

Update:
[image]
The workers are really busy with one image.

…t locking

- lock per repo on pushes/pulls/retention, in short index operations
- lock per digest when using multiple operations affecting the cachedb and storage
(blob writes/deletes/moves/links in storage which need to be in accordance with cachedb content)

Do not lock multiple repos at the same time in the same goroutine! It will cause deadlocks.
Same applies to digests.

Signed-off-by: Andrei Aaron <[email protected]>
…+ s3 storage"

This reverts commit 88ad384.

Signed-off-by: Andrei Aaron <[email protected]>
(cherry picked from commit 5717118)
Signed-off-by: Andrei Aaron <[email protected]>
(cherry picked from commit 2eec0ba)
Signed-off-by: Andrei Aaron <[email protected]>
@andaaron
Contributor Author

The dedupe tasks are actually "un-dedupe" and are there to ensure the blobs are no longer deduped in case at some point in the past you had dedupe turned on (and there's no way to know if you had dedupe on and you turned it off or you never had it on to begin with).


3 participants