feat(storage): enable parallel writes by using per-repo and per-digest locking #2968
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

Coverage Diff (main vs. #2968)

| Metric | main | #2968 | +/- |
|---|---|---|---|
| Coverage | 90.99% | 90.95% | -0.05% |
| Files | 177 | 178 | +1 |
| Lines | 32940 | 32998 | +58 |
| Hits | 29975 | 30014 | +39 |
| Misses | 2238 | 2253 | +15 |
| Partials | 727 | 731 | +4 |
Looks reasonable. Are there actual benchmark numbers that show improvement under contention?
I have updated the tests to use the benchmark GH action, which posts the comparison in the job summary: And: The results for minio are almost unchanged.
I ran zb locally with config (note GC and dedupe are enabled):
And zb command: This is the result with the code in #2996 (just some CI improvements): zb_diff.md
@shcherbak, can you maybe also use the -B option for grep? We're interested in potential errors showing before the 500 code is returned. Note I fixed some of the issues with visualizing the docker images in the UI, and merged that on main before rebasing this PR.
Note: this is the log from the 10.12.18.52:8080 cluster member, with just -B 10.
We are now on the v2.1.3-rc3 tag; much better.
When I mentioned -B, I meant all log messages, not just HTTP PATCH log messages.
@andaaron can you please rebase this PR?
It is rebased.
Complete log of the routine:
This is interesting. What was happening at 10.12.18.51:8080 around this time? Why was it not responding?
In zb, cmd/zb/helper.go ^ deletes do happen, we just don't track/report them.
The node was online, but I don't have logs for this period because of rotation.
I managed to reproduce this error using https://github.com/project-zot/zot/pull/3028/files#diff-76ed074a9305c04054cdebb9e9aad2d818052b07091de1f20cad0bbac34ffb52R510 We still need to figure out the root cause...
@shcherbak do you think it is feasible to remove the
And load balancing is on the nginx side? OK, I will try it, but I need time to prepare because I rolled back from zot to distribution due to this issue. I have a cold zot setup and will be able to test in a few days.
Yes, nginx-side load balancing.
@andaaron Maybe we do the following ...
Revert the dedupe-side digest-based locks.
Folks who want higher performance have to give up dedupe (which potentially requires locking multiple repos, until we figure out a solution); others who want dedupe give up perf. Thoughts?
My apologies for the delay; I am preparing the tests right now.
I think in most cases people want dedupe. Storage usage would grow out of control without it (specifically in cases where there are multiple images inheriting layers from one another).
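To make the storage argument above concrete, here is a minimal Go sketch with made-up digests and sizes. The `Layer` type and `storageUsage` function are hypothetical and are not part of zot. It compares the bytes stored when every repo keeps its own copy of each layer against the bytes stored when identical digests are kept once.

```go
package main

import "fmt"

// Layer is a blob identified by its digest.
// The digests and sizes used in main are invented for illustration.
type Layer struct {
	Digest string
	Size   int64
}

// storageUsage returns the bytes stored when every repo keeps its own copy
// of each layer (withoutDedupe) and when identical digests are stored only
// once across all repos (withDedupe).
func storageUsage(repos map[string][]Layer) (withoutDedupe, withDedupe int64) {
	seen := make(map[string]bool)
	for _, layers := range repos {
		for _, l := range layers {
			withoutDedupe += l.Size
			if !seen[l.Digest] {
				seen[l.Digest] = true
				withDedupe += l.Size
			}
		}
	}
	return withoutDedupe, withDedupe
}

func main() {
	// Three images that all inherit the same base layer.
	base := Layer{Digest: "sha256:base", Size: 100}
	repos := map[string][]Layer{
		"app-a": {base, {Digest: "sha256:a", Size: 10}},
		"app-b": {base, {Digest: "sha256:b", Size: 20}},
		"app-c": {base, {Digest: "sha256:c", Size: 30}},
	}
	without, with := storageUsage(repos)
	fmt.Printf("without dedupe: %d, with dedupe: %d\n", without, with)
}
```

The gap between the two totals grows with every additional image that shares the base layer, which is the growth the comment above is concerned about.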
I've prepared a separate environment and configs to reproduce the load, so I can repeat it at any time. The build command is:
No 500 responses, but I cannot push even one image to reproduce the production load: Why do those dedupe tasks appear even when dedupe is disabled in the config file?
…t locking
- lock per repo on pushes/pulls/retention, in short index operations
- lock per digest when using multiple operations affecting the cachedb and storage (blob writes/deletes/moves/links in storage which need to be in accordance with cachedb content)
Do not lock multiple repos at the same time in the same goroutine! It will cause deadlocks. Same applies to digests.
Signed-off-by: Andrei Aaron <[email protected]>
…+ s3 storage" This reverts commit 88ad384. Signed-off-by: Andrei Aaron <[email protected]> (cherry picked from commit 5717118)
Signed-off-by: Andrei Aaron <[email protected]> (cherry picked from commit 2eec0ba)
Signed-off-by: Andrei Aaron <[email protected]>
The dedupe tasks are actually "un-dedupe" tasks. They are there to ensure the blobs are no longer deduped in case you had dedupe turned on at some point in the past (there is no way to know whether you had dedupe on and turned it off, or never had it on to begin with).
Should fix issues such as #2964
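- lock per repo on pushes/pulls/retention, in short index operations
- lock per digest when using multiple operations affecting the cachedb and storage (blob writes/deletes/moves/links in storage which need to be in accordance with cachedb content)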
Do not lock multiple repos at the same time in the same goroutine! It will cause deadlocks.
Same applies to digests.
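The locking scheme described above can be sketched in Go as a map of mutexes keyed by repo name (the same shape would work for digests). This is a minimal illustration, not zot's actual implementation: `KeyedLocks`, `WithLock`, and `pushAll` are hypothetical names. Each goroutine takes exactly one keyed lock at a time, in line with the warning about not holding multiple repo or digest locks in the same goroutine.

```go
package main

import (
	"fmt"
	"sync"
)

// KeyedLocks hands out one RWMutex per key (a repo name or a blob digest).
// Hypothetical helper for illustration only.
type KeyedLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.RWMutex
}

func NewKeyedLocks() *KeyedLocks {
	return &KeyedLocks{locks: make(map[string]*sync.RWMutex)}
}

// get returns the mutex for key, creating it on first use.
func (k *KeyedLocks) get(key string) *sync.RWMutex {
	k.mu.Lock()
	defer k.mu.Unlock()
	l, ok := k.locks[key]
	if !ok {
		l = &sync.RWMutex{}
		k.locks[key] = l
	}
	return l
}

// WithLock runs fn while holding the write lock for key.
// fn must not call WithLock again: nesting keyed locks risks deadlock.
func (k *KeyedLocks) WithLock(key string, fn func()) {
	l := k.get(key)
	l.Lock()
	defer l.Unlock()
	fn()
}

// pushAll simulates concurrent pushes. Writers to the same repo serialize
// on that repo's lock; writers to different repos do not block each other.
func pushAll(locks *KeyedLocks, repos []string, pushesPerRepo int) map[string]int {
	// Counters are allocated up front so goroutines only read the map.
	counters := make(map[string]*int, len(repos))
	for _, r := range repos {
		counters[r] = new(int)
	}
	var wg sync.WaitGroup
	for _, r := range repos {
		for i := 0; i < pushesPerRepo; i++ {
			wg.Add(1)
			go func(repo string) {
				defer wg.Done()
				locks.WithLock(repo, func() { *counters[repo]++ })
			}(r)
		}
	}
	wg.Wait()
	out := make(map[string]int, len(repos))
	for r, c := range counters {
		out[r] = *c
	}
	return out
}

func main() {
	counts := pushAll(NewKeyedLocks(), []string{"repo-a", "repo-b"}, 50)
	fmt.Println(counts)
}
```

Compared with a single store-wide lock, this design lets writes to unrelated repos proceed in parallel. The sketch never removes entries from the map, so a real implementation would also need a cleanup strategy for keys that are no longer in use.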
In separate commits:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.