@vrajashkr
Contributor

What type of PR is this?

feature

Which issue does this PR fix:
New Feature - Streaming Sync
Base design in #3733

What does this PR do / Why do we need it:

  • Implements streaming sync: clients no longer need to wait for zot to download a full image before they can start downloading data. Blobs are streamed to clients while zot is still downloading them from upstream.

Testing done on this change:
WIP

Automation added to e2e:
WIP

Will this break upgrades or downgrades?
No - feature usage is optional and only takes effect when configured.

Does this PR introduce any user-facing change?:
WIP


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@vrajashkr
Contributor Author

regclient PR - regclient/regclient#1046

@vrajashkr
Contributor Author

This is still in the very early stages of development. It does work, though: image blobs are streamed to the client while zot is still downloading them.

So far the existing sync code needs only minimal modification, and the feature fits in quite well. There are still a number of cases to handle, which may change this.

@vrajashkr
Contributor Author

Some early experiments:

Without streaming - maven:4.0.0-rc-5 image (178MB):
time skopeo copy --src-tls-verify=false docker://localhost:8080/maven:4.0.0-rc-5 oci:/tmp/maven
Getting image source signatures
Copying blob 51ddbc2a7b8e done   | 
Copying blob a2bc6c481c69 done   | 
Copying blob c9edb77ff233 done   | 
Copying blob a3629ac5b9f4 done   | 
Copying blob 4f4fb700ef54 done   | 
Copying blob a52ef49a9a3c done   | 
Copying blob 6f05633d7590 done   | 
Copying blob 04077b222bff done   | 
Copying blob 4f4fb700ef54 skipped: already exists  
Copying blob eb6aac0d0581 done   | 
Copying config 47642f8e87 done   | 
Writing manifest to image destination

real    0m1.107s
user    0m0.239s
sys     0m0.541s

Without streaming - pytorch/pytorch:2.10.0-cuda13.0-cudnn9-runtime image (2.89GB):
time skopeo copy --src-tls-verify=false docker://localhost:8080/pytorch/pytorch:2.10.0-cuda13.0-cudnn9-runtime oci:/tmp/pytorch
Getting image source signatures
Copying blob 17660a4fee57 done   | 
Copying blob 2210b9d44fce done   | 
Copying blob 62ed6ab5ceea done   | 
Copying blob e93fce65fb9f done   | 
Copying blob f9fe3341e909 done   | 
Copying blob a35293800f14 done   | 
Copying config f6a24cb681 done   | 
Writing manifest to image destination

real    0m21.138s
user    0m3.805s
sys     0m7.583s

With streaming - maven:4.0.0-rc-5 image (178MB) - chunkSize 500 bytes:
time skopeo copy --src-tls-verify=false docker://localhost:8080/maven:4.0.0-rc-5 oci:/tmp/maven
Getting image source signatures
Copying blob 51ddbc2a7b8e done   | 
Copying blob a3629ac5b9f4 done   | 
Copying blob a2bc6c481c69 done   | 
Copying blob 4f4fb700ef54 done   | 
Copying blob a52ef49a9a3c done   | 
Copying blob c9edb77ff233 done   | 
Copying blob 6f05633d7590 done   | 
Copying blob 04077b222bff done   | 
Copying blob 4f4fb700ef54 skipped: already exists  
Copying blob eb6aac0d0581 done   | 
Copying config 47642f8e87 done   | 
Writing manifest to image destination

real    0m3.706s
user    0m0.709s
sys     0m1.912s

With streaming - pytorch/pytorch:2.10.0-cuda13.0-cudnn9-runtime image (2.89GB) - chunkSize 500 bytes:

time skopeo copy --src-tls-verify=false docker://localhost:8080/pytorch/pytorch:2.10.0-cuda13.0-cudnn9-runtime oci:/tmp/pytorch
Getting image source signatures
Copying blob e93fce65fb9f done   | 
Copying blob 2210b9d44fce done   | 
Copying blob f9fe3341e909 done   | 
Copying blob a35293800f14 done   | 
Copying blob 62ed6ab5ceea done   | 
Copying blob 17660a4fee57 done   | 
Copying config f6a24cb681 done   | 
Writing manifest to image destination

real    1m51.133s
user    0m15.491s
sys     0m40.853s

With streaming - maven:4.0.0-rc-5 image (178MB) - chunkSize 32768 bytes:
time skopeo copy --src-tls-verify=false docker://localhost:8080/maven:4.0.0-rc-5 oci:/tmp/maven
Getting image source signatures
Copying blob 4f4fb700ef54 done   | 
Copying blob a52ef49a9a3c done   | 
Copying blob c9edb77ff233 done   | 
Copying blob 51ddbc2a7b8e done   | 
Copying blob a3629ac5b9f4 done   | 
Copying blob a2bc6c481c69 done   | 
Copying blob 6f05633d7590 done   | 
Copying blob 04077b222bff done   | 
Copying blob 4f4fb700ef54 skipped: already exists  
Copying blob eb6aac0d0581 done   | 
Copying config 47642f8e87 done   | 
Writing manifest to image destination

real    0m0.820s
user    0m0.314s
sys     0m0.635s

With streaming - pytorch/pytorch:2.10.0-cuda13.0-cudnn9-runtime image (2.89GB) - chunkSize 32768 bytes:
time skopeo copy --src-tls-verify=false docker://localhost:8080/pytorch/pytorch:2.10.0-cuda13.0-cudnn9-runtime oci:/tmp/pytorch
Getting image source signatures
Copying blob 2210b9d44fce done   | 
Copying blob 62ed6ab5ceea done   | 
Copying blob e93fce65fb9f done   | 
Copying blob a35293800f14 done   | 
Copying blob 17660a4fee57 done   | 
Copying blob f9fe3341e909 done   | 
Copying config f6a24cb681 done   | 
Writing manifest to image destination

real    0m19.084s
user    0m5.746s
sys     0m12.176s

sm.streamLock.Lock()
defer sm.streamLock.Unlock()

// TODO: this can result in a race condition if the ImageCopy with Options hasn't triggered the hook yet
Contributor Author

Note: this is no longer a problem. The manifest fetch adds the blob readers to the map before returning, so all readers are present before a client can request the corresponding blobs.
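A minimal sketch of that ordering, assuming a much simplified stream manager. Only `streamLock` and `PrepareActiveStreamForBlob` are names taken from this PR; the map of booleans, `HasActiveStream`, and `prepareBeforeReturningManifest` are hypothetical and stand in for the real reader bookkeeping.

```go
package main

import (
	"errors"
	"sync"
)

// streamManager is a simplified, hypothetical stand-in for the PR's stream
// manager: it only records which blob digests have an active stream.
type streamManager struct {
	streamLock sync.Mutex
	active     map[string]bool
}

func newStreamManager() *streamManager {
	return &streamManager{active: make(map[string]bool)}
}

// PrepareActiveStreamForBlob registers a digest so that a later blob request finds it.
func (sm *streamManager) PrepareActiveStreamForBlob(digest string) error {
	if digest == "" {
		return errors.New("empty blob digest")
	}

	sm.streamLock.Lock()
	defer sm.streamLock.Unlock()

	sm.active[digest] = true

	return nil
}

// HasActiveStream reports whether a stream was registered for the digest.
func (sm *streamManager) HasActiveStream(digest string) bool {
	sm.streamLock.Lock()
	defer sm.streamLock.Unlock()

	return sm.active[digest]
}

// prepareBeforeReturningManifest registers the config and every layer first and
// only then hands the manifest body back, so a client that has seen the
// manifest can never ask for a blob that is not yet in the map.
func prepareBeforeReturningManifest(
	sm *streamManager, body []byte, config string, layers []string,
) ([]byte, error) {
	for _, digest := range append([]string{config}, layers...) {
		if err := sm.PrepareActiveStreamForBlob(digest); err != nil {
			return nil, err
		}
	}

	return body, nil
}
```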

godigest "github.com/opencontainers/go-digest"
)

type StreamTempStore interface {
Contributor Author

Created this just for test purposes. We could find a smart way to do away with it.

Comment on lines +65 to +82
var internalBuffBytes []byte = make([]byte, 0, cbr.chunkSizeBytes)
internalBuff := bytes.NewBuffer(internalBuffBytes)

multiWriter := io.MultiWriter(cbr.onDiskFile, internalBuff)

numBytesRead, err := io.CopyN(multiWriter, cbr.InFlightReader, cbr.chunkSizeBytes)
if err != nil {
	if !errors.Is(err, io.EOF) {
		cbr.logger.Error().Err(err).Msg("failed to copy from in flight reader")
		// TODO: This means there was an upstream read error. Should the in-progress streams be terminated?
		copy(buff, internalBuff.Bytes())
		cbr.chunksMu.Unlock()

		return int(numBytesRead), err
	}
}

copy(buff, internalBuff.Bytes())
Contributor Author

@vrajashkr Feb 8, 2026

This part is not very efficient, and I will work on improving it. It is currently only marginally faster in some cases because of the overhead introduced by the chunk size.

The speedup gained from streaming is lost again to the lower download speed. I need to find a way to reach the same raw download speed as when the image is already on disk.
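One possible direction, sketched under assumptions rather than proposed as the fix: read straight into the caller's buffer and let `io.TeeReader` mirror the same bytes to disk, which avoids the intermediate buffer and the extra `copy`. `teeBlobReader` and `demo` are hypothetical names, and the sketch covers a single reader only; coordinating several clients on the same blob would still need something like the chunk lock above.

```go
package main

import (
	"bytes"
	"io"
	"strings"
)

// teeBlobReader is a hypothetical reader that serves upstream bytes to one
// client while mirroring them to an on-disk writer.
type teeBlobReader struct {
	tee io.Reader
}

func newTeeBlobReader(inFlight io.Reader, onDisk io.Writer) *teeBlobReader {
	return &teeBlobReader{tee: io.TeeReader(inFlight, onDisk)}
}

// Read fills buff directly from upstream. io.TeeReader writes the bytes it
// read to onDisk before returning, so no intermediate buffer is needed and the
// read size follows len(buff) instead of a fixed chunk size.
func (r *teeBlobReader) Read(buff []byte) (int, error) {
	return r.tee.Read(buff)
}

// demo streams a fake blob through a deliberately small client buffer and
// returns what the client received and what reached the "disk".
func demo() (served, stored string, err error) {
	var disk bytes.Buffer

	reader := newTeeBlobReader(strings.NewReader("layer-bytes"), &disk)

	var client strings.Builder

	buff := make([]byte, 4)

	for {
		n, readErr := reader.Read(buff)
		client.Write(buff[:n])

		if readErr == io.EOF {
			break
		}

		if readErr != nil {
			return "", "", readErr
		}
	}

	return client.String(), disk.String(), nil
}
```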

Comment on lines +356 to +376
// imager, ok := orig.(manifest.Imager)
// if !ok {
// 	return nil, errors.New("failed to convert to imager")
// }

// next, for config
// cfg, err := imager.GetConfig()
// if err != nil {
// 	return nil, err
// }

err = service.streamManager.PrepareActiveStreamForBlob(contents.Config.Digest)
if err != nil {
	return nil, err
}

// finally, for all layers
// layers, err := imager.GetLayers()
// if err != nil {
// 	return nil, err
// }
Contributor Author

I wanted to read the config and layers from the manifest directly, but I think I'm using the Imager interface wrong, so I unmarshalled the manifest body instead. I'll take another look at this later. For now, it works just fine.
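For illustration, a minimal sketch of the unmarshalling approach using only the standard library. The local `descriptor` and `imageManifest` types and the `blobDigests` helper are assumptions made for this example; the real code would more likely use the `Manifest` type from `github.com/opencontainers/image-spec/specs-go/v1`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// descriptor and imageManifest mirror only the OCI image manifest fields this
// sketch needs.
type descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
}

type imageManifest struct {
	SchemaVersion int          `json:"schemaVersion"`
	Config        descriptor   `json:"config"`
	Layers        []descriptor `json:"layers"`
}

// blobDigests returns the config digest followed by the layer digests, in
// manifest order. These are the blobs that need an active stream.
func blobDigests(body []byte) ([]string, error) {
	var parsed imageManifest
	if err := json.Unmarshal(body, &parsed); err != nil {
		return nil, fmt.Errorf("failed to unmarshal manifest: %w", err)
	}

	if parsed.Config.Digest == "" {
		return nil, fmt.Errorf("manifest has no config digest")
	}

	digests := make([]string, 0, len(parsed.Layers)+1)
	digests = append(digests, parsed.Config.Digest)

	for _, layer := range parsed.Layers {
		digests = append(digests, layer.Digest)
	}

	return digests, nil
}
```

Note that this only handles a single image manifest; an image index (multi-arch) has no `config` and would be rejected by the check above.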

routeHandler.c.Log.Info().Str("repository", name).Str("reference", reference).
	Msg("streaming is enabled. Direct fetching manifest.")

fetchedManifest, err := routeHandler.c.SyncOnDemand.FetchManifest(ctx, name, reference)
Contributor Author

Currently, every client request triggers a fresh manifest fetch from upstream, which is not good. I need to implement a way to cache the manifest while the blobs are syncing.
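A rough sketch of what such a cache could look like, assuming an in-memory map keyed by repository and reference. `manifestCache`, `GetOrFetch`, and `Forget` are hypothetical names, not part of this PR. For simplicity the lock is held during the upstream fetch, which serializes all fetches; a per-key scheme would avoid that.

```go
package main

import "sync"

// manifestKey identifies a manifest by repository and tag or digest.
type manifestKey struct {
	repo      string
	reference string
}

// manifestCache is a hypothetical in-memory cache for manifests whose blobs
// are still syncing.
type manifestCache struct {
	mu      sync.Mutex
	entries map[manifestKey][]byte
}

func newManifestCache() *manifestCache {
	return &manifestCache{entries: make(map[manifestKey][]byte)}
}

// GetOrFetch returns the cached manifest body, calling fetch only on a miss.
// Failed fetches are not cached, so the next request retries upstream.
func (c *manifestCache) GetOrFetch(
	repo, reference string, fetch func() ([]byte, error),
) ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	key := manifestKey{repo: repo, reference: reference}
	if body, ok := c.entries[key]; ok {
		return body, nil
	}

	body, err := fetch()
	if err != nil {
		return nil, err
	}

	c.entries[key] = body

	return body, nil
}

// Forget drops an entry, for example once the image has been fully synced and
// can be served from local storage.
func (c *manifestCache) Forget(repo, reference string) {
	c.mu.Lock()
	defer c.mu.Unlock()

	delete(c.entries, manifestKey{repo: repo, reference: reference})
}
```

Since a tag can move upstream, entries should probably live only for the duration of the sync rather than indefinitely.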

@codecov

codecov bot commented Feb 8, 2026

Codecov Report

❌ Patch coverage is 3.62319% with 266 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.74%. Comparing base (67b8241) to head (a69c1fd).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/extensions/sync/chunked_blob_reader.go | 0.00% | 66 Missing ⚠️ |
| pkg/extensions/sync/stream_manager.go | 0.00% | 50 Missing ⚠️ |
| pkg/extensions/sync/service.go | 2.32% | 41 Missing and 1 partial ⚠️ |
| pkg/api/routes.go | 5.40% | 33 Missing and 2 partials ⚠️ |
| pkg/extensions/sync/inflight_blob_copier.go | 0.00% | 35 Missing ⚠️ |
| pkg/extensions/sync/stream_temp_store.go | 0.00% | 19 Missing ⚠️ |
| pkg/extensions/sync/on_demand.go | 0.00% | 13 Missing ⚠️ |
| pkg/api/controller.go | 33.33% | 3 Missing and 1 partial ⚠️ |
| pkg/extensions/sync/on_demand_disabled.go | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3778      +/-   ##
==========================================
- Coverage   91.63%   90.74%   -0.89%     
==========================================
  Files         190      194       +4     
  Lines       27059    27333     +274     
==========================================
+ Hits        24795    24803       +8     
- Misses       1463     1725     +262     
- Partials      801      805       +4     

"extensions": {
"sync": {
"enable": true,
"enableStreaming": true,
Contributor

stream: true

"sync": {
"enable": true,
"enableStreaming": true,
"streamChunkSizeBytes": 32768,
Contributor

why is this needed? how does it affect client-side request handling?
