Honor registry mirrors when fetching SOCI artifacts by milosgajdos · Pull Request #1844 · awslabs/soci-snapshotter

milosgajdos · 2026-01-14T13:56:35Z

Issue #, if available:
Fixes #1013

Description of changes:
SOCI artifact discovery/fetch (index + zTOCs) was constructing an ORAS repository from the original image locator (e.g. docker.io/...), causing the snapshotter to ignore configured registry mirrors and talk to the origin registry for SOCI artifacts.

This changes the SOCI index fetch path to use the resolved docker.RegistryHost (mirror host + client) when constructing the ORAS repository locator. This makes SOCI artifact requests follow the same mirror configuration used for layer pulls.

Testing performed:
✅

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

sondavidb

Hey, thanks for the contribution and sorry for leaving this hanging for so long. I left a comment to make sure the feature is working. Would also love an integration test as well if possible. After changes are made happy to run them through CI.

SOCI artifact discovery/fetch (index + zTOCs) was constructing an ORAS repository from the original image locator (e.g. docker.io/...), causing the snapshotter to ignore configured registry mirrors and talk to the origin registry for SOCI artifacts. This changes the SOCI index fetch path to use the resolved docker.RegistryHost (mirror host + client) when constructing the ORAS repository locator. This makes SOCI artifact requests follow the same mirror configuration used for layer pulls. Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>

milosgajdos · 2026-03-06T17:10:16Z

Hey! Sorry, I was away on hols, but will be picking this up today. Thanks for your patience

Follow-up to c751c6d to fully address mirror behavior for SOCI artifact discovery/fetch (index + zTOCs). The original fix moved SOCI artifact lookups away from the image’s origin locator, but it still effectively depended on the first resolved host and didn’t fully preserve per-host details. In practice that meant we could still fail in mirror-heavy setups where the first endpoint is unavailable or where the mirror is configured with HTTP/path differences. This change finishes that work so SOCI index and zTOC fetches behave like the rest of our mirror-aware pull flow. We now carry the full RegistryHost context through the fetch path, rewrite locators with host path awareness, honor mirror scheme when constructing ORAS repositories, and retry across all configured hosts instead of stopping at the first one. I also tightened error handling around missing host clients and added tests that cover the new behavior directly: locator rewriting edge cases, host scheme handling, and fallback from one mirror endpoint to another (including the all-hosts-fail path). The goal is to make artifact discovery/fetch resilient and predictable in the same environments where layer pulls already are. Also adds a simple integration test. Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>

milosgajdos · 2026-03-06T23:08:20Z

Would also love an integration test as well if possible.

Added some integration tests - not sure if that's what you are after but figured it could be something to at least start a conversation

sondavidb · 2026-03-06T23:17:07Z

Ah, I was looking more at testing in the integration/ folder. I think something along the lines of mirroring the image to a local registry, having the SOCI config primarily pull from some bogus source, and then have the mirror point to the local registry. Is that a sensical workflow?

milosgajdos · 2026-03-07T00:10:58Z

Oh I see! I misunderstood 😬 yeah I can look into that; it seems sensible yeah

Make TestMirror explicitly exercise fallback by configuring two mirrors for the same origin host: a dead mirror first, then the working local mirror. This verifies we recover from a bad mirror endpoint and still pull via SOCI. Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>

milosgajdos · 2026-03-09T19:14:12Z

@sondavidb I smacked in an integration test case in 76fff9b

My thinking was adding a case where we test a fallback to good mirror when trying a dead mirror (bogus) first. Basically that should verify that we recover from a bad mirror endpoint and still pull via SOCI. LMK if this is along the lines you're thinking too.

sondavidb · 2026-03-10T23:58:29Z

Yeah that should be fine by me. I'll look closer at it tomorrow but will run our CI for now. Thanks!

Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>

milosgajdos · 2026-03-11T15:30:51Z

I've added some additional build instructions to docs and some improvements for [the speed of] CI.

Co-authored-by: David Son <55555210+sondavidb@users.noreply.github.com>

sondavidb · 2026-03-12T18:55:53Z

Just FYI for whenever this gets merged, we do rebase and merge so you will have to squash the commits accordingly and have each commit DCO signed. Doesn't have to be now of course, just a heads up 🙂

Retry both MountParallel and MountLocal across the full configured registry host list instead of only using the first host entry. We build per-host refspecs up front so each attempt uses the correct mirror host and path prefix. For MountLocal, we keep using the real content store during retries so content fetched from a fallback host can still be reused locally during apply, and do not delete cached content on local apply failures. Clear partial contents from the prepared mountpoint between local retry attempts while preserving the mountpoint itself and its metadata. This allows later host attempts and containerd fallback to reuse the same upperdir safely. We added regression tests for local host fallback, cached blob preservation, and mountpoint cleanup. Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>

milosgajdos · 2026-03-13T21:02:38Z

CI fail is on the accepted commit suggestion 😬

sondavidb · 2026-03-15T08:14:40Z

CI fail just looks like DCO. Unless I'm misunderstanding, which, apologies if so

milosgajdos · 2026-03-16T14:37:03Z

CI fail just looks like DCO. Unless I'm misunderstanding, which, apologies if so

yeah, GH [platform] fails to reconcile commit signatures when you accept suggestions on a fork; that failing DCO is my accepting your Makefile suggestion - itll go away after squash

milosgajdos · 2026-03-17T22:27:48Z

@sondavidb anything else you want me to address here?

sondavidb · 2026-03-17T22:33:01Z

Nope! Just need to take a look a little closer at the changes before approving. But at a glance they look good.

Thanks again!

sondavidb

LGTM, just a small thing and squash and I'll approve it. I'll also see if I can get someone else on my team to take a look.

Awesome work on this, we've been wanting this for a while :)

sondavidb · 2026-03-17T23:49:23Z

+			}
+		}
+
+		err := fs.rebase(ctx, desc.Digest, imageDigest, mountpoint)


Rebasing can probably be put outside of this loop as it's a fully offline-operation so we don't need to try it more than once on a bad registry.

So, I'm wondering... shouldnt rebase actually remain in the loop because it is where the current host’s async premount failure becomes visible...preloadAllLayers only starts premount work. The actual host-specific failure can surface later when rebase waits on the layer job...wouldnt we lose the retry if we moved it outside the loop? 🤔

the rename/stat portion of rebase is offline, but rebase currently also waits for the host-specific premount job via layerJob.errCh https://github.com/milosgajdos/soci-snapshotter/blob/9315f827d274b45862a1854c6cacd4a78bf5a17f/fs/fs.go#L629

The jobs are keyed via digest and not by ref, and if preloading a layer fails then it will cancel the entire image job and return, so we won't be accidentally getting another registry host's image job.

Though, I wonder now if that's a problem. Should fallbacks for the parallel pull mode be per-layer or per-image? I think per-layer would be more consistent with other snapshotter behaviors. If so, we should maybe move this to the preloadAllLayers and rotate through registry hosts per layer.

But I'm also OK taking this as a followup. We've iterated on this plenty already. (You're more than welcome to change this now if you wish, though.)

milosgajdos · 2026-03-22T23:42:50Z

@sondavidb mind poking some maintainers so we can move forward with this? 🙏

sondavidb · 2026-03-23T19:55:57Z

Sure, sorry for the drag @milosgajdos. I've gone ahead and pinged someone but as he's not around right now I don't expect him to look at this immediately.

henry118

@milosgajdos Thanks for this contribution.

At high level, it occurs to me that the retry granularity here is a bit coarse. I think the only errors that worth retrying the next host are the host I/O related errors. i.e, failing to resolve the host. Other types of errors such as post-processing of downloaded artifacts (decompression/unpacking) should result an immediate failure.

So in other words, I think the logic of iterating the hosts should be implemented at lower level where the actual HTTP requests are initiated.

henry118 · 2026-03-26T18:24:47Z

 		if err != nil {
-			return fmt.Errorf("failed to preload layers for image manifest digest %s: %w", imageDigest, err)
+			hostErr = errors.Join(hostErr, fmt.Errorf("registry host %q: failed to rebase layer %s: %w", hostRef.host.Host, desc.Digest, err))
+			continue


rebase failures should have nothing to do with registry mirrors. i don't think we need to try the next host.

henry118 · 2026-03-26T18:25:44Z

+	if hostErr != nil {
+		return fmt.Errorf("failed to preload layers for image manifest digest %s: %w", imageDigest, hostErr)
+	}
+	return fmt.Errorf("failed to preload layers for image manifest digest %s", imageDigest)
 }


Feel like these can be simplified. esp the last return somehow swallowed the internal errors.

I completely missed this comment; work has kept me busy for a few weeks, but I'll be picking this up today

YQ-Wang · 2026-04-09T09:00:15Z

Interested user here :) I’ve been following this because I’d like to use the feature.

The remaining blocker seems to be retry scope rather than the mirror-awareness behavior itself. The maintainer comments make sense to me: retrying the next host for actual registry-host / network-facing failures seems reasonable, while local post-processing failures like unpack/apply/rebase should fail immediately. It also seems important to preserve the underlying error rather than returning a generic fallback error.

Would a narrow fix along those lines be enough to unblock this PR, with any finer-grained retry redesign handled as follow-up? Happy to test an updated branch or create a PR to this current branch if useful .

Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>

milosgajdos requested a review from a team as a code owner January 14, 2026 13:56

github-actions Bot added go Pull requests that update Go code testing Unit and/or integration tests labels Jan 14, 2026

sondavidb reviewed Feb 9, 2026

View reviewed changes

Comment thread fs/fs.go Outdated

milosgajdos force-pushed the honor-mirrors branch from 76ea63e to 72a8115 Compare March 6, 2026 23:06

milosgajdos requested a review from sondavidb March 6, 2026 23:08

Improve builds and docs

e75996d

Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>

milosgajdos commented Mar 11, 2026

View reviewed changes

Comment thread Dockerfile

sondavidb reviewed Mar 11, 2026

View reviewed changes

Comment thread Dockerfile

Comment thread fs/fetch_soci_index_test.go

Comment thread fs/fs.go Outdated

Comment thread Makefile Outdated

Update Makefile

1ddd75b

Co-authored-by: David Son <55555210+sondavidb@users.noreply.github.com>

milosgajdos requested a review from sondavidb March 12, 2026 23:25

sondavidb reviewed Mar 17, 2026

View reviewed changes

sondavidb mentioned this pull request Mar 17, 2026

Add integration test for running containers after actual soci convert #1897

Merged

sondavidb reviewed Mar 18, 2026

View reviewed changes

Comment thread fs/fs.go

milosgajdos requested a review from sondavidb March 18, 2026 16:56

sondavidb previously approved these changes Mar 18, 2026

View reviewed changes

henry118 reviewed Mar 26, 2026

View reviewed changes

Make soci respect insecure flags in mirrors

dfef19b

Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>

milosgajdos dismissed sondavidb’s stale review via dfef19b April 9, 2026 18:30

Conversation

milosgajdos commented Jan 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sondavidb left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

milosgajdos commented Mar 6, 2026

Uh oh!

milosgajdos commented Mar 6, 2026

Uh oh!

sondavidb commented Mar 6, 2026

Uh oh!

milosgajdos commented Mar 7, 2026

Uh oh!

milosgajdos commented Mar 9, 2026

Uh oh!

sondavidb commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

milosgajdos commented Mar 11, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sondavidb commented Mar 12, 2026

Uh oh!

milosgajdos commented Mar 13, 2026

Uh oh!

sondavidb commented Mar 15, 2026

Uh oh!

milosgajdos commented Mar 16, 2026

Uh oh!

milosgajdos commented Mar 17, 2026

Uh oh!

sondavidb commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sondavidb left a comment

Choose a reason for hiding this comment

Uh oh!

sondavidb Mar 17, 2026

Choose a reason for hiding this comment

Uh oh!

milosgajdos Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

milosgajdos Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

sondavidb Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

milosgajdos commented Mar 22, 2026

Uh oh!

sondavidb commented Mar 23, 2026

Uh oh!

henry118 left a comment

Choose a reason for hiding this comment

Uh oh!

henry118 Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

henry118 Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

milosgajdos Apr 20, 2026

Choose a reason for hiding this comment

Uh oh!

YQ-Wang commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

milosgajdos commented Jan 14, 2026 •

edited

Loading

sondavidb commented Mar 10, 2026 •

edited

Loading

sondavidb commented Mar 17, 2026 •

edited

Loading

YQ-Wang commented Apr 9, 2026 •

edited

Loading