Invalidate host creds in case of failure and retry#1925
Conversation
Unit tests: - TestRegistryHostsCachesAndReturnsHosts: verifies cache returns same hosts - TestInvalidateRegistryHostsForcesFreshHosts: verifies invalidation creates fresh AuthClient (fails when InvalidateRegistryHosts is a no-op) - TestInvalidateIsNoOpForUnknownRef: no panic on unknown ref Integration test: - TestCredentialRefreshOnCheckFailure: sets up Bearer token auth registry with a custom token server, pulls image via SOCI with check_always=true, rotates the token server password, and verifies the container can still run after credential rotation.
Use 3-second token TTL so cached Bearer tokens expire during the test. After expiry, the docker.Authorizer's cached handler tries to re-fetch a token with old credentials baked into its TokenOptions. The token server rejects the old password, causing doBearerAuth to fail without calling AddResponses — so the handler is never deleted and the Authorizer is stuck. Without InvalidateRegistryHosts: layer "N" unavailable (permanent failure). With InvalidateRegistryHosts: cache cleared, fresh Authorizer created, new handler fetches token with new password, container runs.
407a251 to
aeb4d65
Compare
|
@sondavidb I (+AI) was able to replicate the
This in no way is ready for review. I just want your views on the FS cache invalidation logic :) (I will refine the PR once I have your views aligned, tho integration test is not really viable in this case without a lot of custom changes like in this PR) |
|
Idea LGTM, I'm happy to move forward with this. I do have some concerns in terms of credentials on a per-layer basis, as I believe we cache the credentials per layer as well. But I don't think this has to be comprehensive by any means — if we can refresh credentials on a snapshotter level that is already helping a ton of pain points for Kubernetes use cases. |
|
Thanks, I will go ahead and refine the PR (by Monday, afk for some days) 😄 Note: Integration test is gonna be very complex with the current testing setup thus would not have any for now. I'll research a bit in future if theres any easier way of implementing a integration test. |
Issue #, if available:
Related to #1924
Disclaimer - Coded by AI
Description of changes:
Testing performed:
Proof:
The mechanism: Short-lived Bearer tokens expire → docker.Authorizer's cached handler tries fetchToken with old password baked in TokenOptions → token server rejects → error inside doBearerAuth (before AddResponses) → handler never deleted → Authorizer stuck → our fix invalidates the registryHostMap → fresh Authorizer with empty handlers → re-authenticates with new password from docker config → works.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.