Skip to content

Conversation

@TamimWahage
Copy link

What

Adds an optional fallback to crictl pull when ctr images import fails
with content digest sha256:... not found.

Why

When Docker Desktop uses the containerd image store, k3d image import
may export tarballs referencing missing blobs, causing import failures
(see #1538).

This change:

  • preserves existing behavior by default
  • adds another mode via --mode=pull
  • avoids reporting false success

Scope

  • v5.x compatible

Related issues

Fixes / mitigates #1538

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a fallback mechanism to handle image import failures when using Docker Desktop with containerd image store. When ctr images import fails due to missing digest blobs, the system can fall back to using crictl pull to pull images directly into cluster nodes.

Key Changes:

  • Adds a new ImageImportModePull mode that bypasses tar export/import and directly pulls images into nodes
  • Enables automatic fallback to crictl pull in auto-detect mode when tar import fails with missing digest errors
  • Adds improved error collection and reporting for both copy and import operations

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

File Description
pkg/types/types.go Adds the ImageImportModePull constant and FallbackPull field to support the new pull-based import mode
pkg/client/tools.go Implements fallback logic to detect missing digest errors and retry with crictl pull, plus enhanced error collection
pkg/client/pull.go New file implementing direct pull-based image import functionality for all cluster nodes concurrently

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

ImportModeAutoDetect ImportMode = "auto"
ImportModeDirect ImportMode = "direct"
ImportModeToolsNode ImportMode = "tools-node"
ImageImportModePull ImportMode = "pull"
Copy link

Copilot AI Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constant name ImageImportModePull is inconsistent with the existing naming pattern. The other ImportMode constants follow the pattern ImportMode* (e.g., ImportModeAutoDetect, ImportModeDirect, ImportModeToolsNode), but this one is named ImageImportModePull. Consider renaming it to ImportModePull to maintain consistency with the existing constants in this type.

Copilot uses AI. Check for mistakes.
Comment on lines +164 to +178
if opts.FallbackPull && isMissingDigestErr(err) && len(imagesFromRuntime) > 0 {
l.Log().Warnf("tar import failed in node '%s' due to missing digest; falling back to pull", node.Name)

// pull each runtime image into this node
for _, img := range imagesFromRuntime {
pullCmd := []string{"crictl", "pull", img}
if pullErr := runtime.ExecInNode(ctx, node, pullCmd); pullErr != nil {
errCh <- fmt.Errorf("node %s: import failed (%v) and pull fallback failed for %s: %w",
node.Name, err, img, pullErr)
return
}
}
// fallback succeeded, so treat as success
return
}
Copy link

Copilot AI Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fallback pull logic returns early without calling wg.Done(), which will cause the importWaitgroup.Wait() at line 185 to hang indefinitely. When the fallback succeeds and returns at line 177, or when it fails and returns at line 173, the WaitGroup counter is never decremented. This will cause a deadlock. The wg.Done() call should be executed before any return statement, or deferred at the beginning of the goroutine.

Copilot uses AI. Check for mistakes.
if len(allErrs) > 0 {
// Go 1.20+: return errors.Join(allErrs...)
// Compatible fallback:
return fmt.Errorf("image import encountered %d error(s): %v", len(allErrs), allErrs)
Copy link

Copilot AI Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message format "image import encountered %d error(s): %v" will print the entire error slice in Go's default format (e.g., [error1 error2]), which may not be user-friendly. Consider formatting each error on a separate line or using a more structured approach to present multiple errors, similar to how errors.Join would format them.

Suggested change
return fmt.Errorf("image import encountered %d error(s): %v", len(allErrs), allErrs)
var sb strings.Builder
for i, err := range allErrs {
fmt.Fprintf(&sb, " %d. %v\n", i+1, err)
}
return fmt.Errorf("image import encountered %d error(s):\n%s", len(allErrs), sb.String())

Copilot uses AI. Check for mistakes.
errs = append(errs, e)
}
if len(errs) > 0 {
return fmt.Errorf("pull-based import had %d error(s): %v", len(errs), errs)
Copy link

Copilot AI Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message format "pull-based import had %d error(s): %v" will print the entire error slice in Go's default format (e.g., [error1 error2]), which may not be user-friendly. Consider formatting each error on a separate line or using a more structured approach to present multiple errors.

Copilot uses AI. Check for mistakes.
Comment on lines 162 to 163
if err := runtime.ExecInNode(ctx, node, []string{"ctr", "image", "import", "--all-platforms", tarPath}); err != nil {
l.Log().Errorf("failed to import images in node '%s': %v", node.Name, err)
Copy link

Copilot AI Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When ctr image import fails but the error is not a missing digest error (or when FallbackPull is false, or when there are no runtime images), the error is logged but wg.Done() is never called. This will cause the importWaitgroup.Wait() at line 185 to hang indefinitely. The wg.Done() should be called in all error paths, not just the success path. Consider using defer wg.Done() at the beginning of the goroutine to ensure it's always called.

Copilot uses AI. Check for mistakes.
go func(n *k3d.Node, imageRef string) {
defer wg.Done()

// Use ctr; fully-qualified refs are safest
Copy link

Copilot AI Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says "Use ctr" but the code actually uses "crictl". The comment should be updated to match the implementation, or vice versa. Note that crictl and ctr are different tools with different behaviors - crictl is the CRI (Container Runtime Interface) CLI, while ctr is the containerd CLI.

Suggested change
// Use ctr; fully-qualified refs are safest
// Use crictl; fully-qualified refs are safest

Copilot uses AI. Check for mistakes.
Comment on lines +12 to +45
func importWithPull(ctx context.Context, runtime runtimes.Runtime, cluster *k3d.Cluster, images []string) error {
var wg sync.WaitGroup
errCh := make(chan error, len(images)*len(cluster.Nodes))

for _, node := range cluster.Nodes {
if node.Role != k3d.ServerRole && node.Role != k3d.AgentRole {
continue
}
for _, img := range images {
wg.Add(1)
go func(n *k3d.Node, imageRef string) {
defer wg.Done()

// Use ctr; fully-qualified refs are safest
cmd := []string{"crictl", "pull", imageRef}
if err := runtime.ExecInNode(ctx, n, cmd); err != nil {
errCh <- fmt.Errorf("node %s: pull %s failed: %w", n.Name, imageRef, err)
}
}(node, img)
}
}

wg.Wait()
close(errCh)

var errs []error
for e := range errCh {
errs = append(errs, e)
}
if len(errs) > 0 {
return fmt.Errorf("pull-based import had %d error(s): %v", len(errs), errs)
}
return nil
}
Copy link

Copilot AI Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new importWithPull function lacks test coverage. Given that the repository has test coverage for similar functions in tools_test.go (e.g., Test_findImages, Test_findRuntimeImage), this new function should also have corresponding tests to verify its behavior, especially error handling and the concurrent pull operations across multiple nodes.

Copilot uses AI. Check for mistakes.
Comment on lines +106 to +112
func isMissingDigestErr(err error) bool {
if err == nil {
return false
}
s := err.Error()
return strings.Contains(s, "content digest sha256:") && strings.Contains(s, "not found")
}
Copy link

Copilot AI Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new isMissingDigestErr helper function lacks test coverage. Given that the repository has test coverage for functions in tools_test.go, this function should have tests to verify it correctly identifies missing digest errors and handles edge cases (e.g., nil errors, errors with partial matches to the pattern, etc.).

Copilot uses AI. Check for mistakes.
Comment on lines +204 to +212
var importErrs []error
for e := range errCh {
importErrs = append(importErrs, e)
}
allErrs := append(copyErrs, importErrs...)
if len(allErrs) > 0 {
// Go 1.20+: return errors.Join(allErrs...)
// Compatible fallback:
return fmt.Errorf("image import encountered %d error(s): %v", len(allErrs), allErrs)
Copy link

Copilot AI Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error collection pattern (creating a buffered channel, collecting errors in goroutines, waiting, closing the channel, and draining it into a slice) is duplicated between this function and importWithPull in pull.go. Consider extracting this pattern into a shared helper function to reduce code duplication and improve maintainability.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant