imagepuller: cache individual layers #1895
base: main
Conversation
umoci pulls in libraries with MPL license, need to check whether we can allow this :/
charludo left a comment
Wow, that's quite the overhaul! 🎉
Had already started looking at this prior to the license question / draft status. Re: MPL: seems we'd need to open-source the imagepuller itself?
```diff
+	var layers []string
+	remoteLayers, err := remoteImg.Layers()
 	if err != nil {
-		return nil, fmt.Errorf("creating image: %w", err)
+		return nil, fmt.Errorf("listing layers: %w", err)
 	}
-	log.Info("Created image", "id", newImg.ID)
-
-	if err := s.Store.RemoveNames(newImg.ID, newImg.Names); err != nil {
-		return nil, fmt.Errorf("removing pre-existing image names: %w", err)
-	}
-	if err := s.Store.AddNames(newImg.ID, []string{r.ImageUrl}); err != nil {
-		return nil, fmt.Errorf("adding image url as image name: %w", err)
-	}
+	for i, l := range remoteLayers {
+		digest, err := l.Digest()
+		if err != nil {
+			return nil, fmt.Errorf("getting digest of layer %d: %w", i, err)
+		}
+		layers = append(layers, digest.String())
```
It appears to me that this might reasonably be moved into the storeAndVerifyLayers func, i.e. store layers, then return the list of layers. That would also make it easier to add unit tests, alongside the existing ones for this function, checking that all wanted layers are returned, regardless of whether they were pulled just now or during a previous image's pull.
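Roughly what I have in mind (sketch only; the per-layer `storeLayer` helper and the exact signature are made up, not the PR's actual code):

```go
// Sketch: let storeAndVerifyLayers own the digest collection and hand the
// stored layer digests back, so PullImage only consumes the resulting list.
func (s *ImagePullerService) storeAndVerifyLayers(log *slog.Logger, remoteImg gcr.Image) ([]string, error) {
	remoteLayers, err := remoteImg.Layers()
	if err != nil {
		return nil, fmt.Errorf("listing layers: %w", err)
	}
	layers := make([]string, 0, len(remoteLayers))
	for i, l := range remoteLayers {
		digest, err := l.Digest()
		if err != nil {
			return nil, fmt.Errorf("getting digest of layer %d: %w", i, err)
		}
		// Hypothetical helper that stores a single layer (cached or not).
		if err := s.storeLayer(log, l); err != nil {
			return nil, fmt.Errorf("storing layer %d: %w", i, err)
		}
		layers = append(layers, digest.String())
	}
	return layers, nil
}
```

A unit test could then feed in a fake image and assert on the returned slice directly.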
```diff
 	}

-func (s *ImagePullerService) storeAndVerifyLayers(log *slog.Logger, remoteImg gcr.Image) (string, error) {
+func (s *ImagePullerService) storeAndVerifyLayers(log *slog.Logger, remoteImg gcr.Image) error {
```
The name doesn't really fit anymore, since we no longer do manual digest checks.
Maybe this should even be moved into the new store package, as `store.Store(gcr.Image)` or `store.StoreDeduplicated(...)`?
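Something along these lines, purely to illustrate the shape (names are made up, not a concrete proposal for the actual code):

```go
// Sketch of a possible API for the new store package; illustrative only.
type LayerStore interface {
	// StoreDeduplicated stores all layers of img, skipping layers that are
	// already present in the content-addressed cache, and returns the
	// digests of the image's layers in order.
	StoreDeduplicated(log *slog.Logger, img gcr.Image) ([]string, error)
}
```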
```go
	if err != nil {
		return "", fmt.Errorf("reading layer %d: %w", idx, err)
	}
	// TODO(burgerdev): pass a context here?
```
I might be missing something here, but it seems like we should quite easily be able to pass in the context from PullImage here?
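For illustration, the plumbing could look roughly like this (sketch only; the signature change and the `storeLayerWithContext` helper are made up, and the actual call sites in the PR may differ):

```go
// Sketch: accept the caller's context (passed down from PullImage) so the
// per-layer work can use it instead of the TODO.
func (s *ImagePullerService) storeAndVerifyLayers(ctx context.Context, log *slog.Logger, remoteImg gcr.Image) error {
	remoteLayers, err := remoteImg.Layers()
	if err != nil {
		return fmt.Errorf("listing layers: %w", err)
	}
	for i, l := range remoteLayers {
		// Hypothetical per-layer call; the point is only that ctx flows
		// down from PullImage rather than being created here.
		if err := storeLayerWithContext(ctx, log, l); err != nil {
			return fmt.Errorf("storing layer %d: %w", i, err)
		}
	}
	return nil
}
```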
```go
// While we could be using remoteLayer.Uncompressed(), the gcr implementation currently uses gzip
// from the stdlib, which causes a significant performance hit compared to pgzip. This is why we
// decompress here.
func getDecompressedReader(remoteLayer gcr.Layer) (io.ReadCloser, error) {
```
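For reference, a minimal pgzip-based version of this helper could look like the following (sketch only, assuming `github.com/klauspost/pgzip` is imported as `pgzip`; the PR's actual implementation likely differs and also needs to close the underlying compressed reader):

```go
// Sketch: read the layer's compressed stream and decompress it with pgzip
// instead of going through remoteLayer.Uncompressed() (stdlib compress/gzip).
func getDecompressedReader(remoteLayer gcr.Layer) (io.ReadCloser, error) {
	compressed, err := remoteLayer.Compressed()
	if err != nil {
		return nil, fmt.Errorf("getting compressed layer contents: %w", err)
	}
	gz, err := pgzip.NewReader(compressed)
	if err != nil {
		compressed.Close()
		return nil, fmt.Errorf("creating pgzip reader: %w", err)
	}
	// Note: a full implementation should close `compressed` when the
	// returned reader is closed; omitted here for brevity.
	return gz, nil
}
```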
I'd really appreciate some unit tests for this package. Seems to me it's otherwise easy to get something wrong here in future iterations 😅
We can use the library and add an exception for it.
Before this PR, the imagepuller only cached full images: if two different images shared layers, those layers were downloaded twice.
With this change, we switch the underlying layer storage engine from podman to umoci, giving us finer control over the storage layout. This allows storing layers in independent, content-addressed directories, which can then be reused when assembling images.
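Conceptually, each layer gets its own directory keyed by digest, and a layer that is already on disk is reused instead of being pulled again. A simplified sketch of that idea (package name, paths, and helpers are illustrative, not the actual implementation; `gcr` is the go-containerregistry `v1` package, as in the diffs above):

```go
package store

import (
	"fmt"
	"os"
	"path/filepath"

	gcr "github.com/google/go-containerregistry/pkg/v1"
)

// layerDir returns the content-addressed directory for a layer digest,
// e.g. <root>/layers/sha256/<hex>. The layout is illustrative only.
func layerDir(root string, digest gcr.Hash) string {
	return filepath.Join(root, "layers", digest.Algorithm, digest.Hex)
}

// ensureLayer skips layers that are already cached; otherwise the layer
// would be downloaded and unpacked into its directory (elided here).
func ensureLayer(root string, layer gcr.Layer) error {
	digest, err := layer.Digest()
	if err != nil {
		return fmt.Errorf("getting layer digest: %w", err)
	}
	dir := layerDir(root, digest)
	if _, err := os.Stat(dir); err == nil {
		return nil // layer already present, reuse it
	} else if !os.IsNotExist(err) {
		return fmt.Errorf("checking layer cache: %w", err)
	}
	// Download/unpack into dir would happen here.
	return nil
}
```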
Note to developers:
Using `compress/gzip` instead of `pgzip` causes a pull time regression north of 30%.