Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buildah build: use the same overlay for the context directory for the whole build #5975

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 12 additions & 3 deletions imagebuildah/build.go
Original file line number Diff line number Diff line change
Expand Up @@ -204,6 +204,15 @@ func BuildDockerfiles(ctx context.Context, store storage.Store, options define.B
options.AdditionalBuildContexts = make(map[string]*define.AdditionalBuildContext)
}

contextDirectory, processLabel, mountLabel, usingContextOverlay, cleanupOverlay, err := platformSetupContextDirectoryOverlay(store, &options)
if err != nil {
return "", nil, fmt.Errorf("mounting an overlay over build context directory: %w", err)
}
defer cleanupOverlay()
if contextDirectory != "" {
options.ContextDirectory = contextDirectory
}

if len(options.Platforms) == 0 {
options.Platforms = append(options.Platforms, struct{ OS, Arch, Variant string }{
OS: options.SystemContext.OSChoice,
Expand Down Expand Up @@ -278,7 +287,7 @@ func BuildDockerfiles(ctx context.Context, store storage.Store, options define.B
platformOptions.ReportWriter = reporter
platformOptions.Err = stderr
}
thisID, thisRef, err := buildDockerfilesOnce(ctx, store, loggerPerPlatform, logPrefix, platformOptions, paths, files)
thisID, thisRef, err := buildDockerfilesOnce(ctx, store, loggerPerPlatform, logPrefix, platformOptions, paths, files, processLabel, mountLabel, usingContextOverlay)
if err != nil {
if errorContext := strings.TrimSpace(logPrefix); errorContext != "" {
return fmt.Errorf("%s: %w", errorContext, err)
Expand Down Expand Up @@ -389,7 +398,7 @@ func BuildDockerfiles(ctx context.Context, store storage.Store, options define.B
return id, ref, nil
}

func buildDockerfilesOnce(ctx context.Context, store storage.Store, logger *logrus.Logger, logPrefix string, options define.BuildOptions, containerFiles []string, dockerfilecontents [][]byte) (string, reference.Canonical, error) {
func buildDockerfilesOnce(ctx context.Context, store storage.Store, logger *logrus.Logger, logPrefix string, options define.BuildOptions, containerFiles []string, dockerfilecontents [][]byte, processLabel, mountLabel string, usingContextOverlay bool) (string, reference.Canonical, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit and not a blocker probably a todo for future: probably buildDockerfilesOnce should start consuming a struct.

mainNode, err := imagebuilder.ParseDockerfile(bytes.NewReader(dockerfilecontents[0]))
if err != nil {
return "", nil, fmt.Errorf("parsing main Dockerfile: %s: %w", containerFiles[0], err)
Expand Down Expand Up @@ -441,7 +450,7 @@ func buildDockerfilesOnce(ctx context.Context, store storage.Store, logger *logr
mainNode.Children = append(mainNode.Children, additionalNode.Children...)
}

exec, err := newExecutor(logger, logPrefix, store, options, mainNode, containerFiles)
exec, err := newExecutor(logger, logPrefix, store, options, mainNode, containerFiles, processLabel, mountLabel, usingContextOverlay)
if err != nil {
return "", nil, fmt.Errorf("creating build executor: %w", err)
}
Expand Down
86 changes: 86 additions & 0 deletions imagebuildah/build_linux.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
package imagebuildah

import (
"errors"
"fmt"
"io/fs"
"os"
"path/filepath"
"slices"

"github.com/containers/buildah/define"
"github.com/containers/buildah/internal/tmpdir"
"github.com/containers/buildah/pkg/overlay"
"github.com/containers/storage"
"github.com/opencontainers/selinux/go-selinux/label"
"github.com/sirupsen/logrus"
"golang.org/x/sys/unix"
)

// platformSetupContextDirectoryOverlay() sets up an overlay _over_ the build
// context directory, and sorts out labeling. Returns the location which
// should be used as the default build context; the process label and mount
// label for the build, if any; a boolean value that indicates whether we did,
// in fact, mount an overlay; and a cleanup function which should be called
// when the location is no longer needed (on success). Returned errors should
// be treated as fatal.
func platformSetupContextDirectoryOverlay(store storage.Store, options *define.BuildOptions) (string, string, string, bool, func(), error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return values are a bit unwieldy, maybe a struct instead? Although I guess the single caller unpacks them anyways, so doesn't matter. Yeah, one caller, so makes sense as is.

var succeeded bool
var tmpDir, contentDir string
cleanup := func() {
if contentDir != "" {
if err := overlay.CleanupContent(tmpDir); err != nil {
logrus.Debugf("cleaning up overlay scaffolding for build context directory: %v", err)
}
}
if tmpDir != "" {
if err := os.Remove(tmpDir); err != nil && !errors.Is(err, fs.ErrNotExist) {
logrus.Debugf("removing should-be-empty temporary directory %q: %v", tmpDir, err)
}
}
}
defer func() {
if !succeeded {
cleanup()
}
}()
// double-check that the context directory location is an absolute path
contextDirectoryAbsolute, err := filepath.Abs(options.ContextDirectory)
if err != nil {
return "", "", "", false, nil, fmt.Errorf("determining absolute path of %q: %w", options.ContextDirectory, err)
}
var st unix.Stat_t
if err := unix.Stat(contextDirectoryAbsolute, &st); err != nil {
return "", "", "", false, nil, fmt.Errorf("checking ownership of %q: %w", options.ContextDirectory, err)
}
// figure out the labeling situation
processLabel, mountLabel, err := label.InitLabels(options.CommonBuildOpts.LabelOpts)
if err != nil {
return "", "", "", false, nil, err
}
// create a temporary directory
tmpDir, err = os.MkdirTemp(tmpdir.GetTempDir(), "buildah-context-")
if err != nil {
return "", "", "", false, nil, fmt.Errorf("creating temporary directory: %w", err)
}
// create the scaffolding for an overlay mount under it
contentDir, err = overlay.TempDir(tmpDir, 0, 0)
if err != nil {
return "", "", "", false, nil, fmt.Errorf("creating overlay scaffolding for build context directory: %w", err)
}
// mount an overlay that uses it as a lower
overlayOptions := overlay.Options{
GraphOpts: slices.Clone(store.GraphOptions()),
ForceMount: true,
MountLabel: mountLabel,
}
targetDir := filepath.Join(contentDir, "target")
contextDirMountSpec, err := overlay.MountWithOptions(contentDir, contextDirectoryAbsolute, targetDir, &overlayOptions)
if err != nil {
return "", "", "", false, nil, fmt.Errorf("creating overlay scaffolding for build context directory: %w", err)
}
// going forward, pretend that the merged directory is the actual context directory
logrus.Debugf("mounted an overlay at %q over %q", contextDirMountSpec.Source, contextDirectoryAbsolute)
succeeded = true
return contextDirMountSpec.Source, processLabel, mountLabel, true, cleanup, nil
}
20 changes: 20 additions & 0 deletions imagebuildah/build_notlinux.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
//go:build !linux

package imagebuildah

import (
"github.com/containers/buildah/define"
"github.com/containers/storage"
)

// platformSetupContextDirectoryOverlay() should set up an overlay _over_ the
// build context directory, and sort out labeling. Should return the location
// which should be used as the default build context; the process label and
// mount label for the build, if any; a boolean value that indicates whether we
// did, in fact, mount an overlay; and a cleanup function which should be
// called when the location is no longer needed (on success). Returned errors
// should be treated as fatal.
// TODO: currenty a no-op on this platform.
func platformSetupContextDirectoryOverlay(store storage.Store, options *define.BuildOptions) (string, string, string, bool, func(), error) {
return options.ContextDirectory, "", "", false, func() {}, nil
}
6 changes: 5 additions & 1 deletion imagebuildah/executor.go
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,7 @@ type Executor struct {
stages map[string]*StageExecutor
store storage.Store
contextDir string
contextDirWritesAreDiscarded bool
pullPolicy define.PullPolicy
registry string
ignoreUnrecognizedInstructions bool
Expand Down Expand Up @@ -173,7 +174,7 @@ type imageTypeAndHistoryAndDiffIDs struct {
}

// newExecutor creates a new instance of the imagebuilder.Executor interface.
func newExecutor(logger *logrus.Logger, logPrefix string, store storage.Store, options define.BuildOptions, mainNode *parser.Node, containerFiles []string) (*Executor, error) {
func newExecutor(logger *logrus.Logger, logPrefix string, store storage.Store, options define.BuildOptions, mainNode *parser.Node, containerFiles []string, processLabel, mountLabel string, contextWritesDiscarded bool) (*Executor, error) {
defaultContainerConfig, err := config.Default()
if err != nil {
return nil, fmt.Errorf("failed to get container config: %w", err)
Expand Down Expand Up @@ -238,6 +239,7 @@ func newExecutor(logger *logrus.Logger, logPrefix string, store storage.Store, o
stages: make(map[string]*StageExecutor),
store: store,
contextDir: options.ContextDirectory,
contextDirWritesAreDiscarded: contextWritesDiscarded,
excludes: excludes,
groupAdd: options.GroupAdd,
ignoreFile: options.IgnoreFile,
Expand Down Expand Up @@ -273,6 +275,8 @@ func newExecutor(logger *logrus.Logger, logPrefix string, store storage.Store, o
squash: options.Squash,
labels: slices.Clone(options.Labels),
layerLabels: slices.Clone(options.LayerLabels),
processLabel: processLabel,
mountLabel: mountLabel,
annotations: slices.Clone(options.Annotations),
layers: options.Layers,
noHostname: options.CommonBuildOpts.NoHostname,
Expand Down
60 changes: 46 additions & 14 deletions imagebuildah/stage_executor.go
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ import (
"github.com/containers/buildah/define"
buildahdocker "github.com/containers/buildah/docker"
"github.com/containers/buildah/internal"
"github.com/containers/buildah/internal/sanitize"
"github.com/containers/buildah/internal/tmpdir"
internalUtil "github.com/containers/buildah/internal/util"
"github.com/containers/buildah/pkg/parse"
Expand Down Expand Up @@ -537,7 +538,7 @@ func (s *StageExecutor) performCopy(excludes []string, copies ...imagebuilder.Co
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not commenting on the content of this commit) there's a typo in the commit subject

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dang it, that's embarrassing. Fixing.

contextDir = mountPoint
}
// Original behaviour of buildah still stays true for COPY irrespective of additional context.
// This isn't --from the build context directory, so we don't want to force everything to 0:0
preserveOwnership = true
copyExcludes = excludes
} else {
Expand Down Expand Up @@ -606,9 +607,14 @@ func (s *StageExecutor) performCopy(excludes []string, copies ...imagebuilder.Co
}

// Returns a map of StageName/ImageName:internal.StageMountDetails for the
// items in the passed-in mounts list which include a "from=" value.
// items in the passed-in mounts list which include a "from=" value. The ""
// key in the returned map corresponds to the default build context.
func (s *StageExecutor) runStageMountPoints(mountList []string) (map[string]internal.StageMountDetails, error) {
stageMountPoints := make(map[string]internal.StageMountDetails)
stageMountPoints[""] = internal.StageMountDetails{
MountPoint: s.executor.contextDir,
IsWritesDiscardedOverlay: s.executor.contextDirWritesAreDiscarded,
}
for _, flag := range mountList {
if strings.Contains(flag, "from") {
tokens := strings.Split(flag, ",")
Expand Down Expand Up @@ -638,7 +644,7 @@ func (s *StageExecutor) runStageMountPoints(mountList []string) (map[string]inte
if additionalBuildContext.IsImage {
mountPoint, err := s.getImageRootfs(s.ctx, additionalBuildContext.Value)
if err != nil {
return nil, fmt.Errorf("%s from=%s: image found with that name", flag, from)
return nil, fmt.Errorf("%s from=%s: image not found with that name", flag, from)
}
// The `from` in stageMountPoints should point
// to `mountPoint` replaced from additional
Expand Down Expand Up @@ -922,6 +928,29 @@ func (s *StageExecutor) UnrecognizedInstruction(step *imagebuilder.Step) error {
return errors.New(err)
}

// sanitizeFrom limits which image transports we'll accept. For those it
// accepts which refer to filesystem objects, where relative path names are
// evaluated relative to "contextDir", it will create a copy of the original
// image, under "tmpdir", which contains no symbolic links, and return either
// the original image reference or a reference to a sanitized copy which should
// be used instead.
func (s *StageExecutor) sanitizeFrom(from, tmpdir string) (newFrom string, err error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we document what is value contained in newFrom string ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding.

transportName, restOfImageName, maybeHasTransportName := strings.Cut(from, ":")
if !maybeHasTransportName || transports.Get(transportName) == nil {
if _, err = reference.ParseNormalizedNamed(from); err == nil {
// this is a normal-looking image-in-a-registry-or-named-in-storage name
return from, nil
}
if img, err := s.executor.store.Image(from); img != nil && err == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… is it expected that Containerfiles can refer to image IDs? That can potentially allow builds on a shared host to refer to images which the invoking user has no right to pull, if the invoking user can somehow determine the config digest.

OTOH… I can imagine that there might be uses.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internally, we rewrite references to nicknames of previous stages to their IDs before we get to this point, since they don't get names assigned to them.

// this is an image ID
return from, nil
}
return "", fmt.Errorf("parsing image name %q: %w", from, err)
}
// TODO: drop this part and just return an error... someday
return sanitize.ImageName(transportName, restOfImageName, s.executor.contextDir, tmpdir)
}

// prepare creates a working container based on the specified image, or if one
// isn't specified, the first argument passed to the first FROM instruction we
// can find in the stage's parsed tree.
Expand All @@ -938,6 +967,19 @@ func (s *StageExecutor) prepare(ctx context.Context, from string, initializeIBCo
}
from = base
}
sanitizedDir, err := os.MkdirTemp(tmpdir.GetTempDir(), "buildah-context-")
if err != nil {
return nil, fmt.Errorf("creating temporary directory: %w", err)
}
defer func() {
if err := os.RemoveAll(sanitizedDir); err != nil {
logrus.Warn(err)
}
}()
sanitizedFrom, err := s.sanitizeFrom(from, tmpdir.GetTempDir())
if err != nil {
return nil, fmt.Errorf("invalid base image specification %q: %w", from, err)
}
displayFrom := from
if ib.Platform != "" {
displayFrom = "--platform=" + ib.Platform + " " + displayFrom
Expand Down Expand Up @@ -976,7 +1018,7 @@ func (s *StageExecutor) prepare(ctx context.Context, from string, initializeIBCo

builderOptions := buildah.BuilderOptions{
Args: ib.Args,
FromImage: from,
FromImage: sanitizedFrom,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small question since I did not understand it completely, why are are we creating copy of entire image for every stage ? Is it just to make sure no symbolic links are present ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed here #5975 (comment) and again here #5975 (comment)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be fair to ask and see about how much overhead does this patch adds both disk usage wise and speed time.

I could be wrong but I could think of use-case where a stage copied lot of external artifacts but only parts of it are being used in further stages, this will end up creating copies for every future stage even though those parts are not being used by future stages.

Should we consider making this behavior optional ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be fair to ask and see about how much overhead does this patch adds both disk usage wise and speed time.

The disk cost is a second copy of the base image being copied to a temporary location.
The speed cost is the cost of copying that content, which is unavoidably higher if we're using fuse-overlayfs, and likewise reading contents from the build context through that overlay to avoid inconsistencies.

I could be wrong but I could think of use-case where a stage copied lot of external artifacts but only parts of it are being used in further stages, this will end up creating copies for every future stage even though those parts are not being used by future stages.

I don't understand what you're describing here. Can you elaborate on that? Initializing a stage using a previous stage's result as its base doesn't involve the build context directory, so I don't follow where additional copies would appear.

Should we consider making this behavior optional ?

Using an overlay over the build context directory is not something we should turn off. Turning off sharing like this PR introduces doesn't reduce any of the costs that are incurred when they're not shared, so I don't really see the benefit of adding complexity there.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what you're describing here. Can you elaborate on that? Initializing a stage using a previous stage's result as its base doesn't involve the build context directory, so I don't follow where additional copies would appear.

I was thinking for use-case something like this

FROM alpine as first
COPY /large-fileA-from-host .
COPY /large-fileB-from-host .
COPY /large-fileC-from-host .


FROM first as second 
# Use only large-file-A in this stage
# but this will create another copy of `first` containing all three files

FROM first as third 
# Use only large-file-B in this stage
# # but this will create another copy of `first` containing all three files

FROM first as fourth 
# Use only large-file-C in this stage
# but this will create another copy of `first` containing all three files

@nalind If i understand correctly for all the three stages second, third and fourth a separate copy of first stage will be created , am i correct ? and I am not sure if buildah was doing this before.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stages nicknamed "second", "third", and "fourth" would each start with their own working container based on the image that was committed at the end of the stage nicknamed "first", as before. This PR doesn't affect how we handle the root filesystems of working containers, or committing images.

The COPY instructions have to read contents through the mounted overlay for consistency's sake, but I don't see additional copies of them being made here. Since they're not being modified, they'd be read directly from the build context directory that's being used as the "lowerdir" for the overlay, without being "copied up".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah i see sounds fair to me.

GroupAdd: s.executor.groupAdd,
PullPolicy: pullPolicy,
ContainerSuffix: s.executor.containerSuffix,
Expand Down Expand Up @@ -1014,16 +1056,6 @@ func (s *StageExecutor) prepare(ctx context.Context, from string, initializeIBCo
return nil, fmt.Errorf("creating build container: %w", err)
}

// If executor's ProcessLabel and MountLabel is empty means this is the first stage
// Make sure we share first stage's ProcessLabel and MountLabel with all other subsequent stages
// Doing this will ensure and one stage in same build can mount another stage even if `selinux`
// is enabled.

if s.executor.mountLabel == "" && s.executor.processLabel == "" {
s.executor.mountLabel = builder.MountLabel
s.executor.processLabel = builder.ProcessLabel
}

if initializeIBConfig {
volumes := map[string]struct{}{}
for _, v := range builder.Volumes() {
Expand Down
Loading