feat(wanda): implement artifact extraction from built images by andrew-anyscale · Pull Request #408 · ray-project/rayci

andrew-anyscale · 2026-02-03T21:24:27Z

Implement artifact extraction using Docker with optimized batching. All artifacts are extracted in a single docker run command for efficiency, using a generated shell script with volume mount.

- Uses Docker (not Crane) because docker can extract specific files while Crane must export entire filesystem as tarball
- Single container spawned regardless of artifact count
- Shell expansion handles glob patterns natively
- Optional artifacts use || echo fallback to allow failures
- Extraction runs for root spec only, even on cache hit
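
The batching described in the list above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the `artifact` type, `buildScript`, the `/artifacts` mount point, and the bodies of `shellQuote` and `hasGlobChars` are assumptions, and the sketch emits `mkdir` and `cp` on separate lines under `set -e` rather than chaining them with `&&`.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// artifact is a hypothetical stand-in for the spec's Artifact struct.
type artifact struct {
	Src      string
	Dst      string
	Optional bool
}

// shellQuote wraps s in single quotes and escapes embedded single quotes.
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// hasGlobChars reports whether s contains *, ? or [.
func hasGlobChars(s string) bool {
	return strings.ContainsAny(s, "*?[")
}

// buildScript emits one script covering every artifact, so a single
// container run can copy them all into the mounted directory.
func buildScript(arts []artifact, mount string) string {
	lines := []string{"set -e"}
	for _, a := range arts {
		dst := path.Join(mount, a.Dst)
		dir := path.Dir(dst)
		if strings.HasSuffix(a.Dst, "/") {
			dir = dst // trailing slash: Dst names a directory
		}
		src := shellQuote(a.Src)
		if hasGlobChars(a.Src) {
			// Left unquoted so the shell expands the glob; Src must be
			// validated before it reaches this point.
			src = a.Src
		}
		cp := fmt.Sprintf("cp -r %s %s", src, shellQuote(dst))
		if a.Optional {
			// A failed copy only prints a warning for optional artifacts.
			cp += " || echo " + shellQuote("warning: optional artifact missing: "+a.Src)
		}
		lines = append(lines, "mkdir -p "+shellQuote(dir), cp)
	}
	return strings.Join(lines, "\n") + "\n"
}

func main() {
	fmt.Print(buildScript([]artifact{
		{Src: "/etc/alpine-release", Dst: "etc/alpine-release"},
		{Src: "/opt/dist/*.whl", Dst: "wheels/", Optional: true},
	}, "/artifacts"))
}
```

Because every artifact becomes a line in the same script, the number of containers stays at one no matter how many artifacts the spec lists.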

Adds ArtifactsDir to ForgeConfig (defaults to /tmp/artifacts in RayCI mode).

Topic: wanda-artifact-extract
Relative: wanda-artifact-spec
Labels: draft

Signed-off-by: andrew <andrew@anyscale.com>

Add support for defining artifacts to extract from built container images.

The Artifact struct specifies:
- src: path inside container (supports globs and variable expansion)
- dst: destination path on host (relative to artifacts dir)
- optional: if true, extraction failure warns instead of failing build

Topic: wanda-artifact-spec
Labels: draft

Signed-off-by: andrew <andrew@anyscale.com>
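
For orientation, the three fields named in this commit message might map onto a Go struct like the one below. The yaml tag names are guessed from the field names, and the `validate` helper is hypothetical; neither is confirmed by this page.

```go
package main

import (
	"errors"
	"fmt"
)

// Artifact mirrors the three fields named in the commit message.
// The yaml tags are an assumption based on the names src, dst, optional.
type Artifact struct {
	Src      string `yaml:"src"`                // path inside the container; may contain globs
	Dst      string `yaml:"dst"`                // destination, relative to the artifacts dir
	Optional bool   `yaml:"optional,omitempty"` // if true, a failed extraction only warns
}

// validate is a hypothetical helper that rejects obviously incomplete entries.
func (a Artifact) validate() error {
	if a.Src == "" {
		return errors.New("artifact: src is required")
	}
	if a.Dst == "" {
		return errors.New("artifact: dst is required")
	}
	return nil
}

func main() {
	arts := []Artifact{
		{Src: "/opt/dist/*.whl", Dst: "wheels/"},
		{Src: "/opt/debug.log", Dst: "logs/debug.log", Optional: true},
	}
	for _, a := range arts {
		fmt.Printf("%+v valid=%v\n", a, a.validate() == nil)
	}
}
```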

andrew-anyscale · 2026-02-03T21:24:30Z

Reviews in this chain:
└#407 feat(wanda): add Artifact struct to spec for extraction
　└#408 feat(wanda): implement artifact extraction from built images
　　└#409 feat(wanda): add artifacts_dir CLI flag
　　　└#410 feat(raycicmd): add artifacts_dir to wanda steps

andrew-anyscale · 2026-02-03T21:24:30Z

#	head	base	diff	date	summary
0	ae920eeb	7699d792	diff	Feb 3 13:24 PM	14 files changed, 852 insertions(+), 2 deletions(-)

gemini-code-assist · 2026-02-03T21:24:46Z

Summary of Changes

Hello @andrew-anyscale, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust and efficient mechanism for extracting artifacts from Docker images after they have been built. By utilizing a single Docker container execution with a dynamically generated shell script, it streamlines the process of retrieving specific files or patterns from the image's filesystem to a designated host directory. This enhancement provides greater control and flexibility for managing build outputs, ensuring that necessary components are readily available for subsequent steps in the development pipeline.

Highlights

Artifact Extraction: Implemented a new feature to extract specified artifacts from built Docker images using an optimized Docker run command.
Docker-based Extraction: The extraction process leverages Docker directly, generating a shell script to copy files from the image to a host-mounted volume, which is more efficient for specific file extraction compared to exporting entire filesystems.
Flexible Artifact Specification: The system supports glob patterns for source paths and allows marking artifacts as optional, preventing build failures if an optional file is not found.
Root Spec Only Extraction: Artifact extraction is performed exclusively for the root specification, even when dealing with cached builds or dependent images.
Configuration Update: A new ArtifactsDir field has been added to ForgeConfig to specify the destination directory for extracted artifacts.
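
As a rough illustration of the single `docker run` with a host-mounted volume, the argument list could be assembled as below. The helper name `extractArgs`, the `/artifacts` mount point, and the use of `--entrypoint sh` are assumptions; the PR's `runExtract` may build its command differently.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// extractArgs builds the arguments for one docker run that mounts hostDir
// into the container and executes the generated script with sh -c.
func extractArgs(image, hostDir, script string) []string {
	return []string{
		"run", "--rm", // remove the container once the script exits
		"-v", hostDir + ":/artifacts", // host directory that receives the files
		"--entrypoint", "sh", // bypass whatever entrypoint the image defines
		image,
		"-c", script,
	}
}

func main() {
	args := extractArgs("alpine:latest", "/tmp/artifacts",
		"cp -r /etc/alpine-release /artifacts/")
	cmd := exec.Command("docker", args...) // constructed here, not executed
	fmt.Println(strings.Join(cmd.Args, " "))
}
```

Overriding the entrypoint matters for images whose default entrypoint is not a shell; otherwise `-c script` would be passed to that program instead.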

Changelog

wanda/docker_cmd.go
- Added runExtract function to execute a shell script within a Docker container, facilitating artifact extraction.
wanda/docker_cmd_test.go
- Added TestDockerCmdRunExtract and TestDockerCmdRunExtract_bestEffort to verify the functionality of the new runExtract method, including glob pattern handling and optional file behavior.
wanda/extract.go
- New file introduced to house the core logic for building the shell script used in artifact extraction, including functions for path resolution (resolveDst), glob character detection (hasGlobChars), and shell quoting (shellQuote).
wanda/extract_test.go
- New file containing comprehensive unit tests for the artifact extraction script generation and path resolution logic, covering various scenarios like absolute/relative paths, globbing, optional artifacts, and path escape prevention.
wanda/forge.go
- Integrated artifact extraction into the main Build flow, ensuring that artifacts are extracted from the root spec if ArtifactsDir is configured.
- Added ExtractArtifacts method to the Forge struct, encapsulating the artifact extraction process.
wanda/forge_config.go
- Added ArtifactsDir field to the ForgeConfig struct to allow users to specify the output directory for extracted artifacts.
wanda/forge_test.go
- Added several integration tests (TestBuild_WithArtifacts_exact, TestBuild_WithArtifacts_glob, TestBuild_WithArtifacts_optional, TestBuild_WithArtifacts_rootOnly, TestBuild_WithArtifacts_cacheHit) to validate the end-to-end artifact extraction feature under different conditions.
wanda/testdata/Dockerfile.artifact
- New test Dockerfile to create a sample image with various files for artifact extraction testing.
wanda/testdata/Dockerfile.artifact-top
- New test Dockerfile used in multi-stage build scenarios for artifact extraction testing.
wanda/testdata/artifact-dep-base.wanda.yaml
- New test Wanda spec defining a base image with artifacts for dependency testing.
wanda/testdata/artifact-dep-top.wanda.yaml
- New test Wanda spec defining a top-level image that depends on artifact-dep-base, used to verify root-only artifact extraction.
wanda/testdata/artifact-exact.wanda.yaml
- New test Wanda spec demonstrating exact file artifact extraction.
wanda/testdata/artifact-glob.wanda.yaml
- New test Wanda spec demonstrating artifact extraction using glob patterns.
wanda/testdata/artifact-optional.wanda.yaml
- New test Wanda spec demonstrating the extraction of optional artifacts.
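
The path escape prevention mentioned for wanda/extract_test.go can be sketched as follows. Only the name `resolveDst` comes from the changelog; the signature, the body, and the decision to treat an absolute `dst` as relative to the artifacts directory are assumptions.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveDst joins dst onto artifactsDir and rejects any result that would
// land outside artifactsDir (for example via ".." segments).
func resolveDst(artifactsDir, dst string) (string, error) {
	base := filepath.Clean(artifactsDir)
	full := filepath.Join(base, dst) // Join also collapses ".." segments
	rel, err := filepath.Rel(base, full)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("dst %q escapes artifacts dir %q", dst, artifactsDir)
	}
	return full, nil
}

func main() {
	for _, dst := range []string{"wheels/a.whl", "../etc/passwd", "a/../../x"} {
		p, err := resolveDst("/tmp/artifacts", dst)
		fmt.Printf("dst=%q path=%q err=%v\n", dst, p, err)
	}
}
```

Comparing the cleaned relative path, rather than checking for the substring "..", avoids rejecting legitimate names such as `notes..txt`.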

Activity

This pull request introduces a new feature for artifact extraction, complete with a dedicated extract.go file for core logic and extensive unit and integration tests to ensure correctness and robustness.

gemini-code-assist

Code Review

This pull request introduces a feature to extract artifacts from built Docker images. The implementation uses a single docker run command with a generated shell script to handle batch extraction efficiently. The logic for resolving destination paths and handling glob patterns is well-contained in the new extract.go file, and it is accompanied by a comprehensive set of tests.

My review has identified a critical command injection vulnerability in the script generation logic. I've also included a few medium-severity suggestions to improve code portability in tests and documentation clarity. Overall, this is a solid implementation of a useful feature, pending the resolution of the security issue.

gemini-code-assist · 2026-02-03T21:26:03Z

wanda/extract.go

+		cmd := fmt.Sprintf("mkdir -p %s && cp -r %s %s",
+			shellQuote(mkdirPath),
+			srcParam,
+			shellQuote(dst),
+		)


There is a command injection vulnerability here. If a.Src contains glob characters, it is passed unquoted and unsanitized to the shell script. A malicious a.Src like *; rm -rf / could lead to arbitrary command execution inside the container.

Since a.Src comes from a YAML file, it should be treated as untrusted user input.

To fix this, you should validate a.Src to ensure it doesn't contain any shell metacharacters other than the intended glob patterns when hasGlobChars is true.

For example, you could add a validation function:

func isValidGlob(path string) bool {
	// This is a basic check. A more robust validation might be needed.
	if strings.ContainsAny(path, ";|&()`\n$") {
		return false
	}
	return true
}

And then use it in buildExtractionScript before constructing the command.
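
A denylist like the one suggested here leaves out characters such as `<`, `>`, quotes, and whitespace, so one alternative is an allowlist that accepts only path characters plus the glob metacharacters. The sketch below is hypothetical: `validateGlobSrc` and the exact character set are assumptions, and it presumes any variable expansion in `src` has already happened before validation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// safeGlob permits letters, digits, "_", ".", "/", "-" and the glob
// metacharacters *, ?, [ and ]. Everything else is rejected.
var safeGlob = regexp.MustCompile(`^[A-Za-z0-9_./*?\[\]-]+$`)

// validateGlobSrc returns an error if src contains anything outside the
// allowlist, or starts with "-" (which cp could read as an option).
func validateGlobSrc(src string) error {
	if strings.HasPrefix(src, "-") {
		return fmt.Errorf("src %q must not start with '-'", src)
	}
	if !safeGlob.MatchString(src) {
		return fmt.Errorf("src %q contains characters outside the allowlist", src)
	}
	return nil
}

func main() {
	for _, src := range []string{
		"/opt/wheels/*.whl",
		"/lib/lib[a-z]?.so",
		"*; rm -rf /",
		"/a/$(id)",
	} {
		fmt.Printf("%-22q err=%v\n", src, validateGlobSrc(src))
	}
}
```

The trade-off is that legitimate paths containing spaces or other unusual characters would be rejected when they also contain a glob; such paths would need a different mechanism than unquoted shell expansion.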

gemini-code-assist · 2026-02-03T21:26:03Z

wanda/docker_cmd_test.go

+		t.Fatalf("runExtract: %v", err)
+	}
+
+	if _, err := os.Stat(tmpDir + "/etc/alpine-release"); os.IsNotExist(err) {


Using + to concatenate path components is not portable and can lead to issues on different operating systems. It's better to use filepath.Join to construct file paths.

Suggested change

-	if _, err := os.Stat(tmpDir + "/etc/alpine-release"); os.IsNotExist(err) {
+	if _, err := os.Stat(filepath.Join(tmpDir, "etc", "alpine-release")); os.IsNotExist(err) {

gemini-code-assist · 2026-02-03T21:26:04Z

wanda/docker_cmd_test.go

+		t.Error("alpine-release was not copied")
+	}
+
+	entries, err := os.ReadDir(tmpDir + "/etc")


For portability, please use filepath.Join to construct file paths instead of string concatenation.

Suggested change

-	entries, err := os.ReadDir(tmpDir + "/etc")
+	entries, err := os.ReadDir(filepath.Join(tmpDir, "etc"))

gemini-code-assist · 2026-02-03T21:26:04Z

wanda/docker_cmd_test.go

+		t.Fatalf("runExtract should not fail (best-effort): %v", err)
+	}
+
+	if _, err := os.Stat(tmpDir + "/etc/alpine-release"); os.IsNotExist(err) {


For portability, please use filepath.Join to construct file paths instead of string concatenation.

Suggested change

-	if _, err := os.Stat(tmpDir + "/etc/alpine-release"); os.IsNotExist(err) {
+	if _, err := os.Stat(filepath.Join(tmpDir, "etc", "alpine-release")); os.IsNotExist(err) {

gemini-code-assist · 2026-02-03T21:26:04Z

wanda/forge.go

+}
+
 // Build builds a container image from the given specification.
+// If the spec has artifacts defined, they will be extracted after build.


This comment is misleading. The (f *Forge) Build(spec *Spec) method only builds a single image and does not handle artifact extraction. The extraction logic is handled in the top-level Build function after all dependencies are built. I suggest removing this line to avoid confusion.

Suggested change

-	// If the spec has artifacts defined, they will be extracted after build.
+	// Build builds a container image from the given specification.

andrew-anyscale added 2 commits February 3, 2026 13:24

This was referenced Feb 3, 2026

feat(wanda): add Artifact struct to spec for extraction #407

Open

feat(wanda): add artifacts_dir CLI flag #409

Open

feat(raycicmd): set RAYCI_ARTIFACTS_DIR for wanda steps #410

Open

gemini-code-assist bot reviewed Feb 3, 2026

andrew-anyscale force-pushed the andrew/revup/main/wanda-artifact-spec branch from 7699d79 to 70e0db3 February 3, 2026 22:18

andrew-anyscale closed this Feb 3, 2026

andrew-anyscale wants to merge 2 commits into andrew/revup/main/wanda-artifact-spec from andrew/revup/main/wanda-artifact-extract
