feat(wanda): implement artifact extraction from built images#408
feat(wanda): implement artifact extraction from built images#408andrew-anyscale wants to merge 2 commits intoandrew/revup/main/wanda-artifact-specfrom
Conversation
Add support for defining artifacts to extract from built container images. The Artifact struct specifies: - src: path inside container (supports globs and variable expansion) - dst: destination path on host (relative to artifacts dir) - optional: if true, extraction failure warns instead of failing build Topic: wanda-artifact-spec Labels: draft Signed-off-by: andrew <andrew@anyscale.com>
Implement artifact extraction using Docker with optimized batching. All artifacts are extracted in a single docker run command for efficiency, using a generated shell script with volume mount. - Uses Docker (not Crane) because docker can extract specific files while Crane must export entire filesystem as tarball - Single container spawned regardless of artifact count - Shell expansion handles glob patterns natively - Optional artifacts use || echo fallback to allow failures - Extraction runs for root spec only, even on cache hit Adds ArtifactsDir to ForgeConfig (defaults to /tmp/artifacts in RayCI mode). Topic: wanda-artifact-extract Relative: wanda-artifact-spec Labels: draft Signed-off-by: andrew <andrew@anyscale.com>
Summary of ChangesHello @andrew-anyscale, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a robust and efficient mechanism for extracting artifacts from Docker images after they have been built. By utilizing a single Docker container execution with a dynamically generated shell script, it streamlines the process of retrieving specific files or patterns from the image's filesystem to a designated host directory. This enhancement provides greater control and flexibility for managing build outputs, ensuring that necessary components are readily available for subsequent steps in the development pipeline. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a feature to extract artifacts from built Docker images. The implementation uses a single docker run command with a generated shell script to handle batch extraction efficiently. The logic for resolving destination paths and handling glob patterns is well-contained in the new extract.go file, and it is accompanied by a comprehensive set of tests.
My review has identified a critical command injection vulnerability in the script generation logic. I've also included a few medium-severity suggestions to improve code portability in tests and documentation clarity. Overall, this is a solid implementation of a useful feature, pending the resolution of the security issue.
| cmd := fmt.Sprintf("mkdir -p %s && cp -r %s %s", | ||
| shellQuote(mkdirPath), | ||
| srcParam, | ||
| shellQuote(dst), | ||
| ) |
There was a problem hiding this comment.
There is a command injection vulnerability here. If a.Src contains glob characters, it is passed unquoted and unsanitized to the shell script. A malicious a.Src like *; rm -rf / could lead to arbitrary command execution inside the container.
Since a.Src comes from a YAML file, it should be treated as untrusted user input.
To fix this, you should validate a.Src to ensure it doesn't contain any shell metacharacters other than the intended glob patterns when hasGlobChars is true.
For example, you could add a validation function:
func isValidGlob(path string) bool {
// This is a basic check. A more robust validation might be needed.
if strings.ContainsAny(path, ";|&()`\n$") {
return false
}
return true
}And then use it in buildExtractionScript before constructing the command.
| t.Fatalf("runExtract: %v", err) | ||
| } | ||
|
|
||
| if _, err := os.Stat(tmpDir + "/etc/alpine-release"); os.IsNotExist(err) { |
There was a problem hiding this comment.
Using + to concatenate path components is not portable and can lead to issues on different operating systems. It's better to use filepath.Join to construct file paths.
| if _, err := os.Stat(tmpDir + "/etc/alpine-release"); os.IsNotExist(err) { | |
| if _, err := os.Stat(filepath.Join(tmpDir, "etc", "alpine-release")); os.IsNotExist(err) { |
| t.Error("alpine-release was not copied") | ||
| } | ||
|
|
||
| entries, err := os.ReadDir(tmpDir + "/etc") |
| t.Fatalf("runExtract should not fail (best-effort): %v", err) | ||
| } | ||
|
|
||
| if _, err := os.Stat(tmpDir + "/etc/alpine-release"); os.IsNotExist(err) { |
There was a problem hiding this comment.
| } | ||
|
|
||
| // Build builds a container image from the given specification. | ||
| // If the spec has artifacts defined, they will be extracted after build. |
There was a problem hiding this comment.
This comment is misleading. The (f *Forge) Build(spec *Spec) method only builds a single image and does not handle artifact extraction. The extraction logic is handled in the top-level Build function after all dependencies are built. I suggest removing this line to avoid confusion.
| // If the spec has artifacts defined, they will be extracted after build. | |
| // Build builds a container image from the given specification. |
7699d79 to
70e0db3
Compare
Implement artifact extraction using Docker with optimized batching. All artifacts are extracted in a single docker run command for efficiency, using a generated shell script with volume mount.
Adds ArtifactsDir to ForgeConfig (defaults to /tmp/artifacts in RayCI mode).
Topic: wanda-artifact-extract
Relative: wanda-artifact-spec
Labels: draft
Signed-off-by: andrew andrew@anyscale.com