Skip to content

feat(wanda): add Artifact struct to spec for extraction#407

Open
andrew-anyscale wants to merge 1 commit intomainfrom
andrew/revup/main/wanda-artifact-spec
Open

feat(wanda): add Artifact struct to spec for extraction#407
andrew-anyscale wants to merge 1 commit intomainfrom
andrew/revup/main/wanda-artifact-spec

Conversation

@andrew-anyscale
Copy link
Contributor

@andrew-anyscale andrew-anyscale commented Feb 3, 2026

Define Artifact struct for specifying files to extract from built images:

  • src: absolute path inside container (supports glob patterns like "/*.whl")
  • dst: relative path in artifacts directory
  • optional: if true, missing files warn instead of fail

Includes helper methods for path resolution and validation:

  • Validate(): ensures src is absolute, dst is relative, no path traversal
  • ResolveSrcs(): resolves glob patterns against container file list
  • ResolveDst(): resolves destination within artifacts directory
  • CopyIntoDir(): determines if copying into directory vs renaming

Topic: wanda-artifact-spec

Signed-off-by: andrew andrew@anyscale.com

@andrew-anyscale
Copy link
Contributor Author

andrew-anyscale commented Feb 3, 2026

Reviews in this chain:
#407 feat(wanda): add Artifact struct to spec for extraction
 └#411 feat(wanda): implement artifact extraction using docker cp
  └#409 feat(wanda): add artifacts_dir CLI flag
   └#410 feat(raycicmd): set RAYCI_ARTIFACTS_DIR for wanda steps

@andrew-anyscale
Copy link
Contributor Author

andrew-anyscale commented Feb 3, 2026

# head base diff date summary
0 7699d792 6165f028 diff Feb 3 13:24 PM 2 files changed, 154 insertions(+)
1 70e0db32 6165f028 diff Feb 3 14:18 PM 1 file changed, 1 insertion(+), 2 deletions(-)
2 25c63e1c 6165f028 diff Feb 3 14:24 PM 2 files changed, 6 insertions(+), 3 deletions(-)
3 f1104ff7 6165f028 diff Feb 4 14:22 PM 3 files changed, 264 insertions(+), 17 deletions(-)
4 313a1f83 6165f028 diff Feb 4 14:27 PM 0 files changed
5 7a60b1e6 6165f028 diff Feb 4 15:12 PM 1 file changed, 1 deletion(-)
6 928b35c4 6165f028 diff Feb 5 11:38 AM 2 files changed, 27 deletions(-)

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @andrew-anyscale, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust mechanism for defining and extracting artifacts from container images built using the wanda tool. It allows users to specify source paths within the container (with support for globs and variable expansion) and destination paths on the host, along with an option to make artifact extraction non-critical. This enhancement significantly improves the flexibility and utility of the wanda specification for managing build outputs.

Highlights

  • New Artifact Struct: A new Artifact struct has been introduced in wanda/spec.go to define how files or directories should be extracted from a built container image.
  • Artifact Properties: The Artifact struct includes Src (source path inside container, supporting globs and variable expansion), Dst (destination path on the host, relative to the artifacts directory), and Optional (a boolean indicating if extraction failure should only warn instead of failing the build).
  • Spec Integration: The main Spec struct in wanda/spec.go now includes an Artifacts field, which is a slice of Artifact pointers, allowing multiple artifacts to be defined for extraction.
  • Variable Expansion for Artifacts: A new helper function artifactsExpandVar was added to wanda/spec.go to handle variable expansion for the Src and Dst fields of Artifact objects.
  • Spec.expandVar Update: The expandVar method of the Spec struct has been updated to utilize the new artifactsExpandVar function, ensuring that variables within artifact definitions are correctly expanded.
  • Comprehensive Testing: wanda/spec_test.go has been extended with three new test cases: TestParseSpecFileWithArtifacts to validate parsing, TestSpecExpandWithArtifacts to verify variable expansion, and TestSpecMarshalLoopbackWithArtifacts to confirm correct YAML marshaling and unmarshaling of the new Artifact structure.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • wanda/spec.go
    • Defined a new Artifact struct to model container image artifacts, including Src, Dst, and Optional fields.
    • Added an Artifacts field (slice of *Artifact) to the Spec struct.
    • Implemented artifactsExpandVar for variable substitution within artifact paths.
    • Modified Spec.expandVar to process Artifacts using the new expansion logic.
  • wanda/spec_test.go
    • Added TestParseSpecFileWithArtifacts to validate the parsing of artifact definitions from YAML.
    • Included TestSpecExpandWithArtifacts to ensure correct variable expansion in artifact source and destination paths.
    • Introduced TestSpecMarshalLoopbackWithArtifacts to confirm the serialization and deserialization integrity of Artifact data.
Activity
  • The pull request was created as a draft by andrew-anyscale.
  • The author has signed off on the changes.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the Artifact struct to the wanda spec, allowing users to define files and directories for extraction from built container images. The changes include the Artifact struct definition, its integration into the Spec struct, and the necessary logic for variable expansion. Corresponding tests have been added to validate parsing, variable expansion, and serialization of specs containing artifacts. My review focuses on improving documentation clarity and test correctness.

@andrew-anyscale andrew-anyscale force-pushed the andrew/revup/main/wanda-artifact-spec branch from 7699d79 to 70e0db3 Compare February 3, 2026 22:18
@andrew-anyscale andrew-anyscale force-pushed the andrew/revup/main/wanda-artifact-spec branch 4 times, most recently from 313a1f8 to 7a60b1e Compare February 4, 2026 23:12
@andrew-anyscale andrew-anyscale marked this pull request as ready for review February 4, 2026 23:12
Comment on lines 63 to 65
func (a *Artifact) CopyIntoDir(numSrcs int) bool {
return strings.HasSuffix(a.Dst, "/") || numSrcs > 1
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe ShouldCopyIntoDir? it is still unclear what this function means?

this does not seem to be used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used in the subsequent PR -- https://github.com/ray-project/rayci/pull/411/changes#diff-c7fe292f56579652957100e992ec51957cb167f5787d0ac59d1171f7cf872a32R262

It can be a little challenging to split these PRs out to smaller review chunks, while still preserving general structure. I'll try to move this fn out to the next PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

// matchFiles returns files that match the artifact's glob pattern.
func (a *Artifact) matchFiles(files []string) []string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we like listing all the files first and then perform the matching? should we send in a fs interface or something? this can be a lot of files to match (potentially files of the whole container, which can be a lot?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/ray-project/rayci/pull/411/changes#diff-c7fe292f56579652957100e992ec51957cb167f5787d0ac59d1171f7cf872a32R233
^ This is the implementation detail. If we have no globs, we can call docker cp directly and avoid any filesystem parsing, which is the optimal case.

If we do have a glob, then we have to export the filesystem to get all filenames. https://github.com/ray-project/rayci/pull/411/changes#diff-7df0dd7dbd85029bba68703b7de4462bee6f7f6fe89a15f9286c6d09d1897323R160

Running on the wheel image today to extract the wheel glob, it takes <1s

# ci/docker/ray-wheel.wanda.yaml
artifacts:
  - src: "/*.whl"
    dst: ./
$ PYTHON_VERSION=3.13 MANYLINUX_VERSION=260128.221a193 HOSTTYPE=aarch64 ARCH_SUFFIX=-aarch64 BUILDKITE_COMMIT=b5737cefc0 IS_LOCAL_BUILD=true /Users/andrew/devel/rayci-wanda-artifacts/_release/wanda-darwin-arm64 --artifacts_dir .whl/ ci/docker/ray-wheel.wanda.yaml

2026/02/04 09:10:39 extracting 1 artifact(s) from localhost:5000/rayci-work:ray-wheel-py3.13-aarch64
Successfully copied 72MB to /Users/andrew/devel/ray-local-wheel-build/.whl/ray-3.0.0.dev0-cp313-cp313-manylinux2014_aarch64.whl
2026/02/04 09:10:40 extracted 1 artifact(s) in 722ms:
2026/02/04 09:10:40   /Users/andrew/devel/ray-local-wheel-build/.whl/ray-3.0.0.dev0-cp313-cp313-manylinux2014_aarch64.whl

// Artifact defines a file or directory to extract from a built image.
type Artifact struct {
// Src is the path inside the container to extract.
// Can be a file, directory, or glob pattern (e.g., "/*.whl").
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need glob pattern support? glob only matches file name, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back-and-forth on glob inclusion, and settled on it being worth inclusion. It drastically simplifies the Wheel upload case. Without this, we'd need to rename the wheel file away from schemas like ray-3.0.0.dev0-cp310-cp310-manylinux2014_x86_64.whl and into a generic name.

Yes, glob only matches file names currently.

Define Artifact struct for specifying files to extract from built images:
- src: absolute path inside container (supports glob patterns like "/*.whl")
- dst: relative path in artifacts directory
- optional: if true, missing files warn instead of fail

Includes helper methods for path resolution and validation:
- Validate(): ensures src is absolute, dst is relative, no path traversal
- ResolveSrcs(): resolves glob patterns against container file list
- ResolveDst(): resolves destination within artifacts directory
- CopyIntoDir(): determines if copying into directory vs renaming

Topic: wanda-artifact-spec

Signed-off-by: andrew <andrew@anyscale.com>
@andrew-anyscale andrew-anyscale force-pushed the andrew/revup/main/wanda-artifact-spec branch from 7a60b1e to 928b35c Compare February 5, 2026 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants