
marin: artifact.from_path SUCCESS-marker fallback; pydantic-only typed load#5732

Merged
ravwojdyla merged 3 commits into main from worktree-rav-artifact-load-no-artifact-file May 14, 2026

ravwojdyla-agent (Contributor) commented May 14, 2026

  • Artifact.from_path returns PathMetadata(path=base_path) when .artifact is missing but .executor_status reads SUCCESS
  • fallback applies for untyped calls and for artifact_type=PathMetadata; any other artifact_type still raises FileNotFoundError
  • artifact_type must now be a pydantic.BaseModel subclass — the dataclass branch in typed load is gone, replaced by a TypeError for non-BaseModel types
  • convert PathMetadata from @dataclass to pydantic.BaseModel so it can flow through the typed code path
  • reuse STATUS_SUCCESS / get_status_path from executor_step_status instead of redefining the marker
  • Artifact.save still accepts dataclasses (round-trip via untyped from_path returns a dict, as before)
  • all production call sites already pass pydantic types (NormalizedData, MinHashAttrData, FuzzyDupsAttrData); test types TokenizeMetadata / TrainMetadata converted to BaseModel
  • tests in tests/execution/test_step_runner.py cover untyped, typed PathMetadata, typed non-PathMetadata, and non-SUCCESS status paths
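The loading rules in the list above can be sketched in a few lines of Python. This is a minimal sketch, not marin's implementation: it assumes pydantic v2, a local filesystem, and a `.executor_status` file whose content is just the status string. `from_path_sketch`, `_read_status`, and the module-level constants are hypothetical names chosen for illustration.

```python
import json
import os
from typing import Any, Optional

from pydantic import BaseModel

# Assumed marker names/values mirroring the PR description. The real constants
# (STATUS_SUCCESS, get_status_path) live in marin's executor_step_status module.
ARTIFACT_FILE = ".artifact"
STATUS_FILE = ".executor_status"
STATUS_SUCCESS = "SUCCESS"


class PathMetadata(BaseModel):
    """Stand-in for marin's PathMetadata: just the step's output directory."""

    path: str


def _read_status(base_path: str) -> Optional[str]:
    """Hypothetical helper; assumes the status file holds only the status string."""
    status_file = os.path.join(base_path, STATUS_FILE)
    if not os.path.exists(status_file):
        return None
    with open(status_file) as f:
        return f.read().strip()


def from_path_sketch(base_path: str, artifact_type: Optional[type] = None) -> Any:
    """Hypothetical re-creation of the from_path rules listed above."""
    # Typed loads accept only pydantic models; dataclasses now raise TypeError.
    if artifact_type is not None and not (
        isinstance(artifact_type, type) and issubclass(artifact_type, BaseModel)
    ):
        raise TypeError(
            f"artifact_type must be a pydantic BaseModel subclass, got {artifact_type!r}"
        )

    artifact_file = os.path.join(base_path, ARTIFACT_FILE)
    if os.path.exists(artifact_file):
        with open(artifact_file) as f:
            data = json.load(f)
        # Untyped calls get the raw dict; typed calls validate into the model.
        return data if artifact_type is None else artifact_type.model_validate(data)

    # Fallback: no artifact file, but the status marker reads SUCCESS.
    # Only untyped calls and explicit PathMetadata requests may use it.
    if artifact_type in (None, PathMetadata) and _read_status(base_path) == STATUS_SUCCESS:
        return PathMetadata(path=base_path)
    raise FileNotFoundError(artifact_file)
```

Checking the type before touching the filesystem is a choice made for this sketch; the PR only states that non-BaseModel types raise `TypeError` and that non-PathMetadata types without an artifact file raise `FileNotFoundError`.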

ravwojdyla-agent added the agent-generated (Created by automation/agent) label May 14, 2026
ravwojdyla-agent changed the title from "marin: artifact.from_path falls back to .executor_status SUCCESS marker" to "marin: artifact.from_path SUCCESS-marker fallback; pydantic-only typed load" May 14, 2026
ravwojdyla requested a review from rjpower May 14, 2026 03:19

If ``base_path`` is a relative path (no URL scheme, doesn't start with ``/``),
it is resolved against ``marin_prefix()``.

If ``base_path`` has no ``.artifact`` file but its ``.executor_status`` file

Collaborator

nit: should .artifact be artifact.json, so it's obvious to someone looking at the directory?

ravwojdyla and others added 3 commits May 14, 2026 09:38
When `.artifact` is absent but `.executor_status` reads `SUCCESS`,
synthesize a `PathMetadata` pointing at the step's output dir so legacy
steps that publish only the status marker are still loadable. The
fallback applies for untyped calls and for callers asking specifically
for `PathMetadata`; other types still raise. `PathMetadata` is now a
pydantic model so it can be returned through the typed code path.

Drops the dataclass branch from typed loading: `artifact_type` must now
be a pydantic BaseModel subclass, otherwise `from_path` raises TypeError.
All production call sites already pass pydantic types
(`NormalizedData`, `MinHashAttrData`, `FuzzyDupsAttrData`); test types
`TokenizeMetadata` / `TrainMetadata` are converted to BaseModel. Save
still accepts dataclasses since they round-trip fine through untyped
`from_path`.
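Why dataclasses still round-trip on the save side can be shown with a small serializer. This is a hedged sketch, assuming pydantic v2 and JSON serialization; `artifact_to_json` and `TrainMetadataLike` are hypothetical names, not marin's `Artifact.save` or its test types.

```python
import dataclasses
import json
from typing import Any

from pydantic import BaseModel


def artifact_to_json(artifact: Any) -> str:
    """Hypothetical save-side serializer accepting pydantic models or dataclasses."""
    if isinstance(artifact, BaseModel):
        return artifact.model_dump_json()
    # is_dataclass is also true for the class itself, so require an instance.
    if dataclasses.is_dataclass(artifact) and not isinstance(artifact, type):
        return json.dumps(dataclasses.asdict(artifact))
    raise TypeError(f"cannot save artifact of type {type(artifact).__name__}")


@dataclasses.dataclass
class TrainMetadataLike:
    """Hypothetical dataclass artifact standing in for a legacy metadata type."""

    checkpoint_path: str
    num_steps: int


# An untyped load is just JSON parsing, so a saved dataclass comes back as a dict.
saved = artifact_to_json(TrainMetadataLike(checkpoint_path="gs://bucket/ckpt", num_steps=10))
loaded = json.loads(saved)
```

Because the untyped path never reconstructs the original class, accepting dataclasses in `save` costs nothing; only the typed path needs a validation hook, which pydantic models provide.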

Addresses #5732 (comment).
`Artifact.save` now writes `artifact.json`; `from_path` reads it first and
falls back to the legacy `.artifact` dotfile so historical GCS outputs remain
loadable. The fallback can be deleted once those prefixes are gone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
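The lookup order from the last commit (new `artifact.json` first, legacy `.artifact` dotfile second) reduces to a short probe loop. A minimal sketch using `os.path` on a local filesystem only; GCS prefixes would need a filesystem layer, and `find_artifact_file` is a hypothetical helper name.

```python
import os
from typing import Optional

# File names taken from the commit message; the order of the tuple is the point.
ARTIFACT_FILENAME = "artifact.json"
LEGACY_ARTIFACT_FILENAME = ".artifact"  # can go once historical prefixes are gone


def find_artifact_file(base_path: str) -> Optional[str]:
    """Hypothetical helper: return the first artifact file present, preferring artifact.json."""
    for name in (ARTIFACT_FILENAME, LEGACY_ARTIFACT_FILENAME):
        candidate = os.path.join(base_path, name)
        if os.path.exists(candidate):
            return candidate
    # Neither file exists; a caller would then try the SUCCESS-marker fallback.
    return None
```

Keeping the legacy name as the last entry of a tuple makes the eventual cleanup a one-line deletion.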
ravwojdyla force-pushed the worktree-rav-artifact-load-no-artifact-file branch from 7db0bf8 to 946ab06 May 14, 2026 19:04
ravwojdyla merged commit 592ca9a into main May 14, 2026
29 checks passed
ravwojdyla deleted the worktree-rav-artifact-load-no-artifact-file branch May 14, 2026 19:11

Labels

agent-generated Created by automation/agent

3 participants