marin: artifact.from_path SUCCESS-marker fallback; pydantic-only typed load#5732
Merged
Merged
Conversation
rjpower
approved these changes
May 14, 2026
| If ``base_path`` is a relative path (no URL scheme, doesn't start with ``/``), | ||
| it is resolved against ``marin_prefix()``. | ||
|
|
||
| If ``base_path`` has no ``.artifact`` file but its ``.executor_status`` file |
Collaborator
There was a problem hiding this comment.
nit should .artifact be artifact.json so it's obvious if someone is looking at the directory
When `.artifact` is absent but `.executor_status` reads `SUCCESS`, synthesize a `PathMetadata` pointing at the step's output dir so legacy steps that publish only the status marker are still loadable. The fallback applies for untyped calls and for callers asking specifically for `PathMetadata`; other types still raise. `PathMetadata` is now a pydantic model so it can be returned through the typed code path.
Drops the dataclass branch from typed loading — `artifact_type` must now be a pydantic BaseModel subclass, otherwise `from_path` raises TypeError. All production call sites already pass pydantic types (`NormalizedData`, `MinHashAttrData`, `FuzzyDupsAttrData`); test types `TokenizeMetadata` / `TrainMetadata` are converted to BaseModel. Save still accepts dataclasses since they round-trip fine through untyped `from_path`.
Addresses #5732 (comment). `Artifact.save` now writes `artifact.json`; `from_path` reads it first and falls back to the legacy `.artifact` dotfile so historical GCS outputs remain loadable. The fallback can be deleted once those prefixes are gone. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
7db0bf8 to
946ab06
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Artifact.from_pathreturnsPathMetadata(path=base_path)when.artifactis missing but.executor_statusreadsSUCCESSartifact_type=PathMetadata; any otherartifact_typestill raisesFileNotFoundErrorartifact_typemust now be apydantic.BaseModelsubclass — the dataclass branch in typed load is gone, replaced by aTypeErrorfor non-BaseModel typesPathMetadatafrom@dataclasstopydantic.BaseModelso it can flow through the typed code pathSTATUS_SUCCESS/get_status_pathfromexecutor_step_statusinstead of redefining the markerArtifact.savestill accepts dataclasses (round-trip via untypedfrom_pathreturns a dict, as before)NormalizedData,MinHashAttrData,FuzzyDupsAttrData); test typesTokenizeMetadata/TrainMetadataconverted toBaseModeltests/execution/test_step_runner.pycover untyped, typedPathMetadata, typed non-PathMetadata, and non-SUCCESSstatus paths