Commit 6b640d9

ravwojdyla-agent authored
tokenize: fix Artifact.load -> Artifact.from_path rename misses (#5756)
## Summary

- PR #5727 renamed `Artifact.load` → `Artifact.from_path` in `marin.execution.artifact` but left three call sites in `lib/marin/src/marin/processing/tokenize/` calling the now-missing method.
- Both `tokenize_attributes_step` and `build_levanter_store_step` blow up at runtime with `AttributeError: type object 'Artifact' has no attribute 'load'` the first time their `_fn` closure runs (i.e. inside the StepRunner, not at module import).
- Rename the 3 call sites + 2 stale docstring references to match.

## Test plan

- [x] `uv run pytest tests/processing/tokenize/test_split_tokenize.py` — 18 passed (existing tests, no new ones).
- [x] `uv run pyrefly check` — clean.
- [x] `./infra/pre-commit.py --files ...` — clean.
- [x] Surfaced while running `experiments/tokenize/all_sources_tokenize.py` on Iris — every step hit the AttributeError immediately.

Note: existing unit tests don't catch this because they exercise `tokenize_attributes` / `build_levanter_store` directly, not the `*_step` factory closures.

Co-authored-by: Rafal Wojdyla <ravwojdyla@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
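The failure mode in the summary can be reproduced in miniature. The sketch below uses hypothetical stand-in names (`Artifact`, `make_step`), not marin's actual classes: because the stale reference lives inside a closure, importing the module and even calling the factory succeed, and the `AttributeError` only fires when the closure runs.

```python
# Hypothetical minimal reproduction (not marin's actual API): a factory
# builds a closure that references a method removed in a rename, so the
# AttributeError is deferred until the closure is executed.

class Artifact:
    """Stand-in for an Artifact class after a load -> from_path rename."""

    @classmethod
    def from_path(cls, path: str) -> "Artifact":
        # The renamed method; the old Artifact.load no longer exists.
        return cls()


def make_step():
    # Importing this module and even calling the factory both succeed,
    # because the stale reference is only evaluated inside _fn.
    def _fn(output_path: str):
        return Artifact.load(output_path)  # stale call site

    return _fn


step = make_step()  # no error here
try:
    step("out/")  # the AttributeError only surfaces now
except AttributeError as err:
    print(err)  # type object 'Artifact' has no attribute 'load'
```

This is why the breakage surfaced inside the StepRunner rather than at import time.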
1 parent f8b87e5 commit 6b640d9

2 files changed: 5 additions & 5 deletions

**lib/marin/src/marin/processing/tokenize/attributes.py** (3 additions & 3 deletions)

```diff
@@ -55,7 +55,7 @@ class TokenizedAttrData(BaseModel):
     and co-partitioned with the source — both datakit invariants.
 
     Persisted as the step's ``.artifact``. Load via
-    ``Artifact.load(step, TokenizedAttrData)``.
+    ``Artifact.from_path(step, TokenizedAttrData)``.
 
     Attributes:
         version: Schema version.
@@ -281,9 +281,9 @@ def _fn(output_path: str) -> TokenizedAttrData:
             "max_workers": max_workers,
         }
         if train_normalize is not None:
-            kwargs["train_source"] = Artifact.load(train_normalize, NormalizedData)
+            kwargs["train_source"] = Artifact.from_path(train_normalize, NormalizedData)
         if validation_normalize is not None:
-            kwargs["validation_source"] = Artifact.load(validation_normalize, NormalizedData)
+            kwargs["validation_source"] = Artifact.from_path(validation_normalize, NormalizedData)
         if worker_resources is not None:
             kwargs["worker_resources"] = worker_resources
         return tokenize_attributes(TokenizeAttributesConfig(**kwargs))
```

**lib/marin/src/marin/processing/tokenize/store_builder.py** (2 additions & 2 deletions)

```diff
@@ -67,7 +67,7 @@ class LevanterStoreData(BaseModel):
     """Outcome of :func:`build_levanter_store`: a Levanter cache per split.
 
     Persisted as the step's ``.artifact``. Load via
-    ``Artifact.load(step, LevanterStoreData)``.
+    ``Artifact.from_path(step, LevanterStoreData)``.
 
     Attributes:
         version: Schema version.
@@ -354,7 +354,7 @@ def build_levanter_store_step(
 
     def _fn(output_path: str) -> LevanterStoreData:
         kwargs: dict = {
-            "sources": [Artifact.load(s, TokenizedAttrData) for s in tokenize_steps],
+            "sources": [Artifact.from_path(s, TokenizedAttrData) for s in tokenize_steps],
             "cache_path": output_path,
             "max_workers": max_workers,
             "levanter_batch_size": levanter_batch_size,
```
