marin-experiments

Copy-paste templates for running Marin pipelines as standalone experiments. Each template is a self-contained directory: marin is pulled in as a library through find-links wheels, so a template needs no submodule and no vendored source.

Getting started

1. Pick a template

| Template | Input | Pipeline |
| --- | --- | --- |
| `tiny-stories/` | HF text dataset | download → tokenize → train |
| `speech-asr/` | HF audio dataset | download → Mimi-encode → train BPE → tokenize → train |

Start with tiny-stories/ if your data is text. Start with speech-asr/ if you need a pre-tokenization stage (audio, images, anything that needs to become discrete tokens before training).

2. Copy the directory

cp -r tiny-stories my-experiment
cd my-experiment

Each template has its own pyproject.toml and virtual environment — nothing cross-references the source directory.

3. Adapt

Every template is driven by one launch.py that wires ExecutorSteps together. The per-template README walks through each stage and calls out what to change:

- Data: swap the HF dataset ID + revision at the top of launch.py.
- Model: resize TINY_MODEL / SPEECH_MODEL (hidden_dim, num_layers, num_heads, max_seq_len).
- Tokenizer: swap MARIN_TOKENIZER, or (for speech-asr) change the BPE vocab size / special tokens.
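
When resizing the model, it can help to sanity-check the four size knobs before launching a run. The sketch below is only an illustration: `ModelSize` is a hypothetical stand-in, not marin's config class, and the values are made up rather than taken from TINY_MODEL. It assumes the usual transformer constraint that hidden_dim divides evenly across attention heads, which the actual model config may or may not enforce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSize:
    """Hypothetical container for the size knobs (not marin's config class)."""

    hidden_dim: int
    num_layers: int
    num_heads: int
    max_seq_len: int

    def validate(self) -> None:
        # Check positivity first so the divisibility test never divides by zero.
        for name in ("hidden_dim", "num_layers", "num_heads", "max_seq_len"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")
        # Assumption: attention splits hidden_dim evenly across heads.
        if self.hidden_dim % self.num_heads != 0:
            raise ValueError(
                f"hidden_dim={self.hidden_dim} is not divisible by "
                f"num_heads={self.num_heads}"
            )

    @property
    def head_dim(self) -> int:
        return self.hidden_dim // self.num_heads


# Illustrative values only; the real TINY_MODEL in launch.py may differ.
candidate = ModelSize(hidden_dim=256, num_layers=4, num_heads=4, max_seq_len=512)
candidate.validate()
print(candidate.head_dim)
```

Running a check like this before the CPU smoke test catches a mismatched hidden_dim / num_heads pair without waiting for the training step to fail.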

4. Run locally on CPU

Every template supports a CPU smoke test that exercises the full pipeline end-to-end on a tiny subset — enough to confirm download → tokenize → train → checkpoint works before committing compute.

ACCELERATOR=cpu MARIN_PREFIX=/tmp/marin uv run python launch.py

The smoke test finishes in under a minute for tiny-stories and in about 3 minutes for speech-asr, where Mimi encoding on CPU dominates the runtime.
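
The command above passes its settings through two environment variables. As a rough sketch of how a script could consume them, the hypothetical helper below reads ACCELERATOR and MARIN_PREFIX; the function name, the defaults, and the lowercasing are assumptions for illustration, and the templates' launch.py may handle these variables differently.

```python
import os


def read_run_settings(env=None):
    """Return (accelerator, prefix) from ACCELERATOR and MARIN_PREFIX.

    Hypothetical helper: the defaults below are illustrative assumptions,
    not the templates' actual defaults.
    """
    env = os.environ if env is None else env
    accelerator = env.get("ACCELERATOR", "cpu").lower()
    # Keep the prefix as a plain string so non-local prefixes are not
    # rewritten by path normalization.
    prefix = env.get("MARIN_PREFIX", "/tmp/marin")
    return accelerator, prefix


# Mirror the smoke-test command by passing the same two values explicitly.
accelerator, prefix = read_run_settings(
    {"ACCELERATOR": "cpu", "MARIN_PREFIX": "/tmp/marin"}
)
is_smoke_test = accelerator == "cpu"
print(accelerator, prefix, is_smoke_test)
```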

5. Scale up on the shared marin cluster

Once the smoke test passes, submit the same launch.py to the shared marin TPU cluster via iris:

uv run iris --cluster=marin job run python launch.py --region=europe-west4

--cluster=marin targets the shared coordinator. --region is required because TPU availability is region-scoped and the default us-central1 has no v6e-4 capacity.

BYO cluster

If you don't have access to the shared marin cluster, you can run your own iris cluster — see the iris docs for setup.

Troubleshooting

`uv` fails with a 404 downloading a `marin-*` wheel

x Failed to download `marin-iris==0.99.devYYYYMMDD`
`-> HTTP status client error (404 Not Found) for url
    (https://github.com/marin-community/marin/releases/download/marin-iris-latest/...)

The marin-* wheels are published to rolling GitHub releases whose assets are replaced on each upstream rebuild, so a committed uv.lock eventually points at wheels that no longer exist. Repin against the current wheels:

uv lock --upgrade

A scheduled workflow (repin-lockfiles.yml) keeps the locks in this repo fresh, but if you copied a template into your own repo a while ago you'll need to repin yourself.

Repo layout

README.md            # this file
AGENTS.md            # repo-level guidance for Claude / other agents
tiny-stories/        # text template
speech-asr/          # audio template
submodules/marin/    # marin source (for local iris config; not imported)
