## Description
Addresses feedback from Percy (see #1981):

- more detailed documentation and comments
- eliminate forking from step 1; ask contributors to fork in step 3, at submission time
The Marin Speedrun, inspired by the [nanogpt Speedrun](https://github.com/KellerJordan/modded-nanogpt), is a benchmark aimed at improving the compute efficiency of language model training. This tutorial assumes you are familiar with the core premise of the Marin Speedrun—if not, check out [the overview of Marin Speedrun](../explanations/speedrun.md) for a more detailed explanation. Let's walk through how to submit your first speedrun to the leaderboard.
Note: this clones Marin for local development. When you want to contribute to Marin, you will need to convert the clone to a fork and submit a PR (see Step 3 below). If you have GitHub CLI installed and authenticated, the fork will be created for you, so you can skip creating it manually.

Create a subdirectory under `experiments/speedrun` and copy a starter file.
</details>
**2. Develop**
You can now work on your speedrun submission! The setup script has prepared the ["hackable transformer"](https://github.com/marin-community/marin/tree/main/experiments/hackable_transformer_starter_template.py) starter file for you. This self-contained file implements a transformer-based language model and provides configurations for training it at four sizes (130M, 300M, 520M, and 1.2B parameters). Sections that need your attention are marked with TODOs.

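As a rough illustration of what a per-size training configuration involves, here is a minimal sketch. The class and field names below are hypothetical, chosen for this example only; the real starter template defines its own configuration classes, which you should consult directly.

```python
from dataclasses import dataclass


# Hypothetical per-size config sketch -- NOT the starter template's actual API.
@dataclass(frozen=True)
class SpeedrunModelConfig:
    hidden_dim: int    # transformer width
    num_layers: int    # transformer depth
    num_heads: int     # attention heads
    seq_len: int       # training sequence length
    train_tokens: int  # total tokens to train on


# Illustrative numbers for a roughly 130M-parameter GPT-style shape.
TINY_130M = SpeedrunModelConfig(
    hidden_dim=768,
    num_layers=12,
    num_heads=12,
    seq_len=1024,
    train_tokens=2_600_000_000,
)
```

Changing depth, width, schedule, or token budget in such a config is where most speedrun iteration happens.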
You can check your code and your estimated compute cost using a dry run:
```
The rough estimated compute (calculated as (total model FLOPs / Assumed MFU)) for your run is probably between:

* 4.21e+18 FLOPs assuming an MFU of 0.5, and
* 1.05e+19 FLOPs assuming an MFU of 0.2.

This is calculated based on assumed MFU values and can be used as a rough estimate to guide your config/training setup.

Hardware and Model FLOPS Information:
Number of devices: 1
Number of chips: 1
Device FLOPs: 3.12e+14 FLOP/s
Total peak hardware FLOPs: 3.12e+14 FLOP/s
Model FLOPs: 2.11e+18 FLOP
Model size: 154.15 million parameters
----- END OF PRINT RUN INFO -----
```
</details>
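The figures in the sample output come from simple arithmetic: the estimate divides model FLOPs by an assumed MFU to bound total compute, and dividing model FLOPs by achieved hardware throughput gives a rough wall-clock time. A sketch using the numbers printed above (small differences from the printed values are just rounding in the reported model FLOPs):

```python
model_flops = 2.11e18       # "Model FLOPs" from the dry run output
peak_flops_per_s = 3.12e14  # "Total peak hardware FLOPs" (a single chip here)

for mfu in (0.5, 0.2):
    # Estimated total compute at this assumed MFU.
    est_compute = model_flops / mfu
    # Rough wall-clock time if the run actually achieves that MFU.
    est_seconds = model_flops / (peak_flops_per_s * mfu)
    print(f"MFU {mfu}: {est_compute:.2e} FLOPs, ~{est_seconds / 3600:.1f} h")
```

This is only a planning aid: real runs rarely hit 0.5 MFU, so the lower-MFU figure is usually the safer budget.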
Remove the dry run setting when you are ready and fire off training on your hardware.
**3. Submit**
When you are ready, open a PR and contribute to Marin. We ask that you:
- Give a brief explanation of your approach (model architecture, training strategy, optimizations)
- Include the output of `print_run_info()` in the PR description, and `speedrun_results.json` files
- Leave "Allow edits by maintainers" on so we can help work on your code and scale up your ideas on Marin's clusters
!!! info

    If you did not create a fork of Marin on GitHub previously, you need to do it now to be able to submit a PR. You can convert the existing repo into a fork using the following steps:

    1. Install the GitHub CLI (see [https://github.com/cli/cli#installation](https://github.com/cli/cli#installation)) and log in to your GitHub account with `gh auth login`.
    2. Inside the Marin repo, run `gh repo fork`, and press `y` to add a remote. You should see the following:

        ```
        $ gh repo fork
        ✓ Created fork {YOUR_GITHUB_USER_NAME}/marin
        ? Would you like to add a remote for the fork? Yes
        ✓ Renamed remote origin to upstream
        ✓ Added remote origin
        ```

    3. Run `git push -u origin HEAD` to push your changes to your fork.
    4. Run `gh repo set-default` and select `marin-community/marin` as the repository to contribute to.
    5. Run `gh pr create --web` to create the PR in your browser. Marin staff will then review your submission.

Once the PR is merged, your submission will appear on the [public leaderboard](https://marin.community/speedrun/).

For more inspiration, check out existing submissions and configurations in the [speedrun directory](https://github.com/marin-community/marin/tree/main/experiments/speedrun). You can also [add new optimizers](https://github.com/marin-community/marin/blob/main/docs/tutorials/add-optimizer.md), change learning rate schedules, play with hyperparameters, etc.