Achieves 78.6% on CUTE (vs 56.9% for Olmo 3) and 71.6% on EXECUTE benchmarks thr…

Unlike subword models, Bolmo can arbitrarily adjust the bytes-per-patch ratio to trade off speed for performance:
```bash
# Train with higher compression for faster inference
torchrun --nproc-per-node=8 src/examples/bolmo/train_stage2.py \
    --target-compression=8.0  # vs default ~4.4
```
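As a rough illustration (a back-of-envelope sketch, assuming the ratio is measured in bytes per patch as the flag name suggests), the number of patches the global model must process scales inversely with the target compression:

```python
# Illustrative arithmetic only; these are not measured benchmarks.
doc_bytes = 4096          # example input length in bytes
default_ratio = 4.4       # default bytes-per-patch
fast_ratio = 8.0          # --target-compression=8.0

patches_default = doc_bytes / default_ratio  # ~931 patches
patches_fast = doc_bytes / fast_ratio        # 512 patches
print(f"~{patches_default / patches_fast:.2f}x fewer patches")  # ~1.82x
```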
### 4. Zero-Cost Post-Training
Existing post-trained checkpoints can be byteified without additional training using Task Arithmetic:

```python
from olmo_core.nn.bolmo import byteify_checkpoint

# Merge post-trained checkpoint into Bolmo
byteified_model = byteify_checkpoint(
    bolmo_base="allenai/Bolmo-7B",
    posttrain_checkpoint="allenai/OLMo-3-7B-Instruct",
)
```
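Under the hood, task arithmetic means extracting a task vector (post-trained weights minus subword base weights) and adding it to the byteified base. A minimal sketch of that merge, assuming plain PyTorch state dicts; `merge_task_vector` is a hypothetical helper for illustration, not the library API (`byteify_checkpoint` above is the real entry point):

```python
import torch

def merge_task_vector(bolmo_sd, subword_base_sd, posttrain_sd, alpha=1.0):
    """Illustrative task-arithmetic merge over shared parameters.

    Byte-specific modules (e.g. the local encoder/decoder) have no
    counterpart in the subword model and are left untouched.
    """
    merged = {name: w.clone() for name, w in bolmo_sd.items()}
    for name, base_w in subword_base_sd.items():
        if name in merged and merged[name].shape == base_w.shape:
            task_vector = posttrain_sd[name] - base_w  # what post-training added
            merged[name] += alpha * task_vector        # graft it onto Bolmo
    return merged
```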
### 5. Efficient Training
Total training cost: 9.8B tokens (≈43B bytes) for Stage 1 and 39.3B tokens (≈173B bytes) for Stage 2 to byteify an existing model.
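A quick sanity check on these figures: both stage budgets imply roughly the same bytes-per-token ratio, in line with the default ~4.4 bytes-per-patch compression noted in the training example above (assuming patches track subword tokens closely):

```python
# Implied bytes-per-token for each byteification stage (illustrative only).
stage1 = 43e9 / 9.8e9    # ≈ 4.39 bytes per token
stage2 = 173e9 / 39.3e9  # ≈ 4.40 bytes per token
print(f"Stage 1: {stage1:.2f} B/tok, Stage 2: {stage2:.2f} B/tok")
```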
## Performance