Merged
Changes from 10 commits
70 commits
69120dd
Add OLMo-core based DPO training module
finbarrtimbers Jan 20, 2026
5595b4f
Cleaned up PR.
finbarrtimbers Jan 20, 2026
fb00977
Add OLMo-core train modules for DPO training
finbarrtimbers Jan 20, 2026
cef925d
Fix SpeedMonitorCallback parameter name
finbarrtimbers Jan 20, 2026
1757a2c
Fix CheckpointerCallback save_interval validation
finbarrtimbers Jan 20, 2026
9544586
Move checkpointing_steps default value to config class
finbarrtimbers Jan 20, 2026
e6e3a55
Remove duplicate checkpointing_steps field from ExperimentConfig
finbarrtimbers Jan 20, 2026
d5ac201
Add Saturn cluster to medium_dpo.sh script
finbarrtimbers Jan 20, 2026
e76b641
updated changelog
finbarrtimbers Jan 20, 2026
6de3e4a
Merge branch 'main' into finbarr/olmo-core-dpo-base
finbarrtimbers Jan 20, 2026
57442be
Remove explicit torchrun multi-node args from DPO scripts
finbarrtimbers Jan 20, 2026
72510aa
fixed linter errors
finbarrtimbers Jan 21, 2026
03ff5af
Merge branch 'main' into finbarr/olmo-core-dpo-base
finbarrtimbers Jan 21, 2026
2874682
Refactor DPO OLMo-core: add parallelism support, fix HSDP order
finbarrtimbers Jan 21, 2026
30b1f02
Fix race condition in reference logprobs cache directory creation
finbarrtimbers Jan 21, 2026
a630a98
Fix multi-node DPO post-training barrier failures
finbarrtimbers Jan 21, 2026
24c8aa8
Merge branch 'main' into finbarr/olmo-core-dpo-base
finbarrtimbers Jan 21, 2026
103e6ad
Remove redundant compute_loss_olmo wrapper function
finbarrtimbers Jan 21, 2026
e7eb57f
run urgent tests
finbarrtimbers Jan 21, 2026
13856e2
Fix case-insensitive beaker secret lookup
finbarrtimbers Jan 21, 2026
a8e2a16
Updated mason.py
finbarrtimbers Jan 21, 2026
0ed4fb9
Add uv run prefix to local DPO script
finbarrtimbers Jan 21, 2026
b1ae668
Save DPO models in HuggingFace format for evals
finbarrtimbers Jan 21, 2026
8fde3c2
Fix WEKA_CLUSTERS import in submit_eval_jobs.py
finbarrtimbers Jan 21, 2026
5ded08e
Update GRPO single GPU script to use DPO-trained model
finbarrtimbers Jan 21, 2026
fd5c338
Add --add_bos flag for OLMo model in GRPO script
finbarrtimbers Jan 21, 2026
13ce512
Copy original HF config when saving DPO model
finbarrtimbers Jan 22, 2026
2914d70
Use Weka path directly for DPO model in GRPO test
finbarrtimbers Jan 22, 2026
5413e53
Add logging for config.json save in DPO
finbarrtimbers Jan 22, 2026
8adbff3
Update GRPO script to use new DPO model path
finbarrtimbers Jan 22, 2026
e0d431b
Fix DPO HF model saving to use correct layer count
finbarrtimbers Jan 22, 2026
1e371f1
Fix OLMo-2-0425-1B config mapping to use correct layer count
finbarrtimbers Jan 22, 2026
4191e35
Fix HF model loading to use from_config instead of from_pretrained
finbarrtimbers Jan 22, 2026
b505884
Revert to using save_hf_model for DPO model saving
finbarrtimbers Jan 22, 2026
260bc1d
Update GRPO script to use DPO model with correct 16 layers
finbarrtimbers Jan 22, 2026
c4ac861
Copy original HF config after save_hf_model
finbarrtimbers Jan 22, 2026
af11bcf
Update GRPO script to use DPO model with complete config
finbarrtimbers Jan 22, 2026
1159aa6
Add OLMo3-7B DPO script using OLMo-core trainer
finbarrtimbers Jan 22, 2026
fba13a0
Add documentation for adding OLMo-core models
finbarrtimbers Jan 22, 2026
7041fc7
Add --no_auto_dataset_cache to DPO script
finbarrtimbers Jan 22, 2026
4ce0dbb
Fix multi-node torchrun configuration for DPO
finbarrtimbers Jan 22, 2026
3e889f4
Fix nnodes to use hardcoded value instead of BEAKER_NUM_REPLICAS
finbarrtimbers Jan 22, 2026
4e66469
Add torchrun multi-node parameters to debug DPO multi_node.sh
finbarrtimbers Jan 22, 2026
3e1fffa
Add OLMO_SHARED_FS=1 env var for multi-node DPO scripts
finbarrtimbers Jan 22, 2026
c8f53ae
Add comment about cache cleanup for corrupted dataset cache
finbarrtimbers Jan 22, 2026
e6418d8
Remove cache cleanup comment
finbarrtimbers Jan 22, 2026
5bfc032
Support separate model config and weights for OLMo-core DPO
finbarrtimbers Jan 22, 2026
9dbc783
Fix save_hf_model for FSDP-wrapped models in DPO
finbarrtimbers Jan 22, 2026
6514ef0
Fix DTensor to Tensor conversion in export_to_hf
finbarrtimbers Jan 22, 2026
d7be75d
Fix FSDP state_dict collective operation for multi-node export
finbarrtimbers Jan 22, 2026
91260b5
Add detailed logging to export_to_hf for debugging
finbarrtimbers Jan 22, 2026
24fbf27
Fix DTensor full_tensor() collective operation in export
finbarrtimbers Jan 22, 2026
38a42a5
Clean up debug logging in export_to_hf
finbarrtimbers Jan 22, 2026
c634ea0
Fix missing indices in DPO reference logprobs caching
finbarrtimbers Jan 23, 2026
b3b8c04
Add MFU/memory/token metrics to cache building + 3x cache batch size
finbarrtimbers Jan 23, 2026
c80a594
Add --cache_logprobs_only flag for DPO cache forward-pass benchmarking
finbarrtimbers Jan 23, 2026
a8753ba
Update DPO cache benchmark to match production OLMo3-7B config
finbarrtimbers Jan 23, 2026
5dd7ef0
Now, we avoid the torch warning
finbarrtimbers Jan 23, 2026
998e330
Merge branch 'main' into finbarr/olmo-core-dpo-base
finbarrtimbers Jan 23, 2026
c80ad94
6x cache batch size + mem% in DPO cache tqdm
finbarrtimbers Jan 23, 2026
e4bfc8f
Reduce cache batch multiplier to 4x (6x OOMed)
finbarrtimbers Jan 23, 2026
67b91c4
Try unsharded cache build, fall back to FSDP on OOM
finbarrtimbers Jan 23, 2026
dc9ba10
Fix data loader tests that used single_example_collator with batch_si…
finbarrtimbers Jan 23, 2026
a939ef2
Fix attn_backend auto-detection: check flash_attn_3 availability
finbarrtimbers Jan 23, 2026
bcb89d6
added export to HF function
finbarrtimbers Jan 24, 2026
d8492a6
Added script to convert olmo core to HF format.
finbarrtimbers Jan 26, 2026
aaac2f1
Add example usage to olmo-core to HF conversion script
finbarrtimbers Jan 26, 2026
a823bf8
Fix code review issues in convert_olmo_core_to_hf.py
finbarrtimbers Jan 26, 2026
b4969b4
Merge branch 'main' into finbarr/olmo-core-dpo-base
finbarrtimbers Jan 26, 2026
cefeb01
Merge branch 'main' into finbarr/olmo-core-dpo-base
finbarrtimbers Jan 26, 2026
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -4,6 +4,7 @@ All notable changes to this project will be documented in this file.


 ### Added
+- Added OLMo-core based DPO training script (https://github.com/allenai/open-instruct/pull/1391).
 - Added the ability to set active tools on a per-sample basis. See the PR for more details: https://github.com/allenai/open-instruct/pull/1382
 - Added a new changelog Github Action that makes sure you contribute to the changelog! https://github.com/allenai/open-instruct/pull/1276
 - Now, we type check `open_instruct/dataset_transformation.py` (https://github.com/allenai/open-instruct/pull/1390).
1 change: 1 addition & 0 deletions mason.py
@@ -27,6 +27,7 @@
 OPEN_INSTRUCT_COMMANDS = [
     "open_instruct/finetune.py",
     "open_instruct/dpo_tune_cache.py",
+    "open_instruct/dpo.py",
     "open_instruct/grpo_fast.py",
     "open_instruct/ppo.py",
     "open_instruct/reward_modeling.py",