Documenting some issues we need to solve before GRPO works well with Olmo-core, following up on #1543
- `save_hf_model` doesn't work with Qwen
- the multi-GPU run seems to get stuck on

  ```
  (DataPreparationActor pid=127051) 2026-03-22 20:33:10 - INFO - data_loader.py:1203 - [DataPreparationActor] Step 2: waiting for step 1 to be consumed.
  ```

  whereas the single-GPU run seems fine
- add the ability to set the attention backend like `dpo.py` does; some logging from Olmo-core seems to suggest it's using the `torch` backend (see the sketch after this list)
- packing?
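As a reference point for the attention backend item above, here is a minimal sketch of pinning PyTorch's scaled-dot-product attention to one kernel via `torch.nn.attention.sdpa_kernel` (a real PyTorch 2.3+ API); whether Olmo-core's backend selection maps onto this knob is an assumption:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)
v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)

# Force the flash-attention kernel instead of letting PyTorch choose one;
# SDPBackend.MATH is the plain-torch fallback the logs may be hinting at.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```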
Things to double-check:
- make sure we're correctly normalizing the loss
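On the normalization point, one common convention to verify against is summing per-token losses and dividing by the total number of response tokens in the batch, rather than averaging per sequence first, so long completions aren't down-weighted. A minimal sketch (names here are hypothetical, not our actual trainer code):

```python
import torch

def normalized_token_loss(token_losses: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # token_losses, mask: (batch, seq_len); mask is 1 on response tokens, 0 elsewhere.
    # Dividing by the global token count keeps gradient scale comparable across
    # completion lengths; a per-sequence mean would up-weight short completions.
    return (token_losses * mask).sum() / mask.sum().clamp(min=1)
```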