Documenting some issues we need to solve before GRPO works well with Olmo-core, following up on #1543
- `save_hf_model` doesn't work with Qwen
- the multi-GPU run seems to get stuck on

  ```
  (DataPreparationActor pid=127051) 2026-03-22 20:33:10 - INFO - data_loader.py:1203 - [DataPreparationActor] Step 2: waiting for step 1 to be consumed.
  ```

  whereas the single-GPU run seems fine
- add the ability to set the attention backend like `dpo.py` does; some logging from Olmo-core seems to suggest it's using the `torch` backend (see the sketch after this list)
- packing?
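As a reference point for the attention backend item above, here is a minimal sketch of pinning PyTorch's scaled-dot-product attention to one kernel via `torch.nn.attention.sdpa_kernel` (a real PyTorch 2.3+ API); whether Olmo-core's backend selection maps onto this knob is an assumption:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)
v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)

# Force the flash-attention kernel instead of letting PyTorch choose one;
# SDPBackend.MATH is the plain-torch fallback the logs may be hinting at.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```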
Things to double-check:
- make sure we're correctly normalizing the loss
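On the normalization point, one common convention to verify against is summing per-token losses and dividing by the total number of response tokens in the batch, rather than averaging per sequence first, so long completions aren't down-weighted. A minimal sketch (names here are hypothetical, not our actual trainer code):

```python
import torch

def normalized_token_loss(token_losses: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # token_losses, mask: (batch, seq_len); mask is 1 on response tokens, 0 elsewhere.
    # Dividing by the global token count keeps gradient scale comparable across
    # completion lengths; a per-sequence mean would up-weight short completions.
    return (token_losses * mask).sum() / mask.sum().clamp(min=1)
```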