[codex] fix data indexing and clip metrics by yifanzhang-pro · Pull Request #28 · complex-reasoning/RPG

yifanzhang-pro · 2026-04-29T00:34:00Z

Summary

This is a narrower replacement for the useful parts of #23.

Preserve the existing README fix on main that points installation at requirements.txt.
Update process-data.py to write 0-based repeat indices into extra_info["index"], the field that RLHFDataset reads.
Fix GRPO score aggregation to use torch.stack, preserving tensor dtype/device and computing the standard deviation over the response group.
Make REINFORCE hard-clamp clip fraction metrics report how often w falls outside the clamp bounds instead of always reporting zero.
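
A minimal sketch of the indexing change, under one reading of "0-based repeat indices": each row is duplicated a fixed number of times and every copy records its repeat number, starting at 0, in extra_info["index"]. The helper name expand_with_repeat_index, the prompt field, and the repeats parameter are hypothetical and are not taken from process-data.py.

```python
from typing import Any


def expand_with_repeat_index(
    rows: list[dict[str, Any]], repeats: int
) -> list[dict[str, Any]]:
    """Duplicate each row `repeats` times, tagging copies 0..repeats-1.

    Hypothetical helper: the real process-data.py may build rows differently.
    """
    expanded = []
    for row in rows:
        for repeat_idx in range(repeats):  # range() starts at 0, so indices are 0-based
            new_row = dict(row)
            # Copy extra_info so the input rows are never mutated.
            extra_info = dict(row.get("extra_info", {}))
            extra_info["index"] = repeat_idx
            new_row["extra_info"] = extra_info
            expanded.append(new_row)
    return expanded


rows = [{"prompt": "2+2?"}, {"prompt": "3*3?"}]
expanded = expand_with_repeat_index(rows, repeats=2)
indices = [r["extra_info"]["index"] for r in expanded]
```

Keeping the index under extra_info, rather than as a top-level column, is what lets a reader that only looks at extra_info["index"] pick it up.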

Notes

The GRPO advantage change affects the training signal because it fixes the normalization used before policy loss computation.
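
The normalization in question can be sketched without torch. This is an illustration of the arithmetic only, not the code in verl/trainer/ppo/core_algos.py: scores are grouped by prompt, and each score is centered and scaled by its own group's mean and standard deviation. The function name, the eps value, the use of the sample standard deviation, and the handling of single-response groups are all assumptions made for the sketch.

```python
from collections import defaultdict
from statistics import mean, stdev


def group_normalized_advantages(scores, group_ids, eps=1e-6):
    """Return (score - group_mean) / (group_std + eps) for each response.

    Assumptions: sample standard deviation, and groups with a single
    response use mean 0 and std 1 so their score passes through unscaled.
    """
    groups = defaultdict(list)
    for score, gid in zip(scores, group_ids):
        groups[gid].append(score)

    stats = {}
    for gid, vals in groups.items():
        if len(vals) == 1:
            stats[gid] = (0.0, 1.0)
        else:
            # The std is taken over the whole response group, not a single score.
            stats[gid] = (mean(vals), stdev(vals))

    return [
        (score - stats[gid][0]) / (stats[gid][1] + eps)
        for score, gid in zip(scores, group_ids)
    ]


# Two prompts ("a" and "b"), two responses each.
adv = group_normalized_advantages([1.0, 0.0, 1.0, 1.0], ["a", "a", "b", "b"])
```

In group "a" the two responses receive equal and opposite advantages; in group "b" every response scored the same, so both advantages are zero. An error in the group statistics therefore changes the values that feed the policy loss, which is why this fix affects training and not just logging.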

The REINFORCE pg_clipfrac / pg_clipfrac_lower change only affects logged metrics. It does not change A, pg_losses, or pg_loss.
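
A torch-free sketch of such a metric, assuming w is a per-token weight that is hard-clamped to [lower, upper]. The function name and the bound values are hypothetical, the response mask is omitted, and the source does not say how the upper and lower fractions map onto pg_clipfrac and pg_clipfrac_lower, so the split shown here is only one possibility.

```python
def clamp_and_clip_fractions(weights, lower, upper):
    """Hard-clamp weights and report how often they fell outside the bounds.

    Returns (clamped, frac_above, frac_below). The fractions are computed
    from the unclamped weights; computing them after clamping would always
    give zero, because no clamped value can lie outside the bounds.
    """
    clamped = [min(max(w, lower), upper) for w in weights]
    n = len(weights)
    if n == 0:
        return clamped, 0.0, 0.0
    frac_above = sum(1 for w in weights if w > upper) / n
    frac_below = sum(1 for w in weights if w < lower) / n
    return clamped, frac_above, frac_below


weights = [0.5, 1.0, 1.5, 3.0]
clamped, frac_above, frac_below = clamp_and_clip_fractions(weights, 0.8, 2.0)
```

The clamped values are the same whether or not the fractions are computed, which matches the statement that only logged metrics change.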

Validation

python -m py_compile process-data.py verl/trainer/ppo/core_algos.py
git diff --check
uvx ruff check process-data.py verl/trainer/ppo/core_algos.py

Not run: the full torch/pandas tests. The local Python environment in this temporary checkout lacks project dependencies such as torch and pandas.

yifanzhang-pro · 2026-04-29T00:34:52Z

/gemini review

fix pr23 data and clip metrics

6cbe5f8

yifanzhang-pro marked this pull request as ready for review April 29, 2026 00:34

yifanzhang-pro merged commit c0574e4 into main Apr 29, 2026
1 of 18 checks passed

yifanzhang-pro merged 1 commit into main from codex/pr23-safer-fixes
