Wrong repetition penalty imported

Hi open-r1 maintainers. I wanted to comment on the latest repetition penalty after talking with one of the authors of the Demystifying Long CoT paper.

I noticed that you imported the [wrong repetition penalty](https://github.com/eddycmu/demystify-long-cot/blob/release/openrlhf/openrlhf/reward/repetition.py#L56-L70), the function `get_repetition_penalty`, from the Demystifying Long CoT code. The repetition penalty actually used in the paper is the class [RepetitionDensePenalty](https://github.com/eddycmu/demystify-long-cot/blob/release/openrlhf/openrlhf/reward/repetition.py#L10-L53).

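This may be a problem because the two implementations differ in granularity: `get_repetition_penalty` computes a single global penalty for the whole sequence, whereas `RepetitionDensePenalty` applies the penalty at the token level.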

I had an LLM generate a feature comparison:

| Feature                   | `RepetitionDensePenalty`  | `get_repetition_penalty` |
|---------------------------|-------------------------|-------------------------|
| **Granularity**           | Token-level penalties   | Global sequence penalty |
| **Penalty Type**          | Fixed value per token   | Scaled based on repetition |
| **Implementation**        | Modifies token rewards directly | Computes a single penalty score |
| **Use Case**              | Reinforcement learning (RL) reward models | Global repetition control in scoring |
| **Effect on Rewards**     | Directly affects token-level reward tensor | Adjusts overall sequence score |
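
To make the global vs token-level distinction concrete, here is a minimal sketch. It is not the code from either repository: the function names `global_penalty` and `dense_penalty` are hypothetical, it uses plain Python lists instead of a reward tensor, and the rule for which tokens get marked (every token of an n-gram that has already appeared earlier) is my assumption for illustration.

```python
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[int], n: int) -> List[Tuple[int, ...]]:
    """Return all contiguous n-grams of `tokens`, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def global_penalty(tokens: Sequence[int], n: int, max_penalty: float) -> float:
    """Sequence-level sketch: one scalar, scaled by the fraction of repeated n-grams."""
    grams = ngrams(tokens, n)
    if not grams:
        return 0.0
    repeated_fraction = 1.0 - len(set(grams)) / len(grams)
    return repeated_fraction * max_penalty


def dense_penalty(tokens: Sequence[int], n: int, penalty: float) -> List[float]:
    """Token-level sketch: a fixed penalty on each token of an already-seen n-gram."""
    rewards = [0.0] * len(tokens)
    seen = set()
    for start, gram in enumerate(ngrams(tokens, n)):
        if gram in seen:
            # Assumption: mark every position covered by the repeated n-gram.
            for i in range(start, start + n):
                rewards[i] = penalty
        seen.add(gram)
    return rewards


tokens = [1, 2, 3, 1, 2, 3]  # toy token ids; the second half repeats the first
scalar = global_penalty(tokens, n=2, max_penalty=-1.0)
per_token = dense_penalty(tokens, n=2, penalty=-1.0)
print(scalar)
print(per_token)
```

On this toy input the global version returns one number for the whole sequence, while the dense version returns one value per token and leaves the first (non-repeated) half at zero. That per-token signal is what an RL trainer would see with the dense variant, and it is lost when only the scalar is used.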



Wrong repetition penalty imported #276
