Wrong repetition penalty imported #276

Open
@casper-hansen

Hi, open-r1 maintainers. I'd like to flag an issue with the recently added repetition penalty, which I noticed after talking with one of the authors of the Demystifying Long CoT paper.

The repetition penalty imported from the Demystifying Long CoT code is not the one used in the paper. The paper used the class RepetitionDensePenalty.

This may be a problem because the two implementations differ in granularity: RepetitionDensePenalty applies the penalty per token, while the imported get_repetition_penalty applies a single penalty to the whole sequence.

I had an LLM generate a comparison of the two:

| Feature | RepetitionDensePenalty | get_repetition_penalty |
|---|---|---|
| Granularity | Token-level penalties | Global sequence penalty |
| Penalty type | Fixed value per token | Scaled based on repetition |
| Implementation | Modifies token rewards directly | Computes a single penalty score |
| Use case | Reinforcement learning (RL) reward models | Global repetition control in scoring |
| Effect on rewards | Directly affects token-level reward tensor | Adjusts overall sequence score |
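
To make the distinction concrete, here is a minimal sketch of the two styles of penalty. It is not the code from either repository: the function names global_repetition_penalty and dense_repetition_penalty are hypothetical, and detecting repetition through repeated n-grams is an assumption made for illustration.

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of `tokens`, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def global_repetition_penalty(tokens: List[str], n: int, max_penalty: float) -> float:
    """Hypothetical sequence-level penalty: one scalar for the whole sequence,
    scaled by the fraction of n-grams that are repeats."""
    grams = ngrams(tokens, n)
    if not grams:
        return 0.0
    repeated_fraction = 1.0 - len(set(grams)) / len(grams)
    return repeated_fraction * max_penalty


def dense_repetition_penalty(tokens: List[str], n: int, penalty: float) -> List[float]:
    """Hypothetical token-level penalty: a fixed value on every token that
    belongs to an n-gram already seen earlier in the sequence."""
    rewards = [0.0] * len(tokens)
    seen = set()
    for start, gram in enumerate(ngrams(tokens, n)):
        if gram in seen:
            for i in range(start, start + n):
                rewards[i] = penalty
        seen.add(gram)
    return rewards


tokens = "a b c a b c".split()
sequence_score = global_repetition_penalty(tokens, n=2, max_penalty=-1.0)
token_scores = dense_repetition_penalty(tokens, n=2, penalty=-1.0)
```

For this toy input, the sequence-level variant yields a single score of about -0.4 (2 of the 5 bigrams are repeats), while the token-level variant yields [0.0, 0.0, 0.0, -1.0, -1.0, -1.0], so only the repeated tokens are penalized. In RL training this changes which tokens receive the negative signal, which is why the choice between the two may matter.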
