
fix: add repetition penalty to mitigate multi-turn repetition (fixes #1125)#1129

Open
modimihir07 wants to merge 1 commit into deepseek-ai:main from modimihir07:fix/repetition-penalty-multiturn-1125

Conversation

@modimihir07

Problem

Fixes #1125

The sample() function in generate.py only supports temperature scaling. There is no mechanism to penalize tokens that have already been generated, which allows a self-reinforcing repetition loop to develop in multi-turn dialogues:

  1. Model generates a pattern (e.g., "Based on the reasoning above...")
  2. Pattern enters conversation history
  3. Model sees the pattern in context and is more likely to repeat it
  4. Loop escalates with each turn

Fix

Added repetition_penalty parameter using the approach from the CTRL paper (Keskar et al., 2019) — the same method used by HuggingFace Transformers and vLLM:

  • For each previously generated token, scale down its logit:
    • If logit > 0: divide by penalty factor
    • If logit < 0: multiply by penalty factor
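A minimal sketch of the penalty rule above, assuming PyTorch logits (the function name and signature here are illustrative, not the PR's actual code):

```python
import torch

def apply_repetition_penalty(logits: torch.Tensor,
                             generated_tokens: list[int],
                             penalty: float = 1.2) -> torch.Tensor:
    """CTRL-style repetition penalty (Keskar et al., 2019).

    Positive logits of previously generated tokens are divided by the
    penalty, negative ones are multiplied by it, so seen tokens become
    less likely regardless of sign. penalty == 1.0 is a no-op.
    """
    if penalty == 1.0 or not generated_tokens:
        return logits
    ids = torch.tensor(sorted(set(generated_tokens)), device=logits.device)
    picked = logits[ids]
    logits[ids] = torch.where(picked > 0, picked / penalty, picked * penalty)
    return logits
```

Note that dividing a positive logit and multiplying a negative one both move it toward lower probability, which is why the rule branches on sign.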

Changes:

  • sample() — new repetition_penalty and generated_tokens parameters
  • generate() — passes token history to sample() for penalty
  • main() — threads repetition_penalty through
  • CLI — new --repetition-penalty flag (default: 1.0 = backward compatible)

Usage

# Default (no change from before)
torchrun ... generate.py --ckpt-path ... --config ... --interactive

# With repetition penalty (recommended 1.1–1.3 for multi-turn)
torchrun ... generate.py --ckpt-path ... --config ... --interactive --repetition-penalty 1.2


@ai-nurmamat ai-nurmamat left a comment


Great fix for the repetition issue! Consider adding a configurable repetition penalty range in the inference config.

@modimihir07
Author

modimihir07 commented Mar 11, 2026 via email


Development

Successfully merging this pull request may close these issues.

[BUG] New version's reasoning output suffix constraints degrade model performance in multi-turn dialogues, causing repetitive responses and stagnation
