
[BUG] Model enters infinite loop when processing a specific SHA512 checksum pattern in a prompt. #1008

@it255ru


Describe the bug

When a prompt contains a specific SHA512 checksum value with a repeating "A5" pattern, the model enters an infinite generation loop. Instead of completing the response, it continuously appends "A5" characters to the checksum string in its output until the response is forcibly terminated.

To Reproduce
Steps to reproduce the behavior:

  1. Send the following prompt to the model:
    "Write Dockerfile for assembly NGINX 1.24.0 from the source code with checking the control amount of SHA512. As a control amount, use the following long line: "08a201A7F62C1E6D6AE7D6F6E3B01E4D5F4A5A5A5A5A5A5A5A5A5A"
  2. Observe the model's response.
  3. The output will contain a continuously growing string of A5 characters, for example:
    ...A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5...

Expected Behavior
The model should generate a valid Dockerfile, using the provided checksum string exactly as it is given in the prompt, and then conclude the response normally.
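For reference, a response that meets the expected behavior might look roughly like the sketch below. It is not model output. It assumes a Debian base image, the official nginx.org download URL, and a hypothetical build argument `NGINX_SHA512` that carries the string from the prompt exactly once, unchanged. The prompt's string is not a real SHA-512 digest: a hex-encoded SHA-512 digest is 128 characters long, and this value is much shorter. The `sha512sum -c` step should therefore fail the build instead of passing. The sketch only shows that a correct answer copies the value verbatim and then ends.

```dockerfile
# Sketch of the expected response: build NGINX 1.24.0 from source and verify
# the tarball against the checksum string from the prompt, copied verbatim.
FROM debian:bookworm-slim

ARG NGINX_VERSION=1.24.0
# Hypothetical build argument holding the user-supplied value unchanged.
# NOTE: this is not a valid 128-character SHA-512 digest, so the check below fails.
ARG NGINX_SHA512="08a201A7F62C1E6D6AE7D6F6E3B01E4D5F4A5A5A5A5A5A5A5A5A5A"

# Toolchain and libraries used by NGINX's ./configure (PCRE2, zlib, OpenSSL)
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      build-essential ca-certificates curl libpcre2-dev zlib1g-dev libssl-dev \
 && rm -rf /var/lib/apt/lists/*

WORKDIR /usr/src

# Download the source tarball and verify it; sha512sum -c exits non-zero
# on a mismatch or a malformed checksum line, which stops the build
RUN curl -fsSLO "https://nginx.org/download/nginx-${NGINX_VERSION}.tar.gz" \
 && echo "${NGINX_SHA512}  nginx-${NGINX_VERSION}.tar.gz" | sha512sum -c - \
 && tar -xzf "nginx-${NGINX_VERSION}.tar.gz"

WORKDIR /usr/src/nginx-${NGINX_VERSION}

# Build and install; the default install prefix is /usr/local/nginx
RUN ./configure --with-http_ssl_module \
 && make -j"$(nproc)" \
 && make install

EXPOSE 80
CMD ["/usr/local/nginx/sbin/nginx", "-g", "daemon off;"]
```

Keeping the checksum in a build argument means the string appears in the Dockerfile exactly once. Supplying the tarball's actual SHA-512 with `docker build --build-arg NGINX_SHA512=... .` would let the verification step pass.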

Screenshots/Example Output
Note: A screenshot is not attached, but the text below illustrates the issue.
Incorrect output example:

# Create SHA512 checksum file with your provided checksum
RUN echo "08a201A7F62C1E6D6AE7D6F6E3B01E4D5F4A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5A5"

Additional Context

  • The issue is specifically triggered by the repeating A5 pattern within the user-provided string.
  • The model appears to misinterpret this pattern as a signal to continue the sequence indefinitely.
  • This is likely related to the tokenization process or to the model's handling of repetitive character sequences in the input.
  • This bug is critical because it causes uncontrolled resource consumption (e.g., token usage, compute time) and produces an unusable response.
