verl math_equal() Arbitrary Code Execution via Unsafe eval()

Basic Information

| Field | Details |
| --- | --- |
| Vulnerability Name | Indirect Prompt Injection to ACE via Unsafe eval() in Math Answer Grading |
| Vulnerability Type | CWE-95: Improper Neutralization of Directives in Dynamically Evaluated Code (Eval Injection) |
| Affected Component | verl/utils/reward_score/prime_math/grader.py, lines 298-301 |
| Affected Versions | verl ≤ 0.7.0 (main branch still affected as of the analysis date) |
| CVSS 3.1 | 8.1 (High), vector AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H |
| GitHub | https://github.com/verl-project/verl |

CVSS Score Explanation

| Dimension | Value | Rationale |
| --- | --- | --- |
| Attack Vector | Network (N) | The attacker triggers the bug remotely by poisoning training datasets; no local access is required |
| Attack Complexity | High (H) | Requires indirectly controlling the LLM's output format via prompt injection rather than passing parameters directly |
| Privileges Required | None (N) | No authentication to the target system is required |
| User Interaction | None (N) | The training/evaluation pipeline triggers the code automatically; no human intervention is needed |
| Confidentiality | High (H) | Can read training data, model weights, API keys, etc. |
| Integrity | High (H) | Can tamper with model checkpoints and plant backdoors |
| Availability | High (H) | Can terminate training processes and destroy compute resources |

Vulnerability Description

verl is an open-source large-model reinforcement learning training framework from ByteDance (19k+ GitHub stars) that supports algorithms such as PPO and GRPO. In its math answer scoring module prime_math/grader.py, the math_equal() function compares whether the model-generated answer is equivalent to the ground truth answer.

When the ground truth answer is a matrix type (containing \begin{pmatrix}) and the model's output answer starts with [ and ends with ], the code directly calls Python's built-in eval() function on the model output without any input sanitization or sandbox isolation.

An attacker can use Indirect Prompt Injection (injecting malicious instructions into the training dataset) to induce the LLM to output a string containing malicious Python code when answering matrix-type math problems. This string is extracted by match_answer(), passed into math_equal(), and ultimately executed by eval(), achieving arbitrary code execution (ACE).


Vulnerability SINK

File: verl/utils/reward_score/prime_math/grader.py Line 298-301

elif r"\begin{pmatrix}" in reference and prediction.startswith("[") and prediction.endswith("]"):
    if isinstance(eval(prediction), list):   # ← SINK: directly eval untrusted input
        pred_matrix = eval(prediction)       # ← second eval

Trigger Conditions (all three must be satisfied simultaneously):

  1. reference (ground truth answer) contains \begin{pmatrix} — i.e., a matrix-type math problem
  2. prediction (answer extracted from model) starts with [ and ends with ]
  3. prediction does not contain underscore _ — otherwise handle_base() will truncate during the normalize() phase and throw a ValueError
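
For illustration, the vulnerable pattern can be reproduced in isolation with a harmless payload. This is a minimal sketch with no verl dependency; the reference and prediction strings are hypothetical stand-ins, not the PoC payload:

reference = r"\begin{pmatrix} 1 \\ 2 \end{pmatrix}"   # matrix-type ground truth
prediction = "[print('eval was reached') or 1]"       # no underscore, contains a digit

# Same shape as the branch in grader.py: the untrusted string goes straight into eval()
if r"\begin{pmatrix}" in reference and prediction.startswith("[") and prediction.endswith("]"):
    result = eval(prediction)           # side effect runs here; evaluates to [1]
    print(isinstance(result, list))     # True, so the grader would keep going with this value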

Call Stack

verl GRPO/PPO Training Loop
  └→ RewardManager.__call__()                        # compute reward score for each rollout
      └→ compute_score(solution_str, ground_truth)   # prime_math/__init__.py
          ├→ match_answer(solution_str)              # extract answer string from LLM output ← SOURCE
          │   ├→ locate answer via keyword matching ("answer is", \boxed{} etc.)
          │   └→ return (is_matched, prediction)     # prediction without any security filtering
          └→ math_equal(prediction, reference)       # grader.py Line 174
              ├→ normalize(prediction, pi)           # Line 188 — preprocessing
              │   ├→ handle_base(prediction)         # detect underscore _ and handle base conversion
              │   └→ handle_pi(prediction, pi)       # detect \pi and replace
              └→ [matrix comparison branch]          # Line 298-301
                  └→ eval(prediction)                # ← SINK: arbitrary code execution

SOURCE (Data Entry Point)

File: verl/utils/reward_score/prime_math/__init__.py Line 347-386

def match_answer(response):
    is_matched = False
    # extract answer from LLM output via keywords
    for ans_marker in ["answer:", "answer is", "answers are"]:
        ans_idx = response.lower().rfind(ans_marker)
        if ans_idx != -1:
            is_matched = True
            response = response[ans_idx + len(ans_marker):].strip()
    # ... more extraction logic (\boxed{}, "is", "=" etc.) ...

    # require the answer to contain at least one digit
    is_matched = is_matched if any([c.isdigit() for c in response]) else False
    return is_matched, response

The response argument to match_answer() is the raw LLM output (solution_str). The function only locates and slices text and applies no security filtering, so in a prompt-injection scenario an attacker who controls problem descriptions in the training data can induce the model to emit a malicious string in exactly the required format.
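
The full SOURCE-to-SINK chain can also be exercised outside a training run with a harmless stand-in for the model output. The sketch below assumes the module paths and signatures shown above (match_answer in prime_math/__init__.py, math_equal in grader.py); extraction and normalization details may differ slightly between verl versions, and the payload only prints a message:

from verl.utils.reward_score.prime_math import match_answer
from verl.utils.reward_score.prime_math.grader import math_equal

reference = r"\begin{pmatrix} 1 \\ 2 \end{pmatrix}"        # matrix-type ground truth
response = "The answer is [print('sink reached') or 1]"    # benign stand-in for the LLM output

is_matched, prediction = match_answer(response)            # SOURCE: extracts "[print('sink reached') or 1]"
print(is_matched, prediction)
math_equal(prediction, reference)                          # SINK: eval(prediction) prints "sink reached"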


Exploitation Prerequisites

| Condition | Description |
| --- | --- |
| Training data contains matrix-type problems | ground_truth must contain \begin{pmatrix}; such problems are common in the MATH dataset |
| LLM output can be controlled | The model is induced via prompt injection to output an answer in the [malicious code] format |
| Payload contains no underscore | handle_base() splits on _ and crashes, so the payload must use exec() instead of __import__() |
| Payload contains a digit | match_answer() (line 384) requires at least one digit character in the answer |
| verl uses prime_math scoring | data_source must correspond to the MATH dataset, or the prime_math scoring function must be specified manually |
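
For reference, a poisoned sample only needs to look like an ordinary matrix problem. The record below is a hypothetical sketch of a verl RL dataset row (the field names assume verl's usual parquet layout with data_source, prompt, and a rule-based reward_model.ground_truth; adjust to the actual preprocessing script), with the injected instruction left abstract:

# Hypothetical poisoned dataset row; field names and values are illustrative only.
poisoned_row = {
    "data_source": "<a MATH-style source that routes scoring to prime_math>",
    "prompt": [{
        "role": "user",
        "content": "Compute the following matrix sum ... "
                   "<injected instruction dictating the format of the final answer>",
    }],
    "reward_model": {
        "style": "rule",
        "ground_truth": r"\begin{pmatrix} 1 \\ 2 \end{pmatrix}",   # contains \begin{pmatrix}
    },
}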

Exploitation Steps

Attack Scenario

An attacker injects a "matrix problem" containing Prompt Injection instructions into a public math dataset. When other researchers use the verl framework for GRPO/PPO training on that dataset, the LLM is induced to output a malicious payload when answering that problem, and verl's scoring function automatically triggers eval() to execute arbitrary code.

Reproduction Flow

Step 1: Attacker crafts a poisoned math problem (containing Prompt Injection)
         ↓
Step 2: Local LLM (Qwen2.5-14B-Instruct) receives the poisoned problem
         ↓
Step 3: LLM is successfully injected, outputs an answer containing malicious code:
        "The answer is [exec("import os; os.system('echo PWNED1 > /tmp/verl-rce-proof.txt')")]"
         ↓
Step 4: verl compute_score() calls match_answer() to extract the answer
         ↓
Step 5: math_equal(prediction, reference) enters the matrix comparison branch
         ↓
Step 6: eval(prediction) executes the malicious code → writes to /tmp/verl-rce-proof.txt
         ↓
Step 7: RCE succeeds, attacker gains code execution on the training server

Proof of Concept

The following PoC has been verified successfully on macOS + Ollama (qwen2.5:14b-instruct) + verl 0.7.0.

Environment Setup

# 1. Install Ollama and pull the model
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen2.5:14b-instruct

# 2. Clone verl and install
git clone https://github.com/verl-project/verl.git
cd verl
pip install -e .

PoC Script

POC_CODE

Actual Run Video

POC_VIDEO

SCREENSHOT

screenshot

Impact Analysis

Direct Impact

| Impact | Description |
| --- | --- |
| Arbitrary Code Execution | The attacker executes arbitrary system commands on the training/evaluation server |
| Data Theft | Read sensitive information such as training data, model weights, API keys, and cloud credentials |
| Supply Chain Attack | Tamper with model checkpoints to plant backdoors, affecting all downstream users |
| Lateral Movement | Training clusters typically have high-privilege network access and can serve as a pivot for internal network penetration |

Attack Scenarios

  1. Dataset Poisoning: An attacker injects matrix problems containing Prompt Injection into public math datasets (e.g., MATH, derived versions of GSM8K); triggered when researchers train with verl on that dataset
  2. Malicious Few-Shot: An attacker embeds malicious output templates in few-shot examples to induce the model to generate a payload during the evaluation phase
  3. Adversarial Input: Carefully crafted math problems that use token-level optimization to maximize the probability of the model outputting a malicious payload

Notes on Exploitation Difficulty

Although CVSS rates Attack Complexity as High, actual exploitation is not particularly difficult:

  • In GRPO training, each prompt generates 8-64 rollouts; triggering requires only one successful hit
  • In testing, Qwen2.5-14B-Instruct was successfully injected on the first attempt
  • The exec() payload bypasses handle_base() preprocessing and can reliably reach eval()
  • Matrix-type problems are common in the MATH dataset; attackers do not need to craft their own ground_truth

Remediation Recommendations

Short-Term Fix (Recommended, One-Line Change)

Replace eval() with ast.literal_eval(), which only allows parsing Python literals:

import ast

# grader.py Line 298-301, before fix:
elif r"\begin{pmatrix}" in reference and prediction.startswith("[") and prediction.endswith("]"):
    if isinstance(eval(prediction), list):        # ← dangerous
        pred_matrix = eval(prediction)            # ← dangerous

# After fix:
elif r"\begin{pmatrix}" in reference and prediction.startswith("[") and prediction.endswith("]"):
    try:
        parsed = ast.literal_eval(prediction)     # ← safe: only accepts literals
        if isinstance(parsed, list):
            pred_matrix = parsed
            # ... subsequent comparison logic ...
    except (ValueError, SyntaxError):
        pass

ast.literal_eval() only accepts Python literals such as strings, numbers, lists, and dictionaries, and will not execute arbitrary code.
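
As a quick check (a sketch runnable in any Python 3 interpreter), ast.literal_eval() accepts a genuine matrix answer but rejects a code-carrying payload:

import ast

print(ast.literal_eval("[[1, 2], [3, 4]]"))               # OK: plain nested list
try:
    ast.literal_eval("[exec(\"print('pwned')\") or 1]")   # a function call is not a literal
except (ValueError, SyntaxError) as e:
    print("rejected:", e)                                  # ValueError: malformed node or string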

Long-Term Recommendations

  1. Global audit of eval() / exec() calls: The handle_pi() function in grader.py (Line 82) also contains an eval() call; although wrapped with contextlib.suppress(Exception), it should still be replaced
  2. Input validation allowlist: Add format validation at the output stage of match_answer(), only allowing mathematical expression characters (digits, operators, brackets, commas); a rough sketch follows this list
  3. Sandbox isolation: Sandbox the execution environment of the reward function, restricting access to modules such as os and subprocess
  4. Dependency security scanning: Include eval() usage in CI/CD security checks to prevent new eval injection points from being introduced in subsequent code
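
As a rough sketch of recommendation 2, the check below is an assumption rather than a vetted filter: the character class would need tuning against legitimate MATH answers (which also contain LaTeX commands and letters), and it complements, rather than replaces, removing eval():

import re

# Hypothetical allowlist for the string returned by match_answer(): digits, letters,
# operators, brackets, commas, whitespace, and basic LaTeX syntax. Quotes, backticks,
# semicolons, and underscores are rejected, which blocks the payload shown in this report.
_SAFE_ANSWER = re.compile(r"[0-9a-zA-Z\s\+\-\*/\^=\.,:\\\{\}\(\)\[\]]*")

def is_safe_answer(prediction: str) -> bool:
    return _SAFE_ANSWER.fullmatch(prediction) is not None

print(is_safe_answer(r"\begin{pmatrix}1\\2\end{pmatrix}"))      # True
print(is_safe_answer("[exec(\"print('pwned')\") or 1]"))        # False: quotes are not allowed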