-
-
Notifications
You must be signed in to change notification settings - Fork 931
Description
Hi,
I conducted a security audit of instructor focusing on attack surfaces when processing untrusted LLM outputs. Found two issues worth sharing:
1. Retry Amplification (Medium severity)
Location: instructor/core/retry.py, instructor/core/patch.py
The retry mechanism lacks rate limiting. An adversarial LLM (or prompt-injected response) can craft outputs that always fail validation, causing:
- Each retry appends 2 messages (assistant + tool error)
- Context grows exponentially
- 10 retries = 506x context growth in my PoC
Impact: Token budget exhaustion, cost amplification.
Suggested mitigation: Optional token budget limit in max_retries config, alongside existing timeout.
2. LLM Validator Injection (Medium severity)
Location: instructor/validation/llm_validators.py:50-66
User values are interpolated directly into validation prompts without escaping:
{"role": "user", "content": f"Does `{v}` follow the rules: {statement}"}If v contains prompt injection payloads, the validator LLM may be manipulated to return is_valid: true for invalid data.
Suggested mitigation: Explicit delimiters or structured input format for user values.
Happy to share the full audit document and PoC if helpful. No rush on response—I'm following a 30-day disclosure window before publishing findings publicly.
Let me know if you'd like more details on either finding.