Skip to content

Security: Retry amplification and LLM validator injection findings #2056

@spacebrr

Description

@spacebrr

Hi,

I conducted a security audit of instructor focusing on attack surfaces when processing untrusted LLM outputs. Found two issues worth sharing:

1. Retry Amplification (Medium severity)

Location: instructor/core/retry.py, instructor/core/patch.py

The retry mechanism lacks rate limiting. An adversarial LLM (or prompt-injected response) can craft outputs that always fail validation, causing:

  • Each retry appends 2 messages (assistant + tool error)
  • Context grows exponentially
  • 10 retries = 506x context growth in my PoC

Impact: Token budget exhaustion, cost amplification.

Suggested mitigation: Optional token budget limit in max_retries config, alongside existing timeout.

2. LLM Validator Injection (Medium severity)

Location: instructor/validation/llm_validators.py:50-66

User values are interpolated directly into validation prompts without escaping:

{"role": "user", "content": f"Does `{v}` follow the rules: {statement}"}

If v contains prompt injection payloads, the validator LLM may be manipulated to return is_valid: true for invalid data.

Suggested mitigation: Explicit delimiters or structured input format for user values.


Happy to share the full audit document and PoC if helpful. No rush on response—I'm following a 30-day disclosure window before publishing findings publicly.

Let me know if you'd like more details on either finding.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workinghelp wantedExtra attention is neededpriority:criticalCritical issue affecting production

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions