Skip to content

Fix prompt injection vulnerability in Architect mode (Issue #5058)#5065

Open
slokrami07 wants to merge 1 commit intoAider-AI:mainfrom
slokrami07:fix/prompt-injection-5058
Open

Fix prompt injection vulnerability in Architect mode (Issue #5058)#5065
slokrami07 wants to merge 1 commit intoAider-AI:mainfrom
slokrami07:fix/prompt-injection-5058

Conversation

@slokrami07
Copy link
Copy Markdown

Vulnerability Context
In the multi-agent orchestration flow, the Architect model reads repository files and proposes a solution. This proposal was previously passed to the Editor model using editor_coder.run(with_message=content, preproc=False). Because the preproc=False flag bypasses standard input processing, malicious instructions hidden in untrusted files (like README.md) could be executed blindly by the Editor, turning malicious repository content into committed, backdoored code.

What was changed
This PR introduces a dedicated Threat Normalization layer acting as a strict trust boundary between the Architect and Editor agents, without breaking the necessary preproc=False optimization:

Secondary Validation Layer: Added validate_architect_payload() to ArchitectCoder.
Contextual Evaluation: Before handing off the payload, the validation agent securely fetches the original user request from the session history and queries the underlying LLM (at temperature=0.0) to compare the original intent with the Architect's proposed plan.
Strict Interception: If the Architect's plan deviates maliciously from the user's explicit intent (e.g., unauthorized data exfiltration, unexpected network calls via urllib or requests, accessing .env files not requested), the LLM flags the payload as UNSAFE. The handoff is immediately aborted, preventing the Editor from executing the injected payload.
How it was tested
Ensured cur_messages and done_messages are accessed safely during test initialization.
Ran the full basic test suite (tests/basic/test_coder.py), successfully passing all 42 tests.
Verified that legitimate complex refactors are marked as SAFE because they align with the original user request.

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 22, 2026

CLA assistant check
All committers have signed the CLA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants