Skip to content

Add untrusted-context boundaries for retrieved docs, skills, memories, and tool output #1761

@Akane-CN

Description

@Akane-CN

Reference scanned

Odysseus pewdiepie-archdaemon/odysseus@f6b0dcb.

Code evidence:

  • src/prompt_security.py: global untrusted-context policy and untrusted_context_message() wrapper.
  • src/chat_processor.py: wraps pinned/retrieved memory, documents, web search results, page content, and skill indexes as untrusted user-role data.
  • src/agent_loop.py: treats user-editable skills as untrusted rather than concatenating them into trusted system prompt.
  • Tests include prompt-injection/security regressions around skills and document scope.

Why this matters for Melix

Any content retrieved from local files, web, memory, skills, logs, or tools can contain instructions. Melix should make the trust boundary explicit in prompt construction and receipts, rather than relying on informal prompt wording.

In scope

  • Standard wrapper for untrusted source data with source type/id and boundary markers.
  • Policy that untrusted content is data only and cannot override system/developer/operator instructions.
  • Ensure retrieved docs, memories, skills, tool output, web content, and local source integrations use the wrapper.
  • Receipt exposes trusted vs untrusted segments.

Out of scope

  • Claiming prompt injection is solved by a wrapper alone.
  • Preventing user-authored trusted system/developer instructions when intentionally configured.

Verification

  • Regression tests where malicious skill/document/memory asks the model to reveal secrets or call tools.
  • Prompt assembly tests confirm untrusted content never enters trusted system role.
  • Receipt includes source count and trust labels.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requestready-for-agentFully specified, ready for an AFK agentruntime-healthRuntime health and recovery behavior

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions