
Harden AUTO mode against self-modification and denial bypass #4538

@qqqys

Description

Summary

AUTO mode currently has classifier-driven approval and denial tracking, but it still needs stronger policy boundaries around agent self-modification and attempts to bypass an AUTO denial.

The main risk is that a model can modify files that affect its own later behavior, permissions, tools, commands, hooks, MCP configuration, or startup instructions while running in AUTO mode. A related risk is that, after an AUTO denial, the model may try to complete the same denied action through another tool or an indirect path.

Current Behavior

AUTO mode has an accept-edits fast path for in-workspace Edit and Write calls when the path does not match the current persistence path patterns. That protects some persistence-related paths, but the protected set does not yet cover several Qwen-specific self-modification surfaces.

Examples of paths that should not be silently allowed through the accept-edits fast path:

  • .qwen/settings*.json
  • user-level ~/.qwen/settings*.json
  • QWEN.md
  • AGENTS.md
  • .qwen/commands/
  • .qwen/agents/
  • .qwen/skills/
  • .qwen/hooks/
  • .mcp.json
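The paths above could be centralized in a single predicate. The following is a minimal sketch that assumes Node's `path` module and POSIX-style paths. `isSelfModificationPath` and the constant names are hypothetical, not existing Qwen Code identifiers. The sketch applies the `.qwen/` rules under both the workspace root and the user's home directory. The issue names only `settings*.json` at user level, so also protecting the other user-level `.qwen/` directories is an assumption. It also matches `QWEN.md`, `AGENTS.md`, and `.mcp.json` by file name in any directory, which is a deliberately conservative choice.

```typescript
import * as path from "node:path";

// Hypothetical constants: the protected set is copied from this issue, not
// from the current Qwen Code persistence patterns.
const PROTECTED_QWEN_DIRS = ["commands", "agents", "skills", "hooks"];
const PROTECTED_FILE_NAMES = new Set(["QWEN.md", "AGENTS.md", ".mcp.json"]);

// True when `base` contains `abs` (or is `abs` itself).
function isInside(base: string, abs: string): boolean {
  const rel = path.relative(base, abs);
  return rel === "" || (rel !== ".." && !rel.startsWith(`..${path.sep}`) && !path.isAbsolute(rel));
}

// Hypothetical helper: returns true if writing `filePath` could change
// Qwen Code's own settings, instructions, commands, agents, skills, hooks,
// or MCP configuration.
function isSelfModificationPath(filePath: string, workspaceRoot: string, homeDir: string): boolean {
  const abs = path.resolve(workspaceRoot, filePath);

  // Assumption: the same .qwen/ rules apply at project level and user level.
  for (const base of [workspaceRoot, homeDir]) {
    if (!isInside(base, abs)) continue;
    const parts = path.relative(base, abs).split(path.sep);
    if (parts[0] !== ".qwen" || parts.length < 2) continue;
    // .qwen/settings*.json (e.g. settings.json, settings.local.json)
    if (parts.length === 2 && /^settings.*\.json$/.test(parts[1])) return true;
    // .qwen/commands/, .qwen/agents/, .qwen/skills/, .qwen/hooks/ and their contents
    if (PROTECTED_QWEN_DIRS.includes(parts[1])) return true;
  }

  // Conservative assumption: instruction and MCP files are protected wherever they live.
  return PROTECTED_FILE_NAMES.has(path.basename(abs));
}
```

Ordinary files such as `src/index.ts` or `.qwen/notes.txt` fall through and return `false`, so they stay eligible for the accept-edits fast path.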

The classifier prompt also has no explicit categories for self-modification or denial bypass. When an AUTO block is returned to the model as a tool error, the model may respond by trying a different tool or an indirect route to reach the same denied target.

Proposed Behavior

Strengthen AUTO mode with two explicit safety boundaries:

  1. Self-modification boundary

    AUTO mode should not fast-path modifications to files or directories that can affect Qwen Code behavior, permissions, startup context, commands, hooks, agents, skills, MCP servers, or project/user instructions.

  2. Denial-bypass boundary

    After AUTO mode denies an action, the model should not retry the same action or use another tool, shell indirection, generated script, symlink, alias, config change, hook, command file, MCP configuration, encoded payload, or equivalent path to complete the same denied action.

If the denied action is required to satisfy the user request, the model should stop and ask the user for explicit approval instead of attempting a workaround.
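One way to carry this rule to the model is in the denial tool result itself. The sketch below is illustrative only. `AutoDenial` and `formatAutoDenialMessage` are hypothetical names, not Qwen Code's actual types, and the wording simply restates the denial-bypass boundary described above.

```typescript
// Hypothetical shape of an AUTO-mode denial; names are illustrative assumptions.
interface AutoDenial {
  toolName: string; // tool whose call was blocked, e.g. "Write"
  target: string; // path, command, or resource the call was aimed at
  reason: string; // short explanation produced by the classifier
}

// Builds the tool-result text returned to the model after an AUTO denial.
// Besides reporting the block, it restates the denial-bypass boundary and
// tells the model to ask the user instead of working around the denial.
function formatAutoDenialMessage(denial: AutoDenial): string {
  return [
    `AUTO mode denied ${denial.toolName} on ${denial.target}: ${denial.reason}`,
    "Do not retry this action. Do not try to reach the same result through another tool, " +
      "shell indirection, a generated script, a symlink, an alias, a config change, a hook, " +
      "a command file, MCP configuration, an encoded payload, or any equivalent path.",
    "If this action is required to satisfy the user's request, stop and ask the user for explicit approval.",
  ].join("\n");
}
```

For example, `formatAutoDenialMessage({ toolName: "Write", target: ".qwen/settings.json", reason: "self-modification" })` produces a message whose first line is `AUTO mode denied Write on .qwen/settings.json: self-modification`, followed by the two instruction lines.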

Suggested Implementation

  • Add a centralized self-modification path safety check for AUTO edit/write fast paths.
  • Check both the original requested path and the resolved symlink target where possible.
  • Exclude self-modification paths from the accept-edits fast path and route them through classifier/manual approval instead.
  • Add classifier prompt categories for:
    • Self-Modification
    • Auto-Mode Bypass
  • Improve the AUTO denial tool-result message so the model is explicitly told not to retry or work around the denied action.
  • Add a short rule to the main prompt: if a tool call is denied, do not retry the same action or attempt an equivalent workaround; ask the user if the action is required.
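The fast-path exclusion and symlink check from the list above could be combined roughly as follows. This is a sketch under stated assumptions. `routeAutoEdit`, `resolveRealTarget`, and the `isProtected` predicate are hypothetical names. The protected-path predicate is passed in so that any centralized check can be plugged in. `fs.realpathSync` resolves symlinks, but a Write may target a file that does not exist yet. The sketch therefore resolves the nearest existing ancestor and re-appends the remaining path segments.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

type Route = "fast-path" | "classifier";

// Hypothetical helper: real (symlink-resolved) location of `absPath`,
// resolving the nearest existing ancestor when the path itself does not exist.
function resolveRealTarget(absPath: string): string {
  let current = absPath;
  const missing: string[] = [];
  while (true) {
    try {
      return path.join(fs.realpathSync(current), ...missing);
    } catch {
      const parent = path.dirname(current);
      if (parent === current) return absPath; // reached the filesystem root
      missing.unshift(path.basename(current));
      current = parent;
    }
  }
}

function isInside(base: string, p: string): boolean {
  const rel = path.relative(base, p);
  return rel === "" || (rel !== ".." && !rel.startsWith(`..${path.sep}`) && !path.isAbsolute(rel));
}

// Hypothetical routing for AUTO-mode Edit/Write: only ordinary in-workspace
// files take the accept-edits fast path. Everything else goes to the classifier.
function routeAutoEdit(
  requestedPath: string,
  workspaceRoot: string,
  isProtected: (absPath: string) => boolean,
): Route {
  const requested = path.resolve(workspaceRoot, requestedPath);
  const resolved = resolveRealTarget(requested);
  const realRoot = resolveRealTarget(workspaceRoot);

  // Both the requested path and its symlink target must stay in the workspace.
  if (!isInside(workspaceRoot, requested) || !isInside(realRoot, resolved)) return "classifier";
  // Check the requested path and the resolved target against the protected set.
  if (isProtected(requested) || isProtected(resolved)) return "classifier";
  return "fast-path";
}
```

Returning `"classifier"` does not deny the call. It only removes it from the fast path, so self-modification edits still reach classifier or manual approval as suggested above.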

Acceptance Criteria

  • Edit / Write to ordinary files inside the workspace can still use the AUTO accept-edits fast path.
  • Edit / Write to .qwen/settings*.json, QWEN.md, AGENTS.md, .qwen/commands/, .qwen/agents/, .qwen/skills/, .qwen/hooks/, or .mcp.json does not use the accept-edits fast path.
  • Symlinks targeting protected self-modification paths do not use the accept-edits fast path.
  • Classifier prompts explicitly define and block unsafe self-modification and AUTO denial bypass attempts.
  • AUTO denial messages instruct the model not to retry the same denied action or work around it through another tool.
  • Tests cover protected Qwen paths, symlink targets, ordinary workspace writes, and denial-bypass prompt/message behavior.

Notes

This is related to existing AUTO mode denial tracking and observability work, but it should be handled as a separate policy-hardening change rather than telemetry-only work.
