Harden AUTO mode against self-modification and denial bypass

## Summary

AUTO mode currently has classifier-driven approval and denial tracking, but it still needs stronger policy boundaries around agent self-modification and attempts to bypass an AUTO denial.

The main risk is that a model can modify files that affect its own later behavior, permissions, tools, commands, hooks, MCP configuration, or startup instructions while running in AUTO mode. A related risk is that, after an AUTO denial, the model may try to complete the same denied action through another tool or an indirect path.

## Current Behavior

AUTO mode has an accept-edits fast path for in-workspace `Edit` and `Write` calls when the path does not match the current persistence path patterns. That protects some persistence-related paths, but the protected set does not yet cover several Qwen-specific self-modification surfaces.

Examples of paths that should not be silently allowed through the accept-edits fast path:

- `.qwen/settings*.json`
- user-level `~/.qwen/settings*.json`
- `QWEN.md`
- `AGENTS.md`
- `.qwen/commands/`
- `.qwen/agents/`
- `.qwen/skills/`
- `.qwen/hooks/`
- `.mcp.json`
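
As a rough illustration, the protected set above could be expressed as a single predicate. This is a minimal sketch, not Qwen Code's actual implementation: the name `isSelfModificationPath`, the constants, and the choice to match instruction files by basename at any depth are all assumptions made for the example.

```typescript
// Hypothetical helper; names and matching rules are illustrative assumptions.

// Directories whose contents can change later agent behavior.
const PROTECTED_DIRS = [
  ".qwen/commands",
  ".qwen/agents",
  ".qwen/skills",
  ".qwen/hooks",
];

// Instruction and MCP files, matched by basename at any depth (an assumption).
const PROTECTED_BASENAMES = new Set(["QWEN.md", "AGENTS.md", ".mcp.json"]);

// Matches `.qwen/settings*.json`, both project-level and under a home directory.
const SETTINGS_PATTERN = /(^|\/)\.qwen\/settings[^/]*\.json$/;

// Use forward slashes and drop a leading "./" so the matching is uniform.
function normalizePath(p: string): string {
  return p.replace(/\\/g, "/").replace(/^\.\//, "");
}

function isSelfModificationPath(rawPath: string): boolean {
  const p = normalizePath(rawPath);
  const segments = p.split("/");
  const base = segments[segments.length - 1];

  if (PROTECTED_BASENAMES.has(base)) return true;
  if (SETTINGS_PATTERN.test(p)) return true;

  // A protected directory itself, or anything beneath it.
  return PROTECTED_DIRS.some(
    (dir) =>
      p === dir ||
      p.startsWith(dir + "/") ||
      p.endsWith("/" + dir) ||
      p.includes("/" + dir + "/"),
  );
}
```

With this sketch, `isSelfModificationPath(".qwen/settings.local.json")` is `true` and `isSelfModificationPath("src/index.ts")` is `false`. The matching is case-sensitive, which may be too permissive on case-insensitive filesystems.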

The classifier prompt also does not currently have explicit categories for self-modification or denial bypass. When an AUTO block is returned to the model as a tool error, the model may continue by trying a different tool or an indirect path to reach the same denied outcome.

## Proposed Behavior

Strengthen AUTO mode with two explicit safety boundaries:

1. Self-modification boundary

   AUTO mode should not fast-path modifications to files or directories that can affect Qwen Code behavior, permissions, startup context, commands, hooks, agents, skills, MCP servers, or project/user instructions.

2. Denial-bypass boundary

   After AUTO mode denies an action, the model should not retry the same action or use another tool, shell indirection, generated script, symlink, alias, config change, hook, command file, MCP configuration, encoded payload, or equivalent path to complete the same denied action.

If the denied action is required to satisfy the user request, the model should stop and ask the user for explicit approval instead of attempting a workaround.
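
One way to carry this boundary into the tool result is to build the denial text in a single place. The sketch below is hypothetical: the `AutoDenial` shape, the `formatAutoDenial` name, and the exact wording are assumptions for illustration, not the existing message format.

```typescript
// Hypothetical shape of an AUTO denial; field names are assumptions.
interface AutoDenial {
  toolName: string;
  reason: string;
}

// Builds the tool-result text returned to the model after an AUTO denial.
function formatAutoDenial(denial: AutoDenial): string {
  return [
    `AUTO mode denied this ${denial.toolName} call: ${denial.reason}`,
    "Do not retry the same action, and do not attempt it through another tool, " +
      "shell indirection, generated script, symlink, or configuration change.",
    "If the action is required to satisfy the user request, stop and ask the user " +
      "for explicit approval.",
  ].join("\n");
}
```

Keeping the instruction in the tool result, rather than only in the system prompt, puts it next to the denial the model is reacting to.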

## Suggested Implementation

- Add a centralized self-modification path safety check for AUTO edit/write fast paths.
- Check both the original requested path and the resolved symlink target where possible.
- Exclude self-modification paths from the accept-edits fast path and route them through classifier/manual approval instead.
- Add classifier prompt categories for:
  - `Self-Modification`
  - `Auto-Mode Bypass`
- Improve the AUTO denial tool-result message so the model is explicitly told not to retry or work around the denied action.
- Add a short main prompt rule: if a tool call is denied, do not retry the same action or attempt an equivalent workaround; ask the user if the action is required.
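
The path-related items above might fit together roughly as follows. This is a sketch under stated assumptions: `routeAutoEdit`, `resolveReal`, and the `AutoRoute` type are hypothetical names, and the protected-path predicate is passed in as a parameter so the example does not presume where the centralized check lives. Only Node's standard `fs` and `path` modules are used.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical routing outcome for an AUTO-mode Edit/Write call.
type AutoRoute = "fast-path" | "classifier";

// Resolve symlinks in `target`. If the path does not exist yet (for example a
// new file), resolve the nearest existing ancestor and re-append the rest.
function resolveReal(target: string): string {
  let current = path.resolve(target);
  const tail: string[] = [];
  while (true) {
    try {
      return path.join(fs.realpathSync(current), ...tail);
    } catch {
      const parent = path.dirname(current);
      if (parent === current) return path.resolve(target); // reached the root
      tail.unshift(path.basename(current));
      current = parent;
    }
  }
}

// Check both the requested path and its resolved target; either one matching
// the protected set sends the call to classifier/manual approval.
function routeAutoEdit(
  requestedPath: string,
  isProtected: (p: string) => boolean,
): AutoRoute {
  const candidates = [path.resolve(requestedPath), resolveReal(requestedPath)];
  return candidates.some(isProtected) ? "classifier" : "fast-path";
}
```

Note that this sketch does not follow dangling symlinks (links whose target does not exist yet), so a real implementation would likely need an extra `readlink`-based check for that case.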

## Acceptance Criteria

- `Edit` / `Write` to ordinary files inside the workspace can still use the AUTO accept-edits fast path.
- `Edit` / `Write` to `.qwen/settings*.json` (project-level or user-level `~/.qwen/settings*.json`), `QWEN.md`, `AGENTS.md`, `.qwen/commands/`, `.qwen/agents/`, `.qwen/skills/`, `.qwen/hooks/`, or `.mcp.json` does not use the accept-edits fast path.
- Symlinks targeting protected self-modification paths do not use the accept-edits fast path.
- Classifier prompts explicitly define and block unsafe self-modification and AUTO denial bypass attempts.
- AUTO denial messages instruct the model not to retry the same denied action or work around it through another tool.
- Tests cover protected Qwen paths, symlink targets, ordinary workspace writes, and denial-bypass prompt/message behavior.

## Notes

This is related to existing AUTO mode denial tracking and observability work, but it should be handled as a separate policy-hardening change rather than telemetry-only work.

Harden AUTO mode against self-modification and denial bypass #4538
