[Feature]: Skip `multimodal-looker` delegation in `look_at` when the current model natively supports vision

### Prerequisites

- [x] I will write this issue in English (see our [Language Policy](https://github.com/code-yeongyu/oh-my-opencode/blob/dev/CONTRIBUTING.md#language-policy))
- [x] I have searched existing issues and discussions to avoid duplicates
- [x] This feature request is specific to oh-my-opencode (not OpenCode core)
- [x] I have read the [documentation](https://github.com/code-yeongyu/oh-my-opencode#readme) or asked an AI coding agent with this project's GitHub URL loaded and couldn't find the answer

### Problem Description

When Sisyphus runs on a model that natively supports image input (e.g. Gemini 3 Pro, GPT-5.2, GLM-4.1V), the agent still calls the `look_at` tool for image files in the working directory. `look_at` delegates the analysis to the `multimodal-looker` agent, which is typically configured with a different model. As a result, the current model's own vision capabilities are bypassed entirely.

This causes three concrete problems:

1. **Capability bypass:** The current model's visual understanding is skipped, and the analysis is performed by whichever model `multimodal-looker` uses. If the current model is the stronger multimodal model, users receive weaker analysis.
2. **Unnecessary latency and token cost:** Every image adds an extra inter-agent call plus an independent LLM invocation, increasing response time and cost.
3. **Analysis quality loss:** `multimodal-looker` returns a text summary of the image. The current model can only work from that summary; it cannot inspect image details directly, catch information the summary missed, or form its own visual judgments.

**Reproduction:**

1. Configure Sisyphus with a vision-capable model (e.g. `gemini-3-pro`)
2. Place an image file (PNG, JPG, screenshot, etc.) in the working directory
3. Ask the agent to inspect the image (e.g. "check the screenshot")
4. Observe: agent calls `look_at` → delegates to `multimodal-looker` → a different model analyzes the image → returns a text summary

In contrast, calling the `read` tool directly on the same image file works correctly on vision-capable models: the model receives the image inline and processes it with its own visual understanding, with no delegation overhead.

### Proposed Solution

When `look_at` is invoked, check whether the current agent's model natively supports image input (via `modalities.input` or `model.capabilities.input.image`). Two approaches:

**Option A (Recommended): Direct `read` fallback.** If the model supports vision, internally redirect to the `read` tool path (OpenCode's native `read` already supports images). The model receives the image inline and processes it directly.

**Option B: In-place processing.** Keep the `look_at` API contract but skip the internal `multimodal-looker` dispatch, reading and returning the image content within the same tool call. This preserves tool-call consistency without the cross-agent overhead.

When the model does **not** support vision, preserve the current behavior and delegate to `multimodal-looker`.
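The routing decision described above could look roughly like the following minimal sketch. It is not the actual oh-my-opencode implementation: `ModelInfo`, `LookAtRoute`, `supportsImageInput`, `isImagePath`, `routeLookAt`, and the extension list are all hypothetical names for illustration. The field names only mirror the `modalities.input` / `model.capabilities.input.image` fields mentioned above, and the real OpenCode model metadata types may differ. Non-image files are assumed to stay on the existing delegation path.

```typescript
// Hypothetical model metadata shape. Field names mirror the ones named in
// this issue; the real OpenCode types may be structured differently.
interface ModelInfo {
  id: string;
  modalities?: { input?: string[] };
  capabilities?: { input?: { image?: boolean } };
}

// Hypothetical routing result: either read the file inline with the current
// model, or keep delegating to the `multimodal-looker` agent.
type LookAtRoute =
  | { kind: "read"; filePath: string }
  | { kind: "delegate"; agent: "multimodal-looker"; filePath: string };

// Assumed list of image extensions; the real tool may detect MIME types instead.
const IMAGE_EXTENSIONS = new Set([".png", ".jpg", ".jpeg", ".gif", ".webp"]);

function isImagePath(filePath: string): boolean {
  const dot = filePath.lastIndexOf(".");
  return dot !== -1 && IMAGE_EXTENSIONS.has(filePath.slice(dot).toLowerCase());
}

// True if either metadata source reports image input support.
// Missing metadata is treated as "no vision" so the current behavior is the default.
function supportsImageInput(model: ModelInfo | undefined): boolean {
  if (!model) return false;
  if (model.capabilities?.input?.image === true) return true;
  return model.modalities?.input?.includes("image") ?? false;
}

// Option A routing: vision-capable model + image file -> native `read` path;
// everything else -> existing `multimodal-looker` delegation.
function routeLookAt(filePath: string, model: ModelInfo | undefined): LookAtRoute {
  if (isImagePath(filePath) && supportsImageInput(model)) {
    return { kind: "read", filePath };
  }
  return { kind: "delegate", agent: "multimodal-looker", filePath };
}
```

For example, with `{ id: "gemini-3-pro", modalities: { input: ["text", "image"] } }`, `routeLookAt("screenshot.png", model)` takes the `read` path, while a text-only model or a model with no metadata falls back to `multimodal-looker`. Defaulting to delegation when metadata is missing keeps the change conservative for models whose capabilities are unknown.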

### Alternatives Considered

Adding an instruction in `AGENTS.md`:
```
DO NOT use the `look_at` tool for images. Use the `read` tool instead.
```
This is a partial workaround but does not reliably propagate to subagents and places the burden on the user rather than fixing the routing logic.

### Doctor Output (Optional)

N/A: this is a design/behavior issue, not an environment-specific one.

### Additional Context

**Related issues:**

- **#722**: Raised the "Capability Bypass" concern in depth, but the fixes (PR #1016, #1216) only addressed `look_at` crashing when `multimodal-looker` is disabled; the bypass itself was not resolved.
- **#1228**: The inverse scenario, auto-routing to `multimodal-looker` when the main model does **not** support images.
- **#2760**: A similar direction, focused on auto-delegation for non-vision models.

This issue fills the gap: what should happen when the model **already** supports vision.

### Feature Type

New Tool

### Contribution

- [x] I'm willing to submit a PR for this feature
- [x] I can help with testing
- [ ] I can help with documentation
