perf: reduce ChatAgent system prompt from ~7,400 to ~4,000 tokens #719
Status: Open
Labels
- consumer: Blocks consumer adoption; must ship for the v0.20.0 consumer launch window
- domain:surfaces: Agent UI, Telegram, WhatsApp, Slack/Discord, mobile
- enhancement: New feature or request
- p0: high priority
- performance: Performance-critical changes
- spec-ready: Issue has implementation spec adequate for coding-agent assignment
- track:consumer-app: Hermes-competitor consumer product; mobile-first, voice + messaging + memory + skills
Problem
The ChatAgent system prompt is ~7,400 tokens in the no-docs path. On a Vulkan iGPU (Radeon 890M, shared RAM), prefill runs at ~33ms/token, making the boot-time warmup take 249 seconds and first-message TTFT ~21 seconds even with prompt cache.
Confirmed via Lemonade telemetry.
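These numbers are internally consistent: prefill cost on the iGPU scales roughly linearly with prompt length. A back-of-envelope sketch, using only the constants from the measurements above:

```python
# Back-of-envelope check that the measured warmup time matches linear
# prefill cost. Constants come from the telemetry above; this is an
# estimate, not an exact model (cache setup etc. adds a few seconds).
MS_PER_PREFILL_TOKEN = 33        # Radeon 890M (Vulkan, shared RAM)
CURRENT_PROMPT_TOKENS = 7_400
TARGET_PROMPT_TOKENS = 4_000

def prefill_seconds(tokens: int, ms_per_token: float = MS_PER_PREFILL_TOKEN) -> float:
    """Estimated wall-clock prefill time for a prompt of `tokens` tokens."""
    return tokens * ms_per_token / 1000.0

print(f"current prompt: ~{prefill_seconds(CURRENT_PROMPT_TOKENS):.0f}s")  # ~244s, close to the observed 249s
print(f"target prompt:  ~{prefill_seconds(TARGET_PROMPT_TOKENS):.0f}s")   # ~132s
```

The ~5s gap between the 244s estimate and the observed 249s warmup is plausibly fixed overhead (model/cache setup), which supports attributing the warmup cost almost entirely to prompt length.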
Root Cause
The system prompt has verbose duplicate rules and excessive examples across four sections:
- discovery_rules
- data_file_rules
- base_prompt
- tool_rules

The same instruction ("always query before answering") appears 8 times across these sections, and WRONG/RIGHT example pairs consume ~1,600 tokens total.
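A quick audit script makes this kind of duplication visible before editing. A minimal sketch; the section names match the issue, but the section bodies here are illustrative stand-ins, not the real prompt text:

```python
import re

# Count how often a rule phrase is restated across prompt sections.
# Bodies below are illustrative stand-ins for the real prompt sections.
sections = {
    "discovery_rules": "Always query before answering. Example workflow 1...",
    "data_file_rules": "Always query before answering. WRONG: ... RIGHT: ...",
    "base_prompt": "You are ChatAgent. Always query before answering.",
    "tool_rules": "Never write raw JSON. Always query before answering.",
}

def count_phrase(phrase: str, sections: dict[str, str]) -> int:
    """Case-insensitive occurrence count of `phrase` across all sections."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    return sum(len(pattern.findall(body)) for body in sections.values())

print(count_phrase("always query before answering", sections))  # 4 in this stub
```

Run against the real `_get_system_prompt()` output, the same count is what surfaced the 8 restatements cited above.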
Proposed Solution
Target: ~4,000 tokens (45% reduction)
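One way to hit that target without dropping behavior is to state each cross-cutting rule exactly once in a shared block instead of repeating it per section. A hypothetical sketch (the real builder is `_get_system_prompt()` in `src/gaia/agents/chat/agent.py`; all names below are illustrative):

```python
# Shared rules appear once, ahead of the compressed sections, so each
# section body can drop its own restatement. Illustrative only.
GLOBAL_RULES = "RULES:\n- Always query the index before answering."

def build_system_prompt(base: str, discovery: str, data_files: str, tools: str) -> str:
    """Assemble the system prompt from compressed sections plus one rule block."""
    parts = [base, GLOBAL_RULES, discovery, data_files, tools]
    return "\n\n".join(p.strip() for p in parts if p.strip())

prompt = build_system_prompt(
    base="You are ChatAgent.",
    discovery="DISCOVERY: keep a single worked example.",
    data_files="DATA FILES: keep a single worked example.",
    tools="TOOLS: emit tool calls; never raw JSON.",
)
assert prompt.count("Always query") == 1   # rule stated exactly once
```

With the rule deduplicated, the per-section cuts below only need to trim examples and local verbosity.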
- discovery_rules (1,868 → ~770): remove 3 of 4 example workflows, compress PROACTIVE FOLLOW-THROUGH, merge FILE SEARCH with SMART DISCOVERY
- data_file_rules (1,380 → ~580): remove 3 of 4 examples, compress IMAGE GEN and UNSUPPORTED FEATURES
- base_prompt (1,350 → ~850): merge duplicate GREETING RULE + HARD LIMIT, trim the platform block, compress WHAT YOU NEVER DO
- tool_rules (1,088 → ~590): compress POST-INDEX QUERY RULE (3+1 patterns → 1+1), compress NEVER WRITE RAW JSON (4 BANNEDs → 1)

Expected Impact
Files
- src/gaia/agents/chat/agent.py — _get_system_prompt() (lines 400-876)
- src/gaia/agents/chat/agent.py — @tool docstrings (lines 1103+)
- src/gaia/agents/base/agent.py — _format_tools_for_prompt() (lines 385-407)

Verification
- prompt_tokens < 4,500
- system_status → time_to_first_token < 10s on first message
- pytest tests/unit/chat/ -xvs

Related
8153e99 — Warmup prompt cache at boot (already merged to feat branch)
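The `prompt_tokens < 4,500` check in Verification could also be pinned as a regression test so the prompt cannot silently grow back. A sketch; the stub tokenizer and stub prompt stand in for the serving model's tokenizer and the real `_get_system_prompt()`:

```python
TOKEN_BUDGET = 4_500

def count_tokens(text: str) -> int:
    """Stand-in tokenizer; the real test should use the serving model's tokenizer."""
    return len(text.split())

def test_system_prompt_under_budget() -> None:
    # In the real test: prompt = ChatAgent()._get_system_prompt()
    prompt = "stub system prompt for illustration"
    assert count_tokens(prompt) <= TOKEN_BUDGET, (
        f"system prompt is {count_tokens(prompt)} tokens, budget is {TOKEN_BUDGET}"
    )

test_system_prompt_under_budget()
```

A word-count tokenizer undercounts relative to a real BPE tokenizer, so the real test should tokenize with the deployed model to make the 4,500 budget meaningful.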