
perf: reduce ChatAgent system prompt from ~7,400 to ~4,000 tokens #719

@itomek

Problem

The ChatAgent system prompt is ~7,400 tokens in the no-docs path. On a Vulkan iGPU (Radeon 890M, shared RAM), prefill runs at ~33 ms/token, so the boot-time warmup takes ~249 seconds and the first message's time to first token (TTFT) is ~21 seconds even with a prompt-cache hit.

Confirmed via Lemonade telemetry:

prompt_tokens: 7397
time_to_first_token: 249.357s  (cold)
time_to_first_token: 21.3s     (warm, 516 new tokens after cache hit)
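As a cross-check, the cold run alone implies the ~33 ms/token prefill cost quoted above. A minimal sketch, assuming cold TTFT is dominated by prefill and scales linearly with prompt length (it ignores model load and any attention cost that grows with context):

```python
# Cold-start numbers copied from the Lemonade telemetry above.
COLD_PROMPT_TOKENS = 7397
COLD_TTFT_S = 249.357


def prefill_ms_per_token(prompt_tokens: int, ttft_s: float) -> float:
    """Average prefill cost per prompt token, assuming TTFT is all prefill."""
    return ttft_s / prompt_tokens * 1000.0


rate = prefill_ms_per_token(COLD_PROMPT_TOKENS, COLD_TTFT_S)
print(f"{rate:.1f} ms/token")  # prints "33.7 ms/token"
```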

Root Cause

The system prompt has verbose, duplicated rules and excessive examples:

| Section           | Tokens | %   | Main source of bloat                                        |
|-------------------|--------|-----|-------------------------------------------------------------|
| discovery_rules   | 1,868  | 25% | 4 full example workflows; "INDEX IMMEDIATELY" repeated 3x   |
| data_file_rules   | 1,380  | 19% | 4 tool-call examples; verbose IMAGE GEN workflow            |
| base_prompt       | 1,350  | 18% | Greeting rule stated TWICE with duplicate examples          |
| Tool descriptions | 1,200  | 16% | 24 tools, including debug tools                             |
| tool_rules        | 1,088  | 15% | POST-INDEX rule has 3 FORBIDDEN patterns for 1 concept      |
| Other             | 515    | 7%  | Format template, VLM mixin, ChatML overhead                 |
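To regenerate a breakdown like the table above after each trim, a small helper can count tokens per section and rank them. This is a sketch under assumptions: the prompt sections are taken to be available as plain strings keyed by name, and `count_tokens` is a stand-in for the served model's tokenizer. The whitespace counter in the demo is a placeholder and will not match Lemonade's `prompt_tokens`.

```python
from typing import Callable, Dict, List, Tuple


def section_breakdown(
    sections: Dict[str, str],
    count_tokens: Callable[[str], int],
) -> List[Tuple[str, int, float]]:
    """Return (section, tokens, percent_of_total) rows, largest section first."""
    counts = {name: count_tokens(text) for name, text in sections.items()}
    total = sum(counts.values()) or 1  # guard against an all-empty prompt
    rows = [(name, n, 100.0 * n / total) for name, n in counts.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def approx_tokens(text: str) -> int:
    # Placeholder counter (whitespace-separated words), NOT the model's tokenizer.
    return len(text.split())


# Hypothetical section text, for illustration only.
demo = {
    "base_prompt": "Greet the user once.",
    "discovery_rules": "INDEX IMMEDIATELY " * 3,
}
for name, tokens, pct in section_breakdown(demo, approx_tokens):
    print(f"{name:<16} {tokens:>5} {pct:5.1f}%")
```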

The same instruction ("always query before answering") appears 8 times across sections. WRONG/RIGHT example pairs consume ~1,600 tokens total.
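Literal repeats like these can be found mechanically by normalizing each prompt line and counting how often it recurs across sections. A minimal sketch, assuming the sections are plain strings; it only catches exact repeats after lowercasing and collapsing whitespace, so paraphrased variants of "always query before answering" still need a manual pass. The section text below is made up for illustration.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def _normalize(line: str) -> str:
    # Lowercase and collapse whitespace so formatting differences don't hide repeats.
    return re.sub(r"\s+", " ", line.strip().lower())


def repeated_lines(sections: Dict[str, str], min_count: int = 2) -> List[Tuple[str, int]]:
    """Return (normalized_line, occurrences) for lines seen at least min_count times."""
    counts = Counter(
        _normalize(line)
        for text in sections.values()
        for line in text.splitlines()
        if line.strip()
    )
    return [(line, n) for line, n in counts.most_common() if n >= min_count]


sections = {
    "base_prompt": "Always query before answering.\nBe concise.",
    "tool_rules": "ALWAYS  query before answering.",
    "discovery_rules": "always query before answering.\nIndex immediately.",
}
print(repeated_lines(sections))
# [('always query before answering.', 3)]
```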

Proposed Solution

Target: ~4,000 tokens (45% reduction)

  1. discovery_rules (1,868 → ~770): Remove 3 of 4 example workflows, compress PROACTIVE FOLLOW-THROUGH, merge FILE SEARCH with SMART DISCOVERY
  2. data_file_rules (1,380 → ~580): Remove 3 of 4 examples, compress IMAGE GEN and UNSUPPORTED FEATURES
  3. base_prompt (1,350 → ~850): Merge duplicate GREETING RULE + HARD LIMIT, trim platform block, compress WHAT YOU NEVER DO
  4. tool_rules (1,088 → ~590): Compress POST-INDEX QUERY RULE (3+1 patterns → 1+1), compress NEVER WRITE RAW JSON (4 BANNEDs → 1)
  5. Tool descriptions (1,200 → ~800): Shorten verbose docstrings, exclude debug tools from prompt
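Summing the per-section targets, with the 515-token "Other" bucket left unchanged, lands at ~4,100 tokens, a cut of about 44.5%. That is consistent with the ~4,000-token / ~45% target. The arithmetic, using only numbers from the tables above:

```python
# Per-section token counts from the Root Cause table and targets from the plan.
# "other" (format template, VLM mixin, ChatML overhead) is assumed unchanged.
BEFORE = {
    "discovery_rules": 1868,
    "data_file_rules": 1380,
    "base_prompt": 1350,
    "tool_rules": 1088,
    "tool_descriptions": 1200,
    "other": 515,
}
TARGET = {
    "discovery_rules": 770,
    "data_file_rules": 580,
    "base_prompt": 850,
    "tool_rules": 590,
    "tool_descriptions": 800,
    "other": 515,
}

before_total = sum(BEFORE.values())
after_total = sum(TARGET.values())
reduction = 1 - after_total / before_total

print(f"{before_total} -> {after_total} tokens ({reduction:.1%} smaller)")
for name in BEFORE:
    print(f"  {name:<18} {BEFORE[name]:>5} -> {TARGET[name]:>5}")
```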

Expected Impact

| Metric                          | Before  | After   |
|---------------------------------|---------|---------|
| System prompt tokens            | ~7,400  | ~4,000  |
| Boot warmup time                | ~249s   | ~132s   |
| First message TTFT (warm cache) | ~21s    | ~2-5s   |
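The warmup projection follows from a linear prefill model: 4,000 tokens at ~33 ms/token is ~132 s. A sketch of that model, assuming a fixed per-token cost; at 33 ms it gives ~244 s for the current prompt, slightly under the measured 249 s because the measured rate is closer to 33.7 ms. The warm-cache TTFT row is not derived from this model, since it depends on how many tokens miss the prompt cache.

```python
def projected_warmup_s(prompt_tokens: int, ms_per_token: float = 33.0) -> float:
    """Linear prefill model: cold warmup ~= prompt tokens x per-token prefill cost."""
    return prompt_tokens * ms_per_token / 1000.0


for label, tokens in [("before", 7400), ("after", 4000)]:
    print(f"{label:<6} {tokens:>5} tokens -> ~{projected_warmup_s(tokens):.0f}s")
```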

Files

  • src/gaia/agents/chat/agent.py: _get_system_prompt() (lines 400-876)
  • src/gaia/agents/chat/agent.py: @tool docstrings (lines 1103+)
  • src/gaia/agents/base/agent.py: _format_tools_for_prompt() (lines 385-407)
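For item 5 of the proposed solution, one option is to drop debug tools and trim docstrings at the point where the tool list is rendered into the prompt. This sketch is hypothetical: `ToolSpec`, its `debug` flag, `format_tools_for_prompt`, and the two example tools are illustrative names, not GAIA's actual registry or the real `_format_tools_for_prompt()` signature. Keeping only each docstring's first paragraph is one simple way to shorten verbose descriptions.

```python
import inspect
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class ToolSpec:
    # Hypothetical tool record; GAIA's real registry may store tools differently.
    name: str
    func: Callable
    debug: bool = False


def format_tools_for_prompt(tools: Iterable[ToolSpec], include_debug: bool = False) -> str:
    """Render one line per tool, skipping debug tools unless explicitly requested."""
    lines = []
    for tool in tools:
        if tool.debug and not include_debug:
            continue
        doc = inspect.getdoc(tool.func) or ""
        summary = doc.split("\n\n", 1)[0].replace("\n", " ")  # first paragraph only
        lines.append(f"- {tool.name}: {summary}")
    return "\n".join(lines)


def query_documents(query: str) -> str:
    """Search indexed documents and return matching chunks.

    Long usage notes and examples here no longer reach the prompt.
    """
    return ""


def dump_cache() -> str:
    """Print internal cache state (debug only)."""
    return ""


tools = [ToolSpec("query_documents", query_documents), ToolSpec("dump_cache", dump_cache, debug=True)]
print(format_tools_for_prompt(tools))
# - query_documents: Search indexed documents and return matching chunks.
```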

Verification

  • Run the eval suite to check for quality regressions
  • Fresh Lemonade restart → check warmup prompt_tokens < 4,500
  • system_status → time_to_first_token < 10s on first message
  • pytest tests/unit/chat/ -xvs
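The two numeric checks above can be encoded as a tiny pass/fail helper. A sketch only: it assumes the values are read by hand from Lemonade's telemetry output (no Lemonade API is called here), and `verification_failures` is a hypothetical name.

```python
from typing import List

# Thresholds taken from the Verification checklist.
MAX_WARMUP_PROMPT_TOKENS = 4500
MAX_FIRST_MESSAGE_TTFT_S = 10.0


def verification_failures(warmup_prompt_tokens: int, first_message_ttft_s: float) -> List[str]:
    """Return the failed checks; an empty list means both thresholds are met."""
    failures = []
    if warmup_prompt_tokens >= MAX_WARMUP_PROMPT_TOKENS:
        failures.append(
            f"warmup prompt_tokens={warmup_prompt_tokens} (want < {MAX_WARMUP_PROMPT_TOKENS})"
        )
    if first_message_ttft_s >= MAX_FIRST_MESSAGE_TTFT_S:
        failures.append(
            f"first-message TTFT={first_message_ttft_s}s (want < {MAX_FIRST_MESSAGE_TTFT_S}s)"
        )
    return failures


# The current numbers from the Problem section fail both checks.
for problem in verification_failures(7397, 21.3):
    print("FAIL:", problem)
```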

Related

Metadata

Assignees

No one assigned

    Labels

    • consumer: Blocks consumer adoption; must ship for the v0.20.0 consumer launch window
    • domain:surfaces: Agent UI, Telegram, WhatsApp, Slack/Discord, mobile
    • enhancement: New feature or request
    • p0: high priority
    • performance: Performance-critical changes
    • spec-ready: Issue has implementation spec adequate for coding-agent assignment
    • track:consumer-app: Hermes-competitor consumer product; mobile-first, voice + messaging + memory + skills

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests
