perf: reduce ChatAgent system prompt from ~7,400 to ~4,000 tokens #719
Status: Open
Labels
- consumer: Blocks consumer adoption; must ship for the v0.20.0 consumer launch window
- domain:surfaces: Agent UI, Telegram, WhatsApp, Slack/Discord, mobile
- enhancement: New feature or request
- p0: high priority
- performance: Performance-critical changes
- spec-ready: Issue has implementation spec adequate for coding-agent assignment
- track:consumer-app: Hermes-competitor consumer product; mobile-first, voice + messaging + memory + skills
Problem
The ChatAgent system prompt is ~7,400 tokens in the no-docs path. On a Vulkan iGPU (Radeon 890M, shared RAM), prefill runs at ~33ms/token, making the boot-time warmup take 249 seconds and first-message TTFT ~21 seconds even with prompt cache.
Confirmed via Lemonade telemetry.
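These numbers are internally consistent: prefill cost on the iGPU scales roughly linearly with prompt length. A back-of-envelope sketch, using only the constants from the measurements above:

```python
# Back-of-envelope check that the measured warmup time matches linear
# prefill cost. Constants come from the telemetry above; this is an
# estimate, not an exact model (cache setup etc. adds a few seconds).
MS_PER_PREFILL_TOKEN = 33        # Radeon 890M (Vulkan, shared RAM)
CURRENT_PROMPT_TOKENS = 7_400
TARGET_PROMPT_TOKENS = 4_000

def prefill_seconds(tokens: int, ms_per_token: float = MS_PER_PREFILL_TOKEN) -> float:
    """Estimated wall-clock prefill time for a prompt of `tokens` tokens."""
    return tokens * ms_per_token / 1000.0

print(f"current prompt: ~{prefill_seconds(CURRENT_PROMPT_TOKENS):.0f}s")  # ~244s, close to the observed 249s
print(f"target prompt:  ~{prefill_seconds(TARGET_PROMPT_TOKENS):.0f}s")   # ~132s
```

The ~5s gap between the 244s estimate and the observed 249s warmup is plausibly fixed overhead (model/cache setup), which supports attributing the warmup cost almost entirely to prompt length.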
Root Cause
The system prompt has verbose duplicate rules and excessive examples across four sections:
- discovery_rules
- data_file_rules
- base_prompt
- tool_rules

The same instruction ("always query before answering") appears 8 times across these sections, and WRONG/RIGHT example pairs consume ~1,600 tokens total.
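A quick audit script makes this kind of duplication visible before editing. A minimal sketch; the section names match the issue, but the section bodies here are illustrative stand-ins, not the real prompt text:

```python
import re

# Count how often a rule phrase is restated across prompt sections.
# Bodies below are illustrative stand-ins for the real prompt sections.
sections = {
    "discovery_rules": "Always query before answering. Example workflow 1...",
    "data_file_rules": "Always query before answering. WRONG: ... RIGHT: ...",
    "base_prompt": "You are ChatAgent. Always query before answering.",
    "tool_rules": "Never write raw JSON. Always query before answering.",
}

def count_phrase(phrase: str, sections: dict[str, str]) -> int:
    """Case-insensitive occurrence count of `phrase` across all sections."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    return sum(len(pattern.findall(body)) for body in sections.values())

print(count_phrase("always query before answering", sections))  # 4 in this stub
```

Run against the real `_get_system_prompt()` output, the same count is what surfaced the 8 restatements cited above.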
Proposed Solution
Target: ~4,000 tokens (45% reduction)
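One way to hit that target without dropping behavior is to state each cross-cutting rule exactly once in a shared block instead of repeating it per section. A hypothetical sketch (the real builder is `_get_system_prompt()` in `src/gaia/agents/chat/agent.py`; all names below are illustrative):

```python
# Shared rules appear once, ahead of the compressed sections, so each
# section body can drop its own restatement. Illustrative only.
GLOBAL_RULES = "RULES:\n- Always query the index before answering."

def build_system_prompt(base: str, discovery: str, data_files: str, tools: str) -> str:
    """Assemble the system prompt from compressed sections plus one rule block."""
    parts = [base, GLOBAL_RULES, discovery, data_files, tools]
    return "\n\n".join(p.strip() for p in parts if p.strip())

prompt = build_system_prompt(
    base="You are ChatAgent.",
    discovery="DISCOVERY: keep a single worked example.",
    data_files="DATA FILES: keep a single worked example.",
    tools="TOOLS: emit tool calls; never raw JSON.",
)
assert prompt.count("Always query") == 1   # rule stated exactly once
```

With the rule deduplicated, the per-section cuts below only need to trim examples and local verbosity.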
- discovery_rules (1,868 → ~770): remove 3 of 4 example workflows, compress PROACTIVE FOLLOW-THROUGH, merge FILE SEARCH with SMART DISCOVERY
- data_file_rules (1,380 → ~580): remove 3 of 4 examples, compress IMAGE GEN and UNSUPPORTED FEATURES
- base_prompt (1,350 → ~850): merge duplicate GREETING RULE + HARD LIMIT, trim the platform block, compress WHAT YOU NEVER DO
- tool_rules (1,088 → ~590): compress POST-INDEX QUERY RULE (3+1 patterns → 1+1), compress NEVER WRITE RAW JSON (4 BANNEDs → 1)

Expected Impact
Files
- src/gaia/agents/chat/agent.py — _get_system_prompt() (lines 400-876)
- src/gaia/agents/chat/agent.py — @tool docstrings (lines 1103+)
- src/gaia/agents/base/agent.py — _format_tools_for_prompt() (lines 385-407)

Verification
- prompt_tokens < 4,500
- system_status → time_to_first_token < 10s on first message
- pytest tests/unit/chat/ -xvs

Related
8153e99 — Warmup prompt cache at boot (already merged to feat branch)
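The `prompt_tokens < 4,500` check in Verification could also be pinned as a regression test so the prompt cannot silently grow back. A sketch; the stub tokenizer and stub prompt stand in for the serving model's tokenizer and the real `_get_system_prompt()`:

```python
TOKEN_BUDGET = 4_500

def count_tokens(text: str) -> int:
    """Stand-in tokenizer; the real test should use the serving model's tokenizer."""
    return len(text.split())

def test_system_prompt_under_budget() -> None:
    # In the real test: prompt = ChatAgent()._get_system_prompt()
    prompt = "stub system prompt for illustration"
    assert count_tokens(prompt) <= TOKEN_BUDGET, (
        f"system prompt is {count_tokens(prompt)} tokens, budget is {TOKEN_BUDGET}"
    )

test_system_prompt_under_budget()
```

A word-count tokenizer undercounts relative to a real BPE tokenizer, so the real test should tokenize with the deployed model to make the 4,500 budget meaningful.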