
Commit f69345c

Ovtcharov committed
feat(agents): split chat agent into task-specific agents with prompt profiles
Before: monolithic ChatAgent with 13K-token system prompt caused 95s TTFT for a simple "Hi!" on Gemma-4-E4B. Eval scenarios timed out at 610s.

After: 5 focused agents (chat, doc, file, data, web) + lite variants, each with a lean prompt profile. TTFT drops from 95s to 0.12s (chat) and 3-10s (doc). Eval pass rate: 89% judged (34/38), avg score 9.4/10.

Agent architecture:
- chat: conversation only, ~2K tokens, no tools
- doc: RAG + file search, ~5K tokens, hallucination prevention
- file: filesystem ops + discovery, ~4K tokens
- data: CSV/Excel analysis with scratchpad, ~3K tokens
- web: browser tools, ~2K tokens
- Each has a -lite variant using ~4B model for low-memory hardware

Eval framework updates:
- Per-scenario agent_type field in YAML (overrides --agent-type CLI)
- Latency validation: warns when TTFT > 30s
- Preserve eval sessions for review (no delete_session)
- Increased startup timeout 120s → 240s for Windows
- Fixed shutil.which("claude") for Windows .cmd resolution
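The override rule and the latency check described in the commit message can be sketched as follows. This is a minimal illustration, not the code from this commit: the function names `resolve_agent_type` and `ttft_warning`, the dict-shaped scenario, and the fallback default of `chat` are all assumptions. Only the precedence (scenario YAML over `--agent-type`) and the 30s threshold come from the message.

```python
from typing import Optional

# Threshold taken from the commit message ("warns when TTFT > 30s").
TTFT_WARN_SECONDS = 30.0


def resolve_agent_type(
    scenario: dict,
    cli_agent_type: Optional[str],
    default: str = "chat",  # assumed fallback, not stated in the commit
) -> str:
    """Pick the agent for a scenario: YAML field first, then CLI flag, then default."""
    return scenario.get("agent_type") or cli_agent_type or default


def ttft_warning(scenario_id: str, ttft_seconds: float) -> Optional[str]:
    """Return a warning string when time-to-first-token exceeds the threshold."""
    if ttft_seconds > TTFT_WARN_SECONDS:
        return (
            f"{scenario_id}: TTFT {ttft_seconds:.2f}s exceeds "
            f"{TTFT_WARN_SECONDS:.0f}s"
        )
    return None


if __name__ == "__main__":
    # Hypothetical parsed scenario; the real files carry more keys.
    scenario = {"id": "empty_file", "agent_type": "doc"}
    print(resolve_agent_type(scenario, cli_agent_type="chat"))
    print(ttft_warning(scenario["id"], 95.0))
```

Returning the warning as a value rather than logging it directly keeps the check easy to test; a real runner would likely route it through its own reporting.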
1 parent c1ed956 commit f69345c

57 files changed

Lines changed: 632 additions & 262 deletions


eval/scenarios/adversarial/empty_file.yaml

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 id: empty_file
 name: "Empty File Handling"
 category: adversarial
+agent_type: doc
 severity: medium
 description: |
   User asks the agent to index and read a completely empty file. Agent must

eval/scenarios/adversarial/large_document.yaml

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 id: large_document
 name: "Buried Fact in Large Document"
 category: adversarial
+agent_type: doc
 severity: high
 description: |
   A specific fact is buried deep within a large document. Tests whether the

eval/scenarios/adversarial/topic_switch.yaml

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 id: topic_switch
 name: "Rapid Topic Switch"
 category: adversarial
+agent_type: doc
 severity: medium
 description: |
   User rapidly switches topics between two different documents across four turns.

eval/scenarios/captured/captured_eval_cross_turn_file_recall.yaml

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 id: captured_eval_cross_turn_file_recall
 name: "Captured: Cross-Turn File Recall"
 category: captured
+agent_type: doc
 description: 'Captured from session: Eval: cross_turn_file_recall'
 note: "Subset of cross_turn_file_recall (2 of 3 turns captured from a real session)"
 persona: casual_user

eval/scenarios/captured/captured_eval_smart_discovery.yaml

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 id: captured_eval_smart_discovery
 name: "Captured: Smart Document Discovery"
 category: captured
+agent_type: file
 description: 'Captured from session: Eval: smart_discovery'
 persona: casual_user
 setup:

eval/scenarios/context_retention/conversation_summary.yaml

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 id: conversation_summary
 name: "5-Turn Conversation Summary"
 category: context_retention
+agent_type: doc
 severity: medium
 description: |
   A 5-turn conversation that tests the agent's ability to accumulate facts across

eval/scenarios/context_retention/cross_turn_file_recall.yaml

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 id: cross_turn_file_recall
 name: "Cross-Turn File Recall"
 category: context_retention
+agent_type: doc
 severity: critical
 description: |
   User indexes a document in Turn 1, then asks about its content in Turn 2

eval/scenarios/context_retention/multi_doc_context.yaml

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 id: multi_doc_context
 name: "Multi-Document Context"
 category: context_retention
+agent_type: doc
 severity: high
 description: |
   Two documents are indexed simultaneously. Agent must answer questions from each

eval/scenarios/context_retention/pronoun_resolution.yaml

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 id: pronoun_resolution
 name: "Pronoun Resolution"
 category: context_retention
+agent_type: doc
 severity: critical
 description: |
   User asks follow-up questions using pronouns ("it", "that policy").

eval/scenarios/error_recovery/file_not_found.yaml

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 id: file_not_found
 name: "File Not Found -- Helpful Error"
 category: error_recovery
+agent_type: doc
 severity: medium
 description: |
   User asks to read a nonexistent file. Agent must report the error gracefully
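
Each hunk above adds one top-level `agent_type` key to a scenario file. A rough sketch of how such files could be checked for a known agent type, using only the standard library, is shown below. The helper names are hypothetical, the `-lite` spelling of the variant names is an assumption drawn from the commit message wording, and a real check would more likely parse the YAML properly instead of matching lines.

```python
import re
from typing import Optional

# Agent names from the commit message; the "-lite" suffix spelling is assumed.
BASE_AGENTS = ("chat", "doc", "file", "data", "web")
KNOWN_AGENTS = set(BASE_AGENTS) | {f"{name}-lite" for name in BASE_AGENTS}

# Matches a top-level (unindented) `agent_type: value` line.
AGENT_TYPE_LINE = re.compile(r"^agent_type:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def declared_agent_type(yaml_text: str) -> Optional[str]:
    """Return the top-level agent_type value, or None if the key is absent."""
    match = AGENT_TYPE_LINE.search(yaml_text)
    if match is None:
        return None
    # Tolerate simple quoting such as agent_type: "doc".
    return match.group(1).strip("'\"")


def check_scenario(yaml_text: str) -> Optional[str]:
    """Return a problem description, or None when the scenario looks fine."""
    agent = declared_agent_type(yaml_text)
    if agent is None:
        return "missing agent_type"
    if agent not in KNOWN_AGENTS:
        return f"unknown agent_type: {agent}"
    return None


if __name__ == "__main__":
    sample = (
        "id: empty_file\n"
        'name: "Empty File Handling"\n'
        "category: adversarial\n"
        "agent_type: doc\n"
        "severity: medium\n"
    )
    print(check_scenario(sample))
```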
