07.3 History Management
Purpose: This document describes ZeroClaw's conversation history management system, which tracks message exchanges between users and the LLM, prevents unbounded memory growth through intelligent compaction, and enriches context with relevant memories. For information about long-term memory storage (recall/store/forget), see Memory System. For details about the agent turn loop that consumes history, see Agent Turn Cycle.
ZeroClaw maintains a per-session conversation history (Vec<ChatMessage>) that grows with each user message, assistant response, and tool execution. Without management, this history would eventually exceed model context limits and exhaust memory. The history management system provides:
- Automatic compaction using LLM-based summarization when thresholds are exceeded
- Hard trimming as a safety cap to prevent runaway growth
- Context enrichment by injecting relevant memories and hardware documentation before LLM calls
- Credential scrubbing to prevent accidental leakage of secrets in tool outputs
- Format adaptation for both native tool calling APIs and XML-guided prompt-based tool use
Sources: src/agent/loop_.rs:1-1585
The core unit of conversation history is ChatMessage from the providers module, which represents a single turn in the conversation:
```rust
ChatMessage {
    role: String,    // "system", "user", "assistant", "tool"
    content: String, // Message text or JSON-encoded tool data
}
```

Role semantics:
- `system`: Initial prompt with identity, tool descriptions, and instructions
- `user`: User messages or tool results (XML format for prompt-guided tools)
- `assistant`: LLM responses, which may include text and/or tool calls
- `tool`: Tool execution results (native tool calling APIs only)
Sources: src/agent/loop_.rs:5, src/agent/loop_.rs:980, src/agent/loop_.rs:1069-1080
ZeroClaw uses a three-tier approach to prevent unbounded history growth:
| Constant | Value | Purpose |
|---|---|---|
| `DEFAULT_MAX_HISTORY_MESSAGES` | 50 | Trigger for auto-compaction |
| `COMPACTION_KEEP_RECENT_MESSAGES` | 20 | Messages retained after compaction |
| `COMPACTION_MAX_SOURCE_CHARS` | 12,000 | Max characters sent to summarizer |
| `COMPACTION_MAX_SUMMARY_CHARS` | 2,000 | Max characters in a compaction summary |

The actual limit is configurable via `config.agent.max_history_messages`, falling back to `DEFAULT_MAX_HISTORY_MESSAGES` when unset.
Sources: src/agent/loop_.rs:79-91
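The trigger condition can be sketched as follows. This is a minimal sketch, not the loop_.rs implementation: `ChatMessage` is simplified and `needs_compaction` is a hypothetical helper name; only the non-system count comparison follows the description above.

```rust
// Default threshold, as listed in the table above.
const DEFAULT_MAX_HISTORY_MESSAGES: usize = 50;

// Simplified stand-in for providers::ChatMessage.
struct ChatMessage {
    role: String,
    content: String,
}

/// Auto-compaction triggers when the number of non-system messages
/// exceeds the configured (or default) limit.
fn needs_compaction(history: &[ChatMessage], max_history: Option<usize>) -> bool {
    let limit = max_history.unwrap_or(DEFAULT_MAX_HISTORY_MESSAGES);
    let non_system = history.iter().filter(|m| m.role != "system").count();
    non_system > limit
}
```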
```mermaid
flowchart TB
    Start["Turn complete<br/>history.len()"]
    CheckLimit{"non_system_count > max_history?"}
    AutoCompact["auto_compact_history()"]
    BuildTranscript["build_compaction_transcript()<br/>Extract USER/ASSISTANT turns"]
    CallSummarizer["LLM.chat_with_system()<br/>Summarizer prompt<br/>temp=0.2"]
    ApplySummary["apply_compaction_summary()<br/>Replace old messages with<br/>single assistant summary"]
    HardTrim["trim_history()<br/>Safety cap: keep system +<br/>last N messages"]
    End["Continue"]
    Start --> CheckLimit
    CheckLimit -->|No| HardTrim
    CheckLimit -->|Yes| AutoCompact
    AutoCompact --> BuildTranscript
    BuildTranscript --> CallSummarizer
    CallSummarizer --> ApplySummary
    ApplySummary --> HardTrim
    HardTrim --> End
```
When `non_system_count > max_history_messages`, the system triggers auto-compaction:

1. **Calculate compaction range**: Keep the `COMPACTION_KEEP_RECENT_MESSAGES` (20) most recent messages; compact everything older.
2. **Build transcript**: Extract messages in `USER: ...\nASSISTANT: ...` format, truncated to 12K chars.
3. **Call LLM summarizer**: Use a system prompt requesting concise bullet points (max 12 points).
4. **Replace with summary**: Delete the old messages and insert a single assistant message containing `[Compaction summary]\n<bullets>`.
Summarizer system prompt:

```
You are a conversation compaction engine. Summarize older chat history
into concise context for future turns. Preserve: user preferences,
commitments, decisions, unresolved tasks, key facts. Omit: filler,
repeated chit-chat, verbose tool logs. Output plain text bullet points only.
```
Fallback behavior: If summarization fails (network error, rate limit), the system falls back to deterministic local truncation of the transcript to 2K chars.
Sources: src/agent/loop_.rs:158-205, src/agent/loop_.rs:1548-1563
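The transcript-building and replace-with-summary steps can be sketched like this. The function names follow the text above, but the bodies are illustrative assumptions (a simplified `ChatMessage`, character-based truncation, and a splice-based replacement), not the exact loop_.rs code.

```rust
// Constants as documented above.
const COMPACTION_KEEP_RECENT_MESSAGES: usize = 20;
const COMPACTION_MAX_SOURCE_CHARS: usize = 12_000;

struct ChatMessage {
    role: String,
    content: String,
}

/// Render older messages as a USER:/ASSISTANT: transcript, capped at 12K chars.
fn build_compaction_transcript(older: &[ChatMessage]) -> String {
    let mut transcript = String::new();
    for m in older {
        match m.role.as_str() {
            "user" => transcript.push_str(&format!("USER: {}\n", m.content)),
            "assistant" => transcript.push_str(&format!("ASSISTANT: {}\n", m.content)),
            _ => {} // system/tool messages are skipped in this sketch
        }
    }
    transcript.chars().take(COMPACTION_MAX_SOURCE_CHARS).collect()
}

/// Replace everything older than the keep-window with one assistant summary.
fn apply_compaction_summary(history: &mut Vec<ChatMessage>, summary: &str) {
    // Preserve a leading system message, if present.
    let start = usize::from(history.first().map_or(false, |m| m.role == "system"));
    let end = history.len().saturating_sub(COMPACTION_KEEP_RECENT_MESSAGES);
    if end > start {
        let summary_msg = ChatMessage {
            role: "assistant".into(),
            content: format!("[Compaction summary]\n{summary}"),
        };
        history
            .splice(start..end, std::iter::once(summary_msg))
            .for_each(drop);
    }
}
```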
After compaction (or if compaction is skipped), `trim_history()` enforces a hard cap:

1. **Preserve system message**: Always keep `history[0]` if `role == "system"`.
2. **Count non-system messages**: Calculate how many messages must be removed.
3. **Drain oldest**: Remove the oldest non-system messages to fit within `max_history`.
This ensures the agent never exceeds configured limits even if compaction is disabled or fails.
Sources: src/agent/loop_.rs:114-132, src/agent/loop_.rs:1562
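The three steps above can be sketched as a single drain (a minimal sketch with a simplified `ChatMessage`; the real implementation is in loop_.rs):

```rust
struct ChatMessage {
    role: String,
    content: String,
}

/// Hard cap: keep an optional leading system message plus the most
/// recent `max_history` non-system messages.
fn trim_history(history: &mut Vec<ChatMessage>, max_history: usize) {
    // Step 1: preserve history[0] when it is the system message.
    let start = usize::from(history.first().map_or(false, |m| m.role == "system"));
    // Step 2: count non-system messages.
    let non_system = history.len() - start;
    if non_system > max_history {
        // Step 3: drain the oldest non-system messages until we fit.
        let excess = non_system - max_history;
        history.drain(start..start + excess);
    }
}
```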
Before sending user messages to the LLM, ZeroClaw enriches them with relevant context from two sources:
```mermaid
sequenceDiagram
    participant User
    participant Agent as "agent_turn()"
    participant MemFunc as "build_context()"
    participant Memory as "Memory::recall()"
    participant LLM as "Provider::chat()"
    User->>Agent: "user message"
    Agent->>MemFunc: build_context(mem, message, min_score)
    MemFunc->>Memory: recall(query=message, limit=5)
    Memory-->>MemFunc: Vec<MemoryEntry> with scores
    MemFunc->>MemFunc: Filter entries by<br/>min_relevance_score
    MemFunc-->>Agent: "[Memory context]\n- key1: value1\n- key2: value2\n\n"
    Agent->>Agent: Prepend context to user message
    Agent->>LLM: chat(enriched_message, ...)
```
The `build_context()` function:

- Calls `mem.recall(user_msg, limit=5)` to retrieve hybrid-search results
- Filters entries where `score >= config.memory.min_relevance_score` (default 0.3)
- Formats them as `[Memory context]\n- key: content\n...` and prepends the block to the user message
Sources: src/agent/loop_.rs:207-233, src/agent/loop_.rs:1369-1376, src/agent/loop_.rs:1498-1505
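The filtering and formatting steps can be sketched as follows. This is a hypothetical sketch: `MemoryEntry` is simplified to the fields used here, and the `recall()` call itself is omitted; only the score filter and the `[Memory context]` layout follow the description above.

```rust
// Simplified stand-in for a memory search hit.
struct MemoryEntry {
    key: String,
    content: String,
    score: f32,
}

/// Format recalled entries as a context block, dropping low-relevance hits.
/// Returns an empty string when nothing passes the threshold, so the user
/// message is left untouched.
fn build_context(entries: &[MemoryEntry], min_relevance_score: f32) -> String {
    let mut out = String::new();
    for e in entries.iter().filter(|e| e.score >= min_relevance_score) {
        out.push_str(&format!("- {}: {}\n", e.key, e.content));
    }
    if out.is_empty() {
        String::new()
    } else {
        format!("[Memory context]\n{out}\n")
    }
}
```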
For peripherals-enabled configurations with `datasheet_dir`, `build_hardware_context()` injects datasheet documentation:

1. **Pin alias lookup**: If the user message mentions "red led", inject `red_led: 13` for matching boards.
2. **RAG retrieval**: Call `hardware_rag.retrieve(query, boards, limit)` to get relevant chunks.
3. **Format output**: `[Hardware documentation]\n--- source (board) ---\ncontent\n...`

The chunk limit is configurable: 5 for normal mode, 2 for `compact_context` mode.
Sources: src/agent/loop_.rs:236-273, src/agent/loop_.rs:1371-1376, src/agent/loop_.rs:1501-1505
```mermaid
sequenceDiagram
    participant CLI as "CLI/Channel"
    participant Loop as "run_tool_call_loop()"
    participant Provider as "Provider::chat()"
    participant History as "history: Vec<ChatMessage>"
    CLI->>Loop: User message
    Loop->>History: Prepend memory + HW context
    Loop->>History: Push system + enriched user msg
    loop Tool Call Iterations (max 10)
        Loop->>Provider: chat(history, tools, model, temp)
        Provider-->>Loop: response_text + tool_calls
        alt Has Tool Calls
            Loop->>Loop: Execute each tool
            Loop->>History: Push assistant msg (with tool_calls)
            Loop->>History: Push tool result msgs
        else Text Only
            Loop->>History: Push assistant msg (final response)
            Loop-->>CLI: Return final text
        end
    end
    CLI->>Loop: Next user message
    Loop->>Loop: auto_compact_history() if needed
    Loop->>Loop: trim_history() as safety cap
```
Key points:

- **History initialization**: Start with `[system_prompt, enriched_user_message]`
- **Tool call loop**: Each iteration appends the assistant response plus tool results to history
- **Native vs. prompt-guided formats**:
  - Native: Store JSON with a `tool_calls` array, followed by `role: tool` messages
  - Prompt-guided: Store XML `<tool_call>` tags in the assistant content, with results in a user message
- **Persistence**: In interactive CLI mode, history persists across turns within the same session
- **Compaction timing**: Happens after a turn completes but before the next user message
Sources: src/agent/loop_.rs:851-1084, src/agent/loop_.rs:1383-1572
ZeroClaw adapts history format based on whether the provider supports native tool calling:
When `provider.supports_native_tools() == true`:

Assistant message with tool calls:

```json
{
  "content": "Let me check that for you.",
  "tool_calls": [
    {"id": "call_123", "name": "shell", "arguments": "{\"command\":\"ls\"}"}
  ]
}
```

Tool result message:

```json
{
  "role": "tool",
  "tool_call_id": "call_123",
  "content": "file1.txt\nfile2.txt"
}
```

The `build_native_assistant_history()` function serializes this format for history storage.
Sources: src/agent/loop_.rs:764-787, src/agent/loop_.rs:927-931, src/agent/loop_.rs:1069-1080
When native tools are unsupported or disabled:

Assistant message:

```
Let me check that for you.
<tool_call>
{"name": "shell", "arguments": {"command": "ls"}}
</tool_call>
```

User message with results:

```
[Tool results]
<tool_result name="shell">
file1.txt
file2.txt
</tool_result>
```

The `build_assistant_history_with_tool_calls()` function formats XML-based history for these providers.
Sources: src/agent/loop_.rs:789-808, src/agent/loop_.rs:1071
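The XML-guided layout above can be sketched with two small formatters. The tag names match the documented format; the helper names and signatures are assumptions for illustration, not the loop_.rs API.

```rust
/// Wrap an assistant reply and a JSON tool-call payload in <tool_call> tags,
/// as stored in assistant history for prompt-guided providers.
fn format_assistant_with_tool_call(text: &str, call_json: &str) -> String {
    format!("{text}\n<tool_call>\n{call_json}\n</tool_call>")
}

/// Render tool outputs as the [Tool results] user message.
fn format_tool_results(results: &[(&str, &str)]) -> String {
    let mut out = String::from("[Tool results]\n");
    for (name, output) in results {
        out.push_str(&format!(
            "<tool_result name=\"{name}\">\n{output}\n</tool_result>\n"
        ));
    }
    out
}
```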
The `scrub_credentials()` function prevents accidental leakage of API keys, tokens, and passwords in tool outputs before they are added to history.

It uses a `RegexSet` to match sensitive key names:

`token`, `api_key`, `password`, `secret`, `user_key`, `bearer`, `credential`

It extracts key-value pairs from common formats:

```
"api_key": "sk-abc123def456..." → "api_key": "sk-a*[REDACTED]"
password=mypass123 → password=mypa*[REDACTED]
token: "bearer_xyz..." → token: "bear*[REDACTED]"
```

The first 4 characters are preserved for context and the rest is redacted:

- Values of 8+ characters → show the prefix, redact the suffix
- Shorter values → fully redacted
- The original quote style (`"`, `'`, or none) is maintained
Sources: src/agent/loop_.rs:25-77, src/agent/loop_.rs:1039
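The redaction rule alone can be sketched as follows (the `RegexSet` key matching and quote handling are omitted; `redact_value` is a hypothetical helper, with the prefix length and marker taken from the examples above):

```rust
/// Redact a matched credential value: keep the first 4 characters of
/// values with 8+ characters, fully redact anything shorter.
fn redact_value(value: &str) -> String {
    if value.chars().count() >= 8 {
        let prefix: String = value.chars().take(4).collect();
        format!("{prefix}*[REDACTED]")
    } else {
        "[REDACTED]".to_string()
    }
}
```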
History behavior is controlled by config.agent settings:
| Setting | Type | Default | Description |
|---|---|---|---|
| `max_history_messages` | `usize` | 50 | Trigger for auto-compaction |
| `max_tool_iterations` | `usize` | 10 | Max tool call loop iterations |
| `compact_context` | `bool` | `false` | Enable context reduction (smaller RAG chunks, shorter bootstrap) |
Access in code via `Config::load_or_init()`:

```rust
config.agent.max_history_messages // Compaction trigger
config.agent.max_tool_iterations  // Loop safety limit
config.agent.compact_context      // Context reduction flag
```

Sources: src/agent/loop_.rs:1399, src/agent/loop_.rs:1525, src/agent/loop_.rs:1552
When config.memory.auto_save == true, the agent automatically stores messages to long-term memory:
User messages:

```rust
let key = autosave_memory_key("user_msg"); // "user_msg_{uuid}"
mem.store(&key, &msg, MemoryCategory::Conversation, None).await;
```

Assistant responses:

```rust
let summary = truncate_with_ellipsis(&response, 100);
let key = autosave_memory_key("assistant_resp"); // "assistant_resp_{uuid}"
mem.store(&key, &summary, MemoryCategory::Daily, None).await;
```

This creates searchable memory entries without polluting the in-memory conversation history.
Sources: src/agent/loop_.rs:110-112, src/agent/loop_.rs:1361-1366, src/agent/loop_.rs:1407-1414, src/agent/loop_.rs:1490-1495, src/agent/loop_.rs:1564-1570
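A plausible shape for the truncation helper used above is sketched below. This is an assumption: the real `truncate_with_ellipsis` may count bytes rather than characters or place the ellipsis differently; only the 100-character cap comes from the snippet above.

```rust
/// Keep at most `max_chars` characters, appending an ellipsis when truncated.
fn truncate_with_ellipsis(s: &str, max_chars: usize) -> String {
    if s.chars().count() <= max_chars {
        s.to_string()
    } else {
        let head: String = s.chars().take(max_chars).collect();
        format!("{head}…")
    }
}
```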
```mermaid
stateDiagram-v2
    [*] --> Init: User starts CLI
    Init --> Enriched: build_context() + build_hardware_context()
    Enriched --> FirstTurn: history = [system, enriched_user]
    FirstTurn --> ToolCalls: LLM returns tool_calls
    ToolCalls --> ExecuteTools: Execute each tool
    ExecuteTools --> AddResults: history.push(assistant + tool msgs)
    AddResults --> NextIteration: Loop continues
    NextIteration --> FinalResponse: LLM returns text only
    FinalResponse --> SaveTurn: Auto-save to memory if enabled
    SaveTurn --> WaitInput: Print response, wait
    WaitInput --> CheckSize: User sends next message
    CheckSize --> Compact: non_system_count > max_history
    CheckSize --> Trim: non_system_count <= max_history
    Compact --> Trim: Replace old msgs with summary
    Trim --> Enriched: Hard cap applied, prepare next turn
```
Sources: src/agent/loop_.rs:1383-1572
In non-CLI contexts (gateway webhooks, channel messages), history is ephemeral per request:
Gateway `/webhook` endpoint:

```rust
// Each webhook request starts fresh
let history = vec![
    ChatMessage::system(&system_prompt),
    ChatMessage::user(message),
];
provider.simple_chat(message, model, temperature).await
```

Channel message processing:

```rust
// Each channel message creates new history
let mut history = vec![
    ChatMessage::system(&system_prompt),
    ChatMessage::user(&enriched),
];
run_tool_call_loop(..., &mut history, ...).await
```

No compaction occurs for these single-turn interactions, but the same context enrichment and credential scrubbing apply.
Sources: src/gateway/mod.rs:735-738, src/agent/loop_.rs:1383-1386
Tool outputs (especially from `shell`, `file_read`, and `http_request`) may contain credentials. The scrubbing pass happens at src/agent/loop_.rs:1039, before results are appended to history, preventing:
- API keys from leaking into subsequent LLM context
- Tokens appearing in memory auto-save entries
- Passwords being visible in observability logs
Without compaction, a long-running interactive session could:
- Exceed LLM context windows (128K+ tokens)
- Exhaust process memory (each message ~1KB+)
- Degrade performance (O(n) history scanning per turn)
The two-tier compaction strategy (auto-compact + hard trim) ensures bounded memory usage even for sessions with hundreds of turns.
Sources: src/agent/loop_.rs:42-77, src/agent/loop_.rs:114-132, src/agent/loop_.rs:158-205
- Memory System: Long-term storage for facts, preferences, and decisions (separate from conversation history)
- Agent Turn Cycle: Main loop that consumes and updates history during tool execution
- Tool Execution: How tool results are formatted and added to history
- System Prompt Construction: How the initial system message is built before history begins
Sources: src/agent/loop_.rs:1-1585