
07.3 History Management

Nikolay Vyahhi edited this page Feb 19, 2026 · 3 revisions



Purpose: This document describes ZeroClaw's conversation history management system, which tracks message exchanges between users and the LLM, prevents unbounded memory growth through intelligent compaction, and enriches context with relevant memories. For information about long-term memory storage (recall/store/forget), see Memory System. For details about the agent turn loop that consumes history, see Agent Turn Cycle.


Overview

ZeroClaw maintains a per-session conversation history (Vec<ChatMessage>) that grows with each user message, assistant response, and tool execution. Without management, this history would eventually exceed model context limits and exhaust memory. The history management system provides:

  • Automatic compaction using LLM-based summarization when thresholds are exceeded
  • Hard trimming as a safety cap to prevent runaway growth
  • Context enrichment by injecting relevant memories and hardware documentation before LLM calls
  • Credential scrubbing to prevent accidental leakage of secrets in tool outputs
  • Format adaptation for both native tool calling APIs and XML-guided prompt-based tool use

Sources: src/agent/loop_.rs:1-1585


Data Structures

ChatMessage

The core unit of conversation history is ChatMessage from the providers module, which represents a single turn in the conversation:

ChatMessage {
    role: String,      // "system", "user", "assistant", "tool"
    content: String,   // Message text or JSON-encoded tool data
}

Role semantics:

  • system: Initial prompt with identity, tool descriptions, and instructions
  • user: User messages or tool results (XML format for prompt-guided tools)
  • assistant: LLM responses; may include text and/or tool calls
  • tool: Tool execution results (native tool calling APIs only)

Sources: src/agent/loop_.rs:5, src/agent/loop_.rs:980, src/agent/loop_.rs:1069-1080
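A minimal sketch of this shape and the seed history described later on this page. The `new` constructor here is a hypothetical convenience for illustration, not the actual zeroclaw API:

```rust
// Minimal stand-in for the ChatMessage shape described above.
// The `new` constructor is an assumption, not the real providers-module API.
#[derive(Debug, Clone)]
struct ChatMessage {
    role: String,    // "system", "user", "assistant", or "tool"
    content: String, // message text or JSON-encoded tool data
}

impl ChatMessage {
    fn new(role: &str, content: &str) -> Self {
        Self { role: role.to_string(), content: content.to_string() }
    }
}

fn main() {
    // A fresh session starts as [system_prompt, enriched_user_message].
    let history = vec![
        ChatMessage::new("system", "You are ZeroClaw."),
        ChatMessage::new("user", "[Memory context]\n- name: Ada\n\nblink the led"),
    ];
    assert_eq!(history[0].role, "system");
    assert_eq!(history.len(), 2);
}
```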


History Size Limits

ZeroClaw uses a two-tier approach (auto-compaction plus hard trimming) to prevent unbounded history growth, governed by these constants:

| Constant | Value | Purpose |
|---|---|---|
| DEFAULT_MAX_HISTORY_MESSAGES | 50 | Trigger for auto-compaction |
| COMPACTION_KEEP_RECENT_MESSAGES | 20 | Messages retained after compaction |
| COMPACTION_MAX_SOURCE_CHARS | 12,000 | Max characters sent to summarizer |
| COMPACTION_MAX_SUMMARY_CHARS | 2,000 | Max characters in compaction summary |

The actual limit is configurable via config.agent.max_history_messages, falling back to DEFAULT_MAX_HISTORY_MESSAGES when unset.

Sources: src/agent/loop_.rs:79-91
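The fallback behavior can be sketched as follows (the helper name is illustrative; the actual code reads the config inline):

```rust
// Sketch of the configurable-limit fallback described above.
const DEFAULT_MAX_HISTORY_MESSAGES: usize = 50;

// Returns the configured limit, or the compile-time default when unset.
fn effective_max_history(configured: Option<usize>) -> usize {
    configured.unwrap_or(DEFAULT_MAX_HISTORY_MESSAGES)
}

fn main() {
    assert_eq!(effective_max_history(None), 50);
    assert_eq!(effective_max_history(Some(200)), 200);
}
```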


History Compaction Strategy

flowchart TB
    Start["Turn complete<br/>history.len()"]
    CheckLimit{"non_system_count > max_history?"}
    AutoCompact["auto_compact_history()"]
    BuildTranscript["build_compaction_transcript()<br/>Extract USER/ASSISTANT turns"]
    CallSummarizer["LLM.chat_with_system()<br/>Summarizer prompt<br/>temp=0.2"]
    ApplySummary["apply_compaction_summary()<br/>Replace old messages with<br/>single assistant summary"]
    HardTrim["trim_history()<br/>Safety cap: keep system +<br/>last N messages"]
    End["Continue"]
    
    Start --> CheckLimit
    CheckLimit -->|No| HardTrim
    CheckLimit -->|Yes| AutoCompact
    AutoCompact --> BuildTranscript
    BuildTranscript --> CallSummarizer
    CallSummarizer --> ApplySummary
    ApplySummary --> HardTrim
    HardTrim --> End

Auto-Compaction Algorithm

When non_system_count > max_history_messages, the system triggers auto-compaction:

  1. Calculate compaction range: Keep COMPACTION_KEEP_RECENT_MESSAGES (20) most recent messages, compact everything older
  2. Build transcript: Extract messages in USER: ...\nASSISTANT: ... format, truncate to 12K chars
  3. Call LLM summarizer: Use system prompt requesting concise bullet points (max 12 points)
  4. Replace with summary: Delete old messages, insert single assistant message with [Compaction summary]\n<bullets>

Summarizer system prompt:

You are a conversation compaction engine. Summarize older chat history 
into concise context for future turns. Preserve: user preferences, 
commitments, decisions, unresolved tasks, key facts. Omit: filler, 
repeated chit-chat, verbose tool logs. Output plain text bullet points only.

Fallback behavior: If summarization fails (network error, rate limit), the system falls back to deterministic local truncation of the transcript to 2K chars.

Sources: src/agent/loop_.rs:158-205, src/agent/loop_.rs:1548-1563
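The replace step (step 4 above) can be sketched as follows; the structure mirrors this page's description, not the actual source, and the ChatMessage type is a simplified stand-in:

```rust
// Illustrative sketch of apply_compaction_summary(): everything between the
// system message and the most recent 20 messages collapses into one
// assistant summary message.
const COMPACTION_KEEP_RECENT_MESSAGES: usize = 20;

#[derive(Clone)]
struct ChatMessage { role: String, content: String }

fn apply_compaction_summary(history: &mut Vec<ChatMessage>, summary: &str) {
    // Skip over a leading system message, if present.
    let start = usize::from(history.first().map_or(false, |m| m.role == "system"));
    if history.len() <= start + COMPACTION_KEEP_RECENT_MESSAGES {
        return; // nothing old enough to compact
    }
    let end = history.len() - COMPACTION_KEEP_RECENT_MESSAGES;
    // Replace history[start..end] with a single summary message.
    history.splice(start..end, [ChatMessage {
        role: "assistant".to_string(),
        content: format!("[Compaction summary]\n{summary}"),
    }]);
}

fn main() {
    let mut history = vec![ChatMessage { role: "system".into(), content: "sys".into() }];
    for i in 0..40 {
        history.push(ChatMessage { role: "user".into(), content: format!("msg {i}") });
    }
    apply_compaction_summary(&mut history, "- user likes terse answers");
    // system + summary + 20 most recent messages
    assert_eq!(history.len(), 22);
    assert!(history[1].content.starts_with("[Compaction summary]"));
}
```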

Hard Trimming

After compaction (or if compaction is skipped), trim_history() enforces a hard cap:

  1. Preserve system message: Always keep history[0] if role == "system"
  2. Count non-system messages: Calculate how many messages to remove
  3. Drain oldest: Remove oldest non-system messages to fit within max_history

This ensures the agent never exceeds configured limits even if compaction is disabled or fails.

Sources: src/agent/loop_.rs:114-132, src/agent/loop_.rs:1562
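The three steps above can be sketched as a single drain over the oldest non-system messages (a simplified stand-in, not the actual source):

```rust
// Sketch of the trim_history() hard cap: preserve the system message,
// drain the oldest non-system messages down to max_history.
#[derive(Clone)]
struct ChatMessage { role: String, content: String }

fn trim_history(history: &mut Vec<ChatMessage>, max_history: usize) {
    let start = usize::from(history.first().map_or(false, |m| m.role == "system"));
    let non_system = history.len() - start;
    if non_system > max_history {
        let excess = non_system - max_history;
        history.drain(start..start + excess); // drop the oldest first
    }
}

fn main() {
    let mut history = vec![ChatMessage { role: "system".into(), content: "sys".into() }];
    for i in 0..10 {
        history.push(ChatMessage { role: "user".into(), content: format!("msg {i}") });
    }
    trim_history(&mut history, 4);
    assert_eq!(history.len(), 5);          // system + last 4 messages
    assert_eq!(history[0].role, "system"); // system message preserved
    assert_eq!(history[1].content, "msg 6");
}
```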


Context Enrichment

Before sending user messages to the LLM, ZeroClaw enriches them with relevant context from two sources:

Memory Context

sequenceDiagram
    participant User
    participant Agent as "agent_turn()"
    participant MemFunc as "build_context()"
    participant Memory as "Memory::recall()"
    participant LLM as "Provider::chat()"
    
    User->>Agent: "user message"
    Agent->>MemFunc: build_context(mem, message, min_score)
    MemFunc->>Memory: recall(query=message, limit=5)
    Memory-->>MemFunc: Vec<MemoryEntry> with scores
    MemFunc->>MemFunc: Filter entries by<br/>min_relevance_score
    MemFunc-->>Agent: "[Memory context]\n- key1: value1\n- key2: value2\n\n"
    Agent->>Agent: Prepend context to user message
    Agent->>LLM: chat(enriched_message, ...)

The build_context() function:

  1. Calls mem.recall(user_msg, limit=5) to retrieve hybrid-search results
  2. Filters entries where score >= config.memory.min_relevance_score (default 0.3)
  3. Formats as [Memory context]\n- key: content\n... and prepends to user message

Sources: src/agent/loop_.rs:207-233, src/agent/loop_.rs:1369-1376, src/agent/loop_.rs:1498-1505
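The filter-and-format steps can be sketched as below, assuming a simplified MemoryEntry with key, content, and score fields (the real type lives in the memory module and recall is async):

```rust
// Sketch of the build_context() filter-and-format step.
struct MemoryEntry { key: String, content: String, score: f32 }

fn format_memory_context(entries: &[MemoryEntry], min_score: f32) -> String {
    let lines: Vec<String> = entries
        .iter()
        .filter(|e| e.score >= min_score) // drop low-relevance recalls
        .map(|e| format!("- {}: {}", e.key, e.content))
        .collect();
    if lines.is_empty() {
        String::new() // no context block when nothing is relevant enough
    } else {
        format!("[Memory context]\n{}\n\n", lines.join("\n"))
    }
}

fn main() {
    let entries = vec![
        MemoryEntry { key: "name".into(), content: "Ada".into(), score: 0.9 },
        MemoryEntry { key: "noise".into(), content: "hi".into(), score: 0.1 },
    ];
    // With the default min_relevance_score of 0.3, only "name" survives.
    let ctx = format_memory_context(&entries, 0.3);
    assert_eq!(ctx, "[Memory context]\n- name: Ada\n\n");
}
```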

Hardware RAG Context

For peripherals-enabled configurations with datasheet_dir, build_hardware_context() injects datasheet documentation:

  • Pin alias lookup: If the user message mentions "red led", inject red_led: 13 for matching boards
  2. RAG retrieval: Call hardware_rag.retrieve(query, boards, limit) to get relevant chunks
  3. Format output: [Hardware documentation]\n--- source (board) ---\ncontent\n...

The chunk limit is configurable: 5 for normal mode, 2 for compact_context mode.

Sources: src/agent/loop_.rs:236-273, src/agent/loop_.rs:1371-1376, src/agent/loop_.rs:1501-1505


History in the Agent Loop

sequenceDiagram
    participant CLI as "CLI/Channel"
    participant Loop as "run_tool_call_loop()"
    participant Provider as "Provider::chat()"
    participant History as "history: Vec<ChatMessage>"
    
    CLI->>Loop: User message
    Loop->>History: Prepend memory + HW context
    Loop->>History: Push system + enriched user msg
    
    loop Tool Call Iterations (max 10)
        Loop->>Provider: chat(history, tools, model, temp)
        Provider-->>Loop: response_text + tool_calls
        
        alt Has Tool Calls
            Loop->>Loop: Execute each tool
            Loop->>History: Push assistant msg (with tool_calls)
            Loop->>History: Push tool result msgs
        else Text Only
            Loop->>History: Push assistant msg (final response)
            Loop-->>CLI: Return final text
        end
    end
    
    CLI->>Loop: Next user message
    Loop->>Loop: auto_compact_history() if needed
    Loop->>Loop: trim_history() as safety cap

Key points:

  1. History initialization: Start with [system_prompt, enriched_user_message]
  2. Tool call loop: Each iteration appends assistant response + tool results to history
  3. Native vs. prompt-guided formats:
    • Native: Store JSON with tool_calls array, followed by role: tool messages
    • Prompt-guided: Store XML <tool_call> tags in assistant content, results in user message
  4. Persistence: In interactive CLI mode, history persists across turns within the same session
  5. Compaction timing: Happens after turn completes but before next user message

Sources: src/agent/loop_.rs:851-1084, src/agent/loop_.rs:1383-1572


Native vs. Prompt-Guided History Formats

ZeroClaw adapts history format based on whether the provider supports native tool calling:

Native Tool Calling (OpenAI, Anthropic, etc.)

When provider.supports_native_tools() == true:

Assistant message with tool calls:

{
  "content": "Let me check that for you.",
  "tool_calls": [
    {"id": "call_123", "name": "shell", "arguments": "{\"command\":\"ls\"}"}
  ]
}

Tool result message:

{
  "role": "tool",
  "tool_call_id": "call_123",
  "content": "file1.txt\nfile2.txt"
}

The build_native_assistant_history() function serializes this format for history storage.

Sources: src/agent/loop_.rs:764-787, src/agent/loop_.rs:927-931, src/agent/loop_.rs:1069-1080

Prompt-Guided Tool Use (Ollama, custom models)

When native tools are unsupported or disabled:

Assistant message:

Let me check that for you.
<tool_call>
{"name": "shell", "arguments": {"command": "ls"}}
</tool_call>

User message with results:

[Tool results]
<tool_result name="shell">
file1.txt
file2.txt
</tool_result>

The build_assistant_history_with_tool_calls() function formats XML-based history for these providers.

Sources: src/agent/loop_.rs:789-808, src/agent/loop_.rs:1071
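The result framing shown above can be sketched as a simple formatter (the function name and signature are illustrative, not the actual source):

```rust
// Sketch of the XML tool-result framing for prompt-guided providers.
// Each (name, output) pair becomes a <tool_result> block under one
// "[Tool results]" user message.
fn format_tool_results(results: &[(&str, &str)]) -> String {
    let mut out = String::from("[Tool results]");
    for (name, output) in results {
        out.push_str(&format!(
            "\n<tool_result name=\"{name}\">\n{output}\n</tool_result>"
        ));
    }
    out
}

fn main() {
    let msg = format_tool_results(&[("shell", "file1.txt\nfile2.txt")]);
    assert!(msg.starts_with("[Tool results]"));
    assert!(msg.contains("<tool_result name=\"shell\">"));
    assert!(msg.ends_with("</tool_result>"));
}
```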


Credential Scrubbing

The scrub_credentials() function prevents accidental leakage of API keys, tokens, and passwords in tool outputs before they're added to history.

Detection Patterns

Uses RegexSet to match sensitive key names:

  • token, api_key, password, secret, user_key, bearer, credential

Extracts key-value pairs from common formats:

"api_key": "sk-abc123def456..."  → "api_key": "sk-a*[REDACTED]"
password=mypass123                → password=mypa*[REDACTED]
token: "bearer_xyz..."            → token: "bear*[REDACTED]"

Redaction Strategy

Preserves first 4 characters for context, redacts the rest:

  • 8+ character values → Show prefix, redact suffix
  • Shorter values → Fully redacted
  • Maintains original quote style (", ', or none)

Sources: src/agent/loop_.rs:25-77, src/agent/loop_.rs:1039
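The redaction rule alone can be sketched without the regex machinery (the real scrubber uses a RegexSet over key names and preserves quote style; this only shows the prefix-keep behavior):

```rust
// Regex-free sketch of the redaction strategy described above:
// keep the first 4 characters of values 8+ characters long, fully
// redact anything shorter.
fn redact_value(value: &str) -> String {
    if value.chars().count() >= 8 {
        let prefix: String = value.chars().take(4).collect();
        format!("{prefix}*[REDACTED]")
    } else {
        "[REDACTED]".to_string() // short values are fully redacted
    }
}

fn main() {
    assert_eq!(redact_value("sk-abc123def456"), "sk-a*[REDACTED]");
    assert_eq!(redact_value("mypass123"), "mypa*[REDACTED]");
    assert_eq!(redact_value("short"), "[REDACTED]");
}
```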


Configuration Options

History behavior is controlled by config.agent settings:

| Setting | Type | Default | Description |
|---|---|---|---|
| max_history_messages | usize | 50 | Trigger for auto-compaction |
| max_tool_iterations | usize | 10 | Max tool call loop iterations |
| compact_context | bool | false | Enable context reduction (smaller RAG chunks, shorter bootstrap) |

Access in code via Config::load_or_init():

config.agent.max_history_messages        // Compaction trigger
config.agent.max_tool_iterations         // Loop safety limit
config.agent.compact_context             // Context reduction flag

Sources: src/agent/loop_.rs:1399, src/agent/loop_.rs:1525, src/agent/loop_.rs:1552


Auto-Save Integration

When config.memory.auto_save == true, the agent automatically stores messages to long-term memory:

User messages:

let key = autosave_memory_key("user_msg");  // "user_msg_{uuid}"
mem.store(&key, &msg, MemoryCategory::Conversation, None).await;

Assistant responses:

let summary = truncate_with_ellipsis(&response, 100);
let key = autosave_memory_key("assistant_resp");  // "assistant_resp_{uuid}"
mem.store(&key, &summary, MemoryCategory::Daily, None).await;

This creates searchable memory entries without polluting the in-memory conversation history.

Sources: src/agent/loop_.rs:110-112, src/agent/loop_.rs:1361-1366, src/agent/loop_.rs:1407-1414, src/agent/loop_.rs:1490-1495, src/agent/loop_.rs:1564-1570
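A hypothetical sketch of the truncate_with_ellipsis helper used above; the exact truncation rule in the source may differ:

```rust
// Assumed behavior: return the string unchanged when it fits, otherwise
// keep the first max_chars characters and append an ellipsis.
fn truncate_with_ellipsis(s: &str, max_chars: usize) -> String {
    if s.chars().count() <= max_chars {
        s.to_string()
    } else {
        let kept: String = s.chars().take(max_chars).collect();
        format!("{kept}…")
    }
}

fn main() {
    assert_eq!(truncate_with_ellipsis("hello", 10), "hello");
    assert_eq!(truncate_with_ellipsis("hello world", 5), "hello…");
}
```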


History Lifecycle Example

stateDiagram-v2
    [*] --> Init: User starts CLI
    Init --> Enriched: build_context() + build_hardware_context()
    Enriched --> FirstTurn: history = [system, enriched_user]
    FirstTurn --> ToolCalls: LLM returns tool_calls
    ToolCalls --> ExecuteTools: Execute each tool
    ExecuteTools --> AddResults: history.push(assistant + tool msgs)
    AddResults --> NextIteration: Loop continues
    NextIteration --> FinalResponse: LLM returns text only
    FinalResponse --> SaveTurn: Auto-save to memory if enabled
    SaveTurn --> WaitInput: Print response, wait
    WaitInput --> CheckSize: User sends next message
    CheckSize --> Compact: non_system_count > max_history
    CheckSize --> Trim: non_system_count <= max_history
    Compact --> Trim: Replace old msgs with summary
    Trim --> Enriched: Hard cap applied, prepare next turn

Sources: src/agent/loop_.rs:1383-1572


Usage in Gateway and Daemon

In non-CLI contexts (gateway webhooks, channel messages), history is ephemeral per request:

Gateway /webhook endpoint:

// Each webhook request starts fresh
let history = vec![
    ChatMessage::system(&system_prompt),
    ChatMessage::user(message)
];
provider.simple_chat(message, model, temperature).await

Channel message processing:

// Each channel message creates new history
let mut history = vec![
    ChatMessage::system(&system_prompt),
    ChatMessage::user(&enriched)
];
run_tool_call_loop(..., &mut history, ...).await

No compaction occurs for these single-turn interactions, but the same context enrichment and credential scrubbing apply.

Sources: src/gateway/mod.rs:735-738, src/agent/loop_.rs:1383-1386


Security Considerations

Credential Leakage Prevention

Tool outputs (especially shell, file_read, http_request) may contain credentials. The scrubbing pass happens at src/agent/loop_.rs:1039 before appending results to history, preventing:

  • API keys from leaking into subsequent LLM context
  • Tokens appearing in memory auto-save entries
  • Passwords being visible in observability logs

Memory Exhaustion

Without compaction, a long-running interactive session could:

  1. Exceed LLM context windows (128K+ tokens)
  2. Exhaust process memory (each message ~1KB+)
  3. Degrade performance (O(n) history scanning per turn)

The two-tier compaction strategy (auto-compact + hard trim) ensures bounded memory usage even for sessions with hundreds of turns.

Sources: src/agent/loop_.rs:42-77, src/agent/loop_.rs:114-132, src/agent/loop_.rs:158-205


Related Subsystems

  • Memory System: Long-term storage for facts, preferences, and decisions (separate from conversation history)
  • Agent Turn Cycle: Main loop that consumes and updates history during tool execution
  • Tool Execution: How tool results are formatted and added to history
  • System Prompt Construction: How the initial system message is built before history begins

Sources: src/agent/loop_.rs:1-1585

