
07.3 History Management

Nikolay Vyahhi edited this page Feb 19, 2026 · 3 revisions



Purpose: This document describes ZeroClaw's conversation history management system, which tracks message exchanges between users and the LLM, prevents unbounded memory growth through intelligent compaction, and enriches context with relevant memories. For information about long-term memory storage (recall/store/forget), see Memory System. For details about the agent turn loop that consumes history, see Agent Turn Cycle.


Overview

ZeroClaw maintains a per-session conversation history (Vec<ChatMessage>) that grows with each user message, assistant response, and tool execution. Without management, this history would eventually exceed model context limits and exhaust memory. The history management system provides:

  • Automatic compaction using LLM-based summarization when thresholds are exceeded
  • Hard trimming as a safety cap to prevent runaway growth
  • Context enrichment by injecting relevant memories and hardware documentation before LLM calls
  • Credential scrubbing to prevent accidental leakage of secrets in tool outputs
  • Format adaptation for both native tool calling APIs and XML-guided prompt-based tool use

Sources: src/agent/loop_.rs:1-1585


Data Structures

ChatMessage

The core unit of conversation history is ChatMessage from the providers module, which represents a single turn in the conversation:

ChatMessage {
    role: String,      // "system", "user", "assistant", "tool"
    content: String,   // Message text or JSON-encoded tool data
}

Role semantics:

  • system: Initial prompt with identity, tool descriptions, and instructions
  • user: User messages or tool results (XML format for prompt-guided tools)
  • assistant: LLM responses; may include text and/or tool calls
  • tool: Tool execution results (native tool calling APIs only)

Sources: src/agent/loop_.rs:5, src/agent/loop_.rs:980, src/agent/loop_.rs:1069-1080
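A minimal sketch of this shape and the seed history described later on this page. The `new` constructor here is a hypothetical convenience for illustration, not the actual zeroclaw API:

```rust
// Minimal stand-in for the ChatMessage shape described above.
// The `new` constructor is an assumption, not the real providers-module API.
#[derive(Debug, Clone)]
struct ChatMessage {
    role: String,    // "system", "user", "assistant", or "tool"
    content: String, // message text or JSON-encoded tool data
}

impl ChatMessage {
    fn new(role: &str, content: &str) -> Self {
        Self { role: role.to_string(), content: content.to_string() }
    }
}

fn main() {
    // A fresh session starts as [system_prompt, enriched_user_message].
    let history = vec![
        ChatMessage::new("system", "You are ZeroClaw."),
        ChatMessage::new("user", "[Memory context]\n- name: Ada\n\nblink the led"),
    ];
    assert_eq!(history[0].role, "system");
    assert_eq!(history.len(), 2);
}
```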


History Size Limits

ZeroClaw uses a two-tier approach (auto-compaction plus hard trimming) to prevent unbounded history growth, governed by these constants:

| Constant | Value | Purpose |
|---|---|---|
| DEFAULT_MAX_HISTORY_MESSAGES | 50 | Trigger for auto-compaction |
| COMPACTION_KEEP_RECENT_MESSAGES | 20 | Messages retained after compaction |
| COMPACTION_MAX_SOURCE_CHARS | 12,000 | Max characters sent to summarizer |
| COMPACTION_MAX_SUMMARY_CHARS | 2,000 | Max characters in compaction summary |

The actual limit is configurable via config.agent.max_history_messages, falling back to DEFAULT_MAX_HISTORY_MESSAGES when unset.

Sources: src/agent/loop_.rs:79-91
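The fallback behavior can be sketched as follows (the helper name is illustrative; the actual code reads the config inline):

```rust
// Sketch of the configurable-limit fallback described above.
const DEFAULT_MAX_HISTORY_MESSAGES: usize = 50;

// Returns the configured limit, or the compile-time default when unset.
fn effective_max_history(configured: Option<usize>) -> usize {
    configured.unwrap_or(DEFAULT_MAX_HISTORY_MESSAGES)
}

fn main() {
    assert_eq!(effective_max_history(None), 50);
    assert_eq!(effective_max_history(Some(200)), 200);
}
```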


History Compaction Strategy

flowchart TB
    Start["Turn complete<br/>history.len()"]
    CheckLimit{"non_system_count > max_history?"}
    AutoCompact["auto_compact_history()"]
    BuildTranscript["build_compaction_transcript()<br/>Extract USER/ASSISTANT turns"]
    CallSummarizer["LLM.chat_with_system()<br/>Summarizer prompt<br/>temp=0.2"]
    ApplySummary["apply_compaction_summary()<br/>Replace old messages with<br/>single assistant summary"]
    HardTrim["trim_history()<br/>Safety cap: keep system +<br/>last N messages"]
    End["Continue"]
    
    Start --> CheckLimit
    CheckLimit -->|No| HardTrim
    CheckLimit -->|Yes| AutoCompact
    AutoCompact --> BuildTranscript
    BuildTranscript --> CallSummarizer
    CallSummarizer --> ApplySummary
    ApplySummary --> HardTrim
    HardTrim --> End

Auto-Compaction Algorithm

When non_system_count > max_history_messages, the system triggers auto-compaction:

  1. Calculate compaction range: Keep COMPACTION_KEEP_RECENT_MESSAGES (20) most recent messages, compact everything older
  2. Build transcript: Extract messages in USER: ...\nASSISTANT: ... format, truncate to 12K chars
  3. Call LLM summarizer: Use system prompt requesting concise bullet points (max 12 points)
  4. Replace with summary: Delete old messages, insert single assistant message with [Compaction summary]\n<bullets>

Summarizer system prompt:

You are a conversation compaction engine. Summarize older chat history 
into concise context for future turns. Preserve: user preferences, 
commitments, decisions, unresolved tasks, key facts. Omit: filler, 
repeated chit-chat, verbose tool logs. Output plain text bullet points only.

Fallback behavior: If summarization fails (network error, rate limit), the system falls back to deterministic local truncation of the transcript to 2K chars.

Sources: src/agent/loop_.rs:158-205, src/agent/loop_.rs:1548-1563
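The replace step (step 4 above) can be sketched as follows; the structure mirrors this page's description, not the actual source, and the ChatMessage type is a simplified stand-in:

```rust
// Illustrative sketch of apply_compaction_summary(): everything between the
// system message and the most recent 20 messages collapses into one
// assistant summary message.
const COMPACTION_KEEP_RECENT_MESSAGES: usize = 20;

#[derive(Clone)]
struct ChatMessage { role: String, content: String }

fn apply_compaction_summary(history: &mut Vec<ChatMessage>, summary: &str) {
    // Skip over a leading system message, if present.
    let start = usize::from(history.first().map_or(false, |m| m.role == "system"));
    if history.len() <= start + COMPACTION_KEEP_RECENT_MESSAGES {
        return; // nothing old enough to compact
    }
    let end = history.len() - COMPACTION_KEEP_RECENT_MESSAGES;
    // Replace history[start..end] with a single summary message.
    history.splice(start..end, [ChatMessage {
        role: "assistant".to_string(),
        content: format!("[Compaction summary]\n{summary}"),
    }]);
}

fn main() {
    let mut history = vec![ChatMessage { role: "system".into(), content: "sys".into() }];
    for i in 0..40 {
        history.push(ChatMessage { role: "user".into(), content: format!("msg {i}") });
    }
    apply_compaction_summary(&mut history, "- user likes terse answers");
    // system + summary + 20 most recent messages
    assert_eq!(history.len(), 22);
    assert!(history[1].content.starts_with("[Compaction summary]"));
}
```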

Hard Trimming

After compaction (or if compaction is skipped), trim_history() enforces a hard cap:

  1. Preserve system message: Always keep history[0] if role == "system"
  2. Count non-system messages: Calculate how many messages to remove
  3. Drain oldest: Remove oldest non-system messages to fit within max_history

This ensures the agent never exceeds configured limits even if compaction is disabled or fails.

Sources: src/agent/loop_.rs:114-132, src/agent/loop_.rs:1562
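The three steps above can be sketched as a single drain over the oldest non-system messages (a simplified stand-in, not the actual source):

```rust
// Sketch of the trim_history() hard cap: preserve the system message,
// drain the oldest non-system messages down to max_history.
#[derive(Clone)]
struct ChatMessage { role: String, content: String }

fn trim_history(history: &mut Vec<ChatMessage>, max_history: usize) {
    let start = usize::from(history.first().map_or(false, |m| m.role == "system"));
    let non_system = history.len() - start;
    if non_system > max_history {
        let excess = non_system - max_history;
        history.drain(start..start + excess); // drop the oldest first
    }
}

fn main() {
    let mut history = vec![ChatMessage { role: "system".into(), content: "sys".into() }];
    for i in 0..10 {
        history.push(ChatMessage { role: "user".into(), content: format!("msg {i}") });
    }
    trim_history(&mut history, 4);
    assert_eq!(history.len(), 5);          // system + last 4 messages
    assert_eq!(history[0].role, "system"); // system message preserved
    assert_eq!(history[1].content, "msg 6");
}
```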


Context Enrichment

Before sending user messages to the LLM, ZeroClaw enriches them with relevant context from two sources:

Memory Context

sequenceDiagram
    participant User
    participant Agent as "agent_turn()"
    participant MemFunc as "build_context()"
    participant Memory as "Memory::recall()"
    participant LLM as "Provider::chat()"
    
    User->>Agent: "user message"
    Agent->>MemFunc: build_context(mem, message, min_score)
    MemFunc->>Memory: recall(query=message, limit=5)
    Memory-->>MemFunc: Vec<MemoryEntry> with scores
    MemFunc->>MemFunc: Filter entries by<br/>min_relevance_score
    MemFunc-->>Agent: "[Memory context]\n- key1: value1\n- key2: value2\n\n"
    Agent->>Agent: Prepend context to user message
    Agent->>LLM: chat(enriched_message, ...)

The build_context() function:

  1. Calls mem.recall(user_msg, limit=5) to retrieve hybrid-search results
  2. Filters entries where score >= config.memory.min_relevance_score (default 0.3)
  3. Formats as [Memory context]\n- key: content\n... and prepends to user message

Sources: src/agent/loop_.rs:207-233, src/agent/loop_.rs:1369-1376, src/agent/loop_.rs:1498-1505
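The filter-and-format steps can be sketched as below, assuming a simplified MemoryEntry with key, content, and score fields (the real type lives in the memory module and recall is async):

```rust
// Sketch of the build_context() filter-and-format step.
struct MemoryEntry { key: String, content: String, score: f32 }

fn format_memory_context(entries: &[MemoryEntry], min_score: f32) -> String {
    let lines: Vec<String> = entries
        .iter()
        .filter(|e| e.score >= min_score) // drop low-relevance recalls
        .map(|e| format!("- {}: {}", e.key, e.content))
        .collect();
    if lines.is_empty() {
        String::new() // no context block when nothing is relevant enough
    } else {
        format!("[Memory context]\n{}\n\n", lines.join("\n"))
    }
}

fn main() {
    let entries = vec![
        MemoryEntry { key: "name".into(), content: "Ada".into(), score: 0.9 },
        MemoryEntry { key: "noise".into(), content: "hi".into(), score: 0.1 },
    ];
    // With the default min_relevance_score of 0.3, only "name" survives.
    let ctx = format_memory_context(&entries, 0.3);
    assert_eq!(ctx, "[Memory context]\n- name: Ada\n\n");
}
```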

Hardware RAG Context

For peripherals-enabled configurations with datasheet_dir, build_hardware_context() injects datasheet documentation:

  • Pin alias lookup: If the user message mentions "red led", inject red_led: 13 for matching boards
  2. RAG retrieval: Call hardware_rag.retrieve(query, boards, limit) to get relevant chunks
  3. Format output: [Hardware documentation]\n--- source (board) ---\ncontent\n...

The chunk limit is configurable: 5 for normal mode, 2 for compact_context mode.

Sources: src/agent/loop_.rs:236-273, src/agent/loop_.rs:1371-1376, src/agent/loop_.rs:1501-1505


History in the Agent Loop

sequenceDiagram
    participant CLI as "CLI/Channel"
    participant Loop as "run_tool_call_loop()"
    participant Provider as "Provider::chat()"
    participant History as "history: Vec<ChatMessage>"
    
    CLI->>Loop: User message
    Loop->>History: Prepend memory + HW context
    Loop->>History: Push system + enriched user msg
    
    loop Tool Call Iterations (max 10)
        Loop->>Provider: chat(history, tools, model, temp)
        Provider-->>Loop: response_text + tool_calls
        
        alt Has Tool Calls
            Loop->>Loop: Execute each tool
            Loop->>History: Push assistant msg (with tool_calls)
            Loop->>History: Push tool result msgs
        else Text Only
            Loop->>History: Push assistant msg (final response)
            Loop-->>CLI: Return final text
        end
    end
    
    CLI->>Loop: Next user message
    Loop->>Loop: auto_compact_history() if needed
    Loop->>Loop: trim_history() as safety cap

Key points:

  1. History initialization: Start with [system_prompt, enriched_user_message]
  2. Tool call loop: Each iteration appends assistant response + tool results to history
  3. Native vs. prompt-guided formats:
    • Native: Store JSON with tool_calls array, followed by role: tool messages
    • Prompt-guided: Store XML <tool_call> tags in assistant content, results in user message
  4. Persistence: In interactive CLI mode, history persists across turns within the same session
  5. Compaction timing: Happens after turn completes but before next user message

Sources: src/agent/loop_.rs:851-1084, src/agent/loop_.rs:1383-1572


Native vs. Prompt-Guided History Formats

ZeroClaw adapts history format based on whether the provider supports native tool calling:

Native Tool Calling (OpenAI, Anthropic, etc.)

When provider.supports_native_tools() == true:

Assistant message with tool calls:

{
  "content": "Let me check that for you.",
  "tool_calls": [
    {"id": "call_123", "name": "shell", "arguments": "{\"command\":\"ls\"}"}
  ]
}

Tool result message:

{
  "role": "tool",
  "tool_call_id": "call_123",
  "content": "file1.txt\nfile2.txt"
}

The build_native_assistant_history() function serializes this format for history storage.

Sources: src/agent/loop_.rs:764-787, src/agent/loop_.rs:927-931, src/agent/loop_.rs:1069-1080

Prompt-Guided Tool Use (Ollama, custom models)

When native tools are unsupported or disabled:

Assistant message:

Let me check that for you.
<tool_call>
{"name": "shell", "arguments": {"command": "ls"}}
</tool_call>

User message with results:

[Tool results]
<tool_result name="shell">
file1.txt
file2.txt
</tool_result>

The build_assistant_history_with_tool_calls() function formats XML-based history for these providers.

Sources: src/agent/loop_.rs:789-808, src/agent/loop_.rs:1071
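The result framing shown above can be sketched as a simple formatter (the function name and signature are illustrative, not the actual source):

```rust
// Sketch of the XML tool-result framing for prompt-guided providers.
// Each (name, output) pair becomes a <tool_result> block under one
// "[Tool results]" user message.
fn format_tool_results(results: &[(&str, &str)]) -> String {
    let mut out = String::from("[Tool results]");
    for (name, output) in results {
        out.push_str(&format!(
            "\n<tool_result name=\"{name}\">\n{output}\n</tool_result>"
        ));
    }
    out
}

fn main() {
    let msg = format_tool_results(&[("shell", "file1.txt\nfile2.txt")]);
    assert!(msg.starts_with("[Tool results]"));
    assert!(msg.contains("<tool_result name=\"shell\">"));
    assert!(msg.ends_with("</tool_result>"));
}
```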


Credential Scrubbing

The scrub_credentials() function prevents accidental leakage of API keys, tokens, and passwords in tool outputs before they're added to history.

Detection Patterns

Uses RegexSet to match sensitive key names:

  • token, api_key, password, secret, user_key, bearer, credential

Extracts key-value pairs from common formats:

"api_key": "sk-abc123def456..."  → "api_key": "sk-a*[REDACTED]"
password=mypass123                → password=mypa*[REDACTED]
token: "bearer_xyz..."            → token: "bear*[REDACTED]"

Redaction Strategy

Preserves first 4 characters for context, redacts the rest:

  • 8+ character values → Show prefix, redact suffix
  • Shorter values → Fully redacted
  • Maintains original quote style (", ', or none)

Sources: src/agent/loop_.rs:25-77, src/agent/loop_.rs:1039
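The redaction rule alone can be sketched without the regex machinery (the real scrubber uses a RegexSet over key names and preserves quote style; this only shows the prefix-keep behavior):

```rust
// Regex-free sketch of the redaction strategy described above:
// keep the first 4 characters of values 8+ characters long, fully
// redact anything shorter.
fn redact_value(value: &str) -> String {
    if value.chars().count() >= 8 {
        let prefix: String = value.chars().take(4).collect();
        format!("{prefix}*[REDACTED]")
    } else {
        "[REDACTED]".to_string() // short values are fully redacted
    }
}

fn main() {
    assert_eq!(redact_value("sk-abc123def456"), "sk-a*[REDACTED]");
    assert_eq!(redact_value("mypass123"), "mypa*[REDACTED]");
    assert_eq!(redact_value("short"), "[REDACTED]");
}
```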


Configuration Options

History behavior is controlled by config.agent settings:

| Setting | Type | Default | Description |
|---|---|---|---|
| max_history_messages | usize | 50 | Trigger for auto-compaction |
| max_tool_iterations | usize | 10 | Max tool call loop iterations |
| compact_context | bool | false | Enable context reduction (smaller RAG chunks, shorter bootstrap) |

Access in code via Config::load_or_init():

config.agent.max_history_messages        // Compaction trigger
config.agent.max_tool_iterations         // Loop safety limit
config.agent.compact_context             // Context reduction flag

Sources: src/agent/loop_.rs:1399, src/agent/loop_.rs:1525, src/agent/loop_.rs:1552


Auto-Save Integration

When config.memory.auto_save == true, the agent automatically stores messages to long-term memory:

User messages:

let key = autosave_memory_key("user_msg");  // "user_msg_{uuid}"
mem.store(&key, &msg, MemoryCategory::Conversation, None).await;

Assistant responses:

let summary = truncate_with_ellipsis(&response, 100);
let key = autosave_memory_key("assistant_resp");  // "assistant_resp_{uuid}"
mem.store(&key, &summary, MemoryCategory::Daily, None).await;

This creates searchable memory entries without polluting the in-memory conversation history.

Sources: src/agent/loop_.rs:110-112, src/agent/loop_.rs:1361-1366, src/agent/loop_.rs:1407-1414, src/agent/loop_.rs:1490-1495, src/agent/loop_.rs:1564-1570
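A hypothetical sketch of the truncate_with_ellipsis helper used above; the exact truncation rule in the source may differ:

```rust
// Assumed behavior: return the string unchanged when it fits, otherwise
// keep the first max_chars characters and append an ellipsis.
fn truncate_with_ellipsis(s: &str, max_chars: usize) -> String {
    if s.chars().count() <= max_chars {
        s.to_string()
    } else {
        let kept: String = s.chars().take(max_chars).collect();
        format!("{kept}…")
    }
}

fn main() {
    assert_eq!(truncate_with_ellipsis("hello", 10), "hello");
    assert_eq!(truncate_with_ellipsis("hello world", 5), "hello…");
}
```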


History Lifecycle Example

stateDiagram-v2
    [*] --> Init: User starts CLI
    Init --> Enriched: build_context() + build_hardware_context()
    Enriched --> FirstTurn: history = [system, enriched_user]
    FirstTurn --> ToolCalls: LLM returns tool_calls
    ToolCalls --> ExecuteTools: Execute each tool
    ExecuteTools --> AddResults: history.push(assistant + tool msgs)
    AddResults --> NextIteration: Loop continues
    NextIteration --> FinalResponse: LLM returns text only
    FinalResponse --> SaveTurn: Auto-save to memory if enabled
    SaveTurn --> WaitInput: Print response, wait
    WaitInput --> CheckSize: User sends next message
    CheckSize --> Compact: non_system_count > max_history
    CheckSize --> Trim: non_system_count <= max_history
    Compact --> Trim: Replace old msgs with summary
    Trim --> Enriched: Hard cap applied, prepare next turn

Sources: src/agent/loop_.rs:1383-1572


Usage in Gateway and Daemon

In non-CLI contexts (gateway webhooks, channel messages), history is ephemeral per request:

Gateway /webhook endpoint:

// Each webhook request starts fresh
let history = vec![
    ChatMessage::system(&system_prompt),
    ChatMessage::user(message)
];
provider.simple_chat(message, model, temperature).await

Channel message processing:

// Each channel message creates new history
let mut history = vec![
    ChatMessage::system(&system_prompt),
    ChatMessage::user(&enriched)
];
run_tool_call_loop(..., &mut history, ...).await

No compaction occurs for these single-turn interactions, but the same context enrichment and credential scrubbing apply.

Sources: src/gateway/mod.rs:735-738, src/agent/loop_.rs:1383-1386


Security Considerations

Credential Leakage Prevention

Tool outputs (especially shell, file_read, http_request) may contain credentials. The scrubbing pass happens at src/agent/loop_.rs:1039 before appending results to history, preventing:

  • API keys from leaking into subsequent LLM context
  • Tokens appearing in memory auto-save entries
  • Passwords being visible in observability logs

Memory Exhaustion

Without compaction, a long-running interactive session could:

  1. Exceed LLM context windows (128K+ tokens)
  2. Exhaust process memory (each message ~1KB+)
  3. Degrade performance (O(n) history scanning per turn)

The two-tier compaction strategy (auto-compact + hard trim) ensures bounded memory usage even for sessions with hundreds of turns.

Sources: src/agent/loop_.rs:42-77, src/agent/loop_.rs:114-132, src/agent/loop_.rs:158-205


Related Subsystems

  • Memory System: Long-term storage for facts, preferences, and decisions (separate from conversation history)
  • Agent Turn Cycle: Main loop that consumes and updates history during tool execution
  • Tool Execution: How tool results are formatted and added to history
  • System Prompt Construction: How the initial system message is built before history begins

Sources: src/agent/loop_.rs:1-1585

