Caveman Mode
Caveman Mode is TitanX's token-saving layer for long conversations. On conversations with 50+ turns, it typically reduces token usage by 30–75%. Opt-in per team or per conversation.
The name is a joke about thinking "like a caveman" — using fewer words, grunting out only the essentials.
Every turn in a conversation carries the full prior history to the LLM. After 100 turns, you're paying for the same 100-turn context every single turn — quadratic cost in conversation length.
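To make the quadratic growth concrete, here is a minimal sketch. It assumes every turn adds the same number of tokens (the 200 used in the demo loop is a made-up figure, not a TitanX measurement) and that no compression happens.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens sent across a conversation when every turn
    resends the full prior history (no compression).

    Assumes each turn adds a constant `tokens_per_turn` to the history.
    """
    # Turn k sends k turns' worth of context, so the total is
    # tokens_per_turn * (1 + 2 + ... + turns).
    return tokens_per_turn * turns * (turns + 1) // 2


if __name__ == "__main__":
    for n in (10, 50, 100):
        print(n, cumulative_input_tokens(n, tokens_per_turn=200))
```

Under these assumptions, doubling the conversation length from 50 to 100 turns roughly quadruples the total input tokens.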
Naive solutions (just dropping old turns) lose context you might still need. Caveman Mode is smarter: it compresses old turns into summaries while keeping recent turns verbatim.
Caveman Mode activates when:
- Total input token count exceeds a threshold (default 8000 tokens)
- OR the user manually invokes /compact
When it activates:
- TitanX identifies "old" turns (everything except the last N, default N=5)
- Sends them to the same model with a summarization prompt
- Replaces the old turns in the conversation context with a single summary message
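The trigger and the compression steps above can be sketched roughly as follows. This is an illustration, not TitanX's implementation: `Turn`, `should_compact`, `compact`, and the `summarize` callback (a stand-in for the model call) are hypothetical names, and token counts are assumed to be precomputed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    role: str
    text: str
    tokens: int  # assumed to be precomputed by a tokenizer


def should_compact(turns: List[Turn], threshold: int = 8000,
                   manual: bool = False) -> bool:
    """Trigger on a manual request or when total input tokens exceed the threshold."""
    return manual or sum(t.tokens for t in turns) > threshold


def compact(turns: List[Turn],
            summarize: Callable[[List[Turn]], Turn],
            keep_last_n: int = 5) -> List[Turn]:
    """Return a new context: one summary message plus the last N turns verbatim."""
    split = max(len(turns) - keep_last_n, 0)
    old, recent = turns[:split], turns[split:]
    if not old:
        return list(turns)  # nothing old enough to compress
    # `summarize` is a hypothetical stand-in for the LLM summarization call.
    return [summarize(old)] + recent
```

Note that `compact` returns a new list and leaves its input untouched, which mirrors the idea that only the in-context representation changes.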
The original turns aren't deleted — they stay in SQLite for display. Only the in-context representation sent to the model changes.
Recent turns (last N, typically 5) are always kept verbatim. The LLM has lossless access to what happened recently. Only far-back history is compressed.
After compression, the conversation continues. Once new turns push it back past the threshold, Caveman triggers again and compresses the now-old turns together with the previous summary into a new consolidated summary. This repeats as needed.
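A small simulation can show how repeated compression plays out over a long conversation. It tracks token counts only, and it assumes every summary comes out at a fixed size (`summary_tokens`, a hypothetical parameter); real summaries will vary.

```python
from typing import List


def simulate_triggers(turn_tokens: List[int], threshold: int = 8000,
                      keep_last_n: int = 5,
                      summary_tokens: int = 500) -> List[int]:
    """Return the 1-based turn numbers at which compression would trigger."""
    context: List[int] = []  # token counts of messages currently in context
    triggers: List[int] = []
    for turn_number, tokens in enumerate(turn_tokens, start=1):
        context.append(tokens)
        if sum(context) > threshold and len(context) > keep_last_n:
            # Fold everything except the last N messages (including any
            # previous summary) into one new summary of assumed fixed size.
            split = len(context) - keep_last_n
            context = [summary_tokens] + context[split:]
            triggers.append(turn_number)
    return triggers


if __name__ == "__main__":
    print(simulate_triggers([500] * 60))
```

In this toy model, after the first compression the triggers recur at a steady interval, because each pass resets the context to one summary plus the last N turns.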
Good fit:
- Long multi-step projects that span dozens of turns
- Support / customer service bots with continuous conversations
- Daily-standup type heartbeat agents that build up context
- Research agents pulling lots of context from different sources
Poor fit:
- Short, focused conversations (Caveman has overhead; <20 turns isn't worth it)
- High-fidelity work where every detail matters (legal review, exact code refactoring on a specific line; compression can drop specifics)
- Multi-step debugging where you're re-referencing turn-1 details on turn-50 (Caveman can drop those specifics; keep it off for these)
Settings → System → Enable Caveman Mode — applies to all new conversations. Off by default.
Team Settings → Caveman Mode — override global for this team's agents.
Conversation settings gear → Caveman Mode → on/off/inherit.
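One way to picture how the three levels combine is a most-specific-wins lookup. The sketch below is an assumption: the docs say the team setting overrides global and the conversation setting can be on/off/inherit, but the exact precedence and whether the team level also supports "inherit" are not spelled out. `CavemanSetting` and `resolve_caveman` are hypothetical names.

```python
from enum import Enum


class CavemanSetting(Enum):
    ON = "on"
    OFF = "off"
    INHERIT = "inherit"


def resolve_caveman(global_enabled: bool,
                    team: CavemanSetting = CavemanSetting.INHERIT,
                    conversation: CavemanSetting = CavemanSetting.INHERIT) -> bool:
    """Resolve the effective setting, assuming the most specific level wins."""
    # Check conversation first, then team; fall back to the global default.
    for level in (conversation, team):
        if level is not CavemanSetting.INHERIT:
            return level is CavemanSetting.ON
    return global_enabled
```

For example, with the global setting off and a team override on, a conversation left at "inherit" would resolve to on under this model.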
Settings → System → Caveman Threshold — token count at which compression triggers. Default 8000.
- Lower (4000) → compresses more aggressively, more savings but more info loss
- Higher (16000) → compresses less, less savings but higher fidelity
Sweet spot for most teams: 8000 (default). Enterprise teams with long support sessions often prefer 12000.
Settings → System → Caveman Keep-Last-N Turns — how many recent turns stay verbatim. Default 5.
- Lower (3) → aggressive compression; good for bot-style agents
- Higher (10) → cautious; good for back-and-forth dialogue
The summarization prompt asks the model for:
- Key facts — names, numbers, IDs, decisions
- State changes — "agent X completed task Y", "user clarified they want Z"
- Open questions — unresolved points that might matter later
- Pinned context — anything marked as pinned in the UI is always kept verbatim
Formatted output is a structured markdown block replacing the old turns. The LLM reads it as one message instead of N.
Example summary block:
[Compressed summary of turns 1-45]
Context:
- User is building a Python FastAPI service for a todo app
- Chose Postgres, Redis caching
- Agent wrote 3 endpoints (list/create/delete), user approved all
- Wrote unit tests for list endpoint
Open questions:
- User hasn't decided on auth strategy yet (mentioned JWT + OAuth options in turn 30)
- Deployment target (mentioned Fly.io vs. Railway in turn 42)
Last 5 turns: [shown verbatim below]
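For readers who want to reproduce this layout in tests or tooling, a block of the same shape can be rendered from structured fields. This is only a sketch: `render_summary` and its parameters are hypothetical, and the real block is produced by the model from the summarization prompt, not by a template.

```python
from typing import List


def render_summary(first_turn: int, last_turn: int,
                   context: List[str], open_questions: List[str],
                   keep_last_n: int = 5) -> str:
    """Render a summary block shaped like the example above."""
    lines = [f"[Compressed summary of turns {first_turn}-{last_turn}]",
             "Context:"]
    lines += [f"- {item}" for item in context]
    lines.append("Open questions:")
    lines += [f"- {question}" for question in open_questions]
    lines.append(f"Last {keep_last_n} turns: [shown verbatim below]")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_summary(
        1, 45,
        context=["User is building a Python FastAPI service for a todo app"],
        open_questions=["User hasn't decided on auth strategy yet"],
    ))
```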
/compact in any conversation → forces immediate Caveman pass regardless of threshold.
/restore → un-applies the last Caveman compression, restoring full history for the next turn. (Useful if compression dropped something you need.)
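Conceptually, /restore needs the pre-compression context to still be available, which it is because the original turns are never deleted. A minimal in-memory sketch of that idea follows; `ConversationContext` and its methods are hypothetical, and it assumes /restore brings back the context as it was before the most recent compression while keeping any turns added since.

```python
from typing import List, Optional


class ConversationContext:
    """Toy model of the in-context message list with one-level undo."""

    def __init__(self, messages: List[str]) -> None:
        self.messages = list(messages)
        self._snapshot: Optional[List[str]] = None
        self._compacted_len = 0

    def compact(self, summary: str, keep_last_n: int = 5) -> None:
        split = max(len(self.messages) - keep_last_n, 0)
        recent = self.messages[split:]
        self._snapshot = list(self.messages)    # context before compression
        self._compacted_len = 1 + len(recent)   # summary + verbatim turns
        self.messages = [summary] + recent

    def restore(self) -> bool:
        """Undo the last compression; return False if there is nothing to undo."""
        if self._snapshot is None:
            return False
        newer = self.messages[self._compacted_len:]  # turns added since compaction
        self.messages = self._snapshot + newer
        self._snapshot = None
        return True
```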
Caveman's compression turn costs tokens too — it's one additional LLM call per trigger. Typical cost:
- Compression call: ~2000-4000 tokens input, ~500 tokens output
- Total savings: (full history - compressed summary) tokens saved per subsequent turn × remaining turns
Break-even is usually around turn 15-20 for an 8000-threshold config. Beyond that, net savings grow.
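The cost trade-off can be worked through with simple arithmetic. The sketch below is a simplified model: it treats input and output tokens as equally priced, assumes the per-turn saving stays constant, and uses hypothetical function names. It counts turns after a compression, so its result is not directly comparable to the turn numbers quoted above.

```python
import math
from typing import Optional


def net_savings(old_history_tokens: int, summary_tokens: int,
                remaining_turns: int, compression_input_tokens: int,
                compression_output_tokens: int) -> int:
    """Tokens saved over the remaining turns, minus the one-off compression call."""
    per_turn_saving = old_history_tokens - summary_tokens
    overhead = compression_input_tokens + compression_output_tokens
    return per_turn_saving * remaining_turns - overhead


def break_even_turns(old_history_tokens: int, summary_tokens: int,
                     compression_input_tokens: int,
                     compression_output_tokens: int) -> Optional[int]:
    """Turns after compression needed to recoup its cost, or None if it never pays off."""
    per_turn_saving = old_history_tokens - summary_tokens
    if per_turn_saving <= 0:
        return None
    overhead = compression_input_tokens + compression_output_tokens
    return math.ceil(overhead / per_turn_saving)
```

For instance, compressing 3000 tokens of old history into a 500-token summary, with a compression call of 3000 input and 500 output tokens, saves 2500 tokens per later turn against a 3500-token overhead in this model.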
Information loss is real. The summary is lossy by design. If you need perfect recall of turn-2's exact wording on turn-100, don't use Caveman.
Mitigations:
- Pin critical messages (pinned content survives compression)
- Use /compact manually at logical checkpoints (control when it triggers)
- Keep a higher keep-last-N
- Turn it off for high-fidelity work
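The pinning mitigation amounts to excluding pinned messages from the set that gets summarized. A minimal sketch of that partition, with a hypothetical `Message` type and `split_for_compression` helper (how TitanX stores the pinned flag is not documented here):

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Message:
    text: str
    pinned: bool = False


def split_for_compression(messages: List[Message],
                          keep_last_n: int = 5
                          ) -> Tuple[List[Message], List[Message]]:
    """Return (to_summarize, kept_verbatim).

    Pinned messages and the last N messages are kept verbatim;
    only unpinned older messages are handed to the summarizer.
    """
    split = max(len(messages) - keep_last_n, 0)
    old, recent = messages[:split], messages[split:]
    to_summarize = [m for m in old if not m.pinned]
    kept_verbatim = [m for m in old if m.pinned] + recent
    return to_summarize, kept_verbatim
```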
Observability → Caveman Panel shows:
- Total compressions performed
- Tokens saved (estimated)
- Cost saved (estimated)
- Per-conversation compression history
If Caveman is off, the panel shows what you'd save if you enabled it (based on retroactive analysis of your conversation lengths).
Caveman currently works for these conversation types:
- claude (API) ✅
- gemini (API) ✅
- acp with claude/gemini/codex backends ✅
- openclaw-gateway ✅
- farm: 🚧 v2.6 (farm conversations use the slave's caveman config, not master's)
- nanobot: 🚧 not yet
If you flip Caveman on for an unsupported conversation type, the setting is stored but no-ops.
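The stored-but-no-op behaviour can be pictured as a guard on the conversation type. In this sketch the set of supported types is taken from the list above as of v2.5.1; the constant and function names are hypothetical.

```python
# Conversation types listed as supported in this page (v2.5.1).
SUPPORTED_TYPES = frozenset({"claude", "gemini", "acp", "openclaw-gateway"})


def caveman_is_active(conversation_type: str, setting_enabled: bool) -> bool:
    """The setting only takes effect on supported conversation types;
    elsewhere it stays stored but does nothing."""
    return setting_enabled and conversation_type in SUPPORTED_TYPES
```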
- Conversations and Chat UI: where /compact runs
- Observability: Caveman panel details
- Reasoning Bank: different memory mechanism; both can coexist
TitanX · Enterprise AI Agent Orchestration · Apache-2.0
Last updated for v2.5.1