Caveman Mode

Caveman Mode is TitanX's token-saving layer for long conversations. On conversations with 50+ turns, it typically reduces token usage by 30–75%. It is off by default and can be enabled globally, per team, or per conversation.

The name is a joke about thinking "like a caveman" — using fewer words, grunting out only the essentials.

The problem it solves

Every turn in a conversation re-sends the full prior history to the LLM. By turn 100, each request carries the previous 99 turns, so the cost of a single turn grows linearly with conversation length and the total cost of the conversation grows quadratically.
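To make the quadratic growth concrete, here is a small worked calculation. It is a sketch that assumes every turn adds the same number of tokens and ignores the system prompt; both are simplifications, not how TitanX actually counts tokens.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens billed when every request re-sends the full history.

    Request k carries k turns of history, so the total is
    tokens_per_turn * (1 + 2 + ... + turns) = tokens_per_turn * turns * (turns + 1) / 2.
    """
    return tokens_per_turn * turns * (turns + 1) // 2


# Doubling the conversation length roughly quadruples the total.
for n in (25, 50, 100):
    print(n, cumulative_input_tokens(n, 200))
```

At 200 tokens per turn this prints 65000, 255000 and 1010000: four times the turns costs about sixteen times the tokens.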

Naive solutions (just dropping old turns) lose context you might still need. Caveman Mode is smarter: it compresses old turns into a single summary while keeping recent turns verbatim.

How it works

Trigger

Caveman Mode activates when:

The total input token count exceeds a threshold (default 8000 tokens)
OR the user manually invokes /compact
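The trigger rule reduces to a single check. This is a minimal sketch, assuming the check runs before each model call; `should_compact` and its parameters are hypothetical names, not TitanX's API.

```python
def should_compact(input_tokens: int, threshold: int = 8000, manual: bool = False) -> bool:
    """Return True when a Caveman pass should run before the next model call."""
    # /compact forces a pass no matter how small the context is.
    if manual:
        return True
    # Otherwise trigger only once the input strictly exceeds the threshold.
    return input_tokens > threshold
```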

Compression

TitanX identifies "old" turns (everything except the last N, default N=5)
Sends them to the same model the conversation uses, with a summarization prompt
Replaces the old turns in the conversation context with a single summary message

The original turns aren't deleted — they stay in SQLite for display. Only the in-context representation sent to the model changes.
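The three compression steps can be sketched as a pure function over the in-context message list. This assumes one message per turn and treats the model call as an injected `summarize` callable; `Message`, `compress`, and the user role given to the summary message are illustrative assumptions, not TitanX internals.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Message:
    role: str
    content: str


def compress(
    context: List[Message],
    summarize: Callable[[List[Message]], str],
    keep_last_n: int = 5,
) -> List[Message]:
    """Return a new in-context list: one summary message plus the last N turns.

    The input list is left untouched, mirroring how the stored history stays
    intact while only the representation sent to the model changes.
    """
    split = len(context) - keep_last_n
    if split <= 0:
        return list(context)  # nothing is old enough to compress yet
    old, recent = context[:split], context[split:]
    # Assumption: the summary is injected as a single user-role message.
    summary = Message(role="user", content=summarize(old))
    return [summary] + recent


stored = [Message("user", f"turn {i}") for i in range(1, 9)]
sent = compress(stored, lambda old: f"[Compressed summary of turns 1-{len(old)}]")
# `sent` holds the summary of turns 1-3 plus turns 4-8; `stored` still has all 8.
```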

Recent-turn fidelity

Recent turns (the last N, default 5) are always kept verbatim, so the LLM has lossless access to what happened recently. Only far-back history is compressed.

Re-triggering

After compression, the conversation continues. As new turns accumulate and the context grows back past the threshold, Caveman triggers again: it folds the previous summary together with the turns that have since become old into a new consolidated summary. This repeats as often as needed.
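A token-level simulation shows the re-triggering cycle. It is a sketch under several assumptions: each turn is one message, every summary has the same fixed size (real summaries likely grow as they absorb more history), and the cost of the compression call itself is not counted. `simulate` is a hypothetical helper.

```python
from typing import List, Tuple


def simulate(
    turn_tokens: List[int],
    threshold: int = 8000,
    keep_last_n: int = 5,
    summary_tokens: int = 500,
) -> Tuple[int, int, int]:
    """Return (input tokens without Caveman, input tokens with Caveman, passes)."""
    naive_total = caveman_total = passes = 0
    history: List[int] = []  # full stored history (token size per turn)
    context: List[int] = []  # what is actually sent to the model
    for tokens in turn_tokens:
        history.append(tokens)
        context.append(tokens)
        naive_total += sum(history)
        if sum(context) > threshold and len(context) > keep_last_n:
            # Everything older than the last N turns, including any earlier
            # summary, is folded into one new summary of fixed (assumed) size.
            split = len(context) - keep_last_n
            context = [summary_tokens] + context[split:]
            passes += 1
        caveman_total += sum(context)
    return naive_total, caveman_total, passes


naive, caveman, passes = simulate([400] * 60)
```

With 400-token turns and the default settings, the three passes land on turns 21, 35 and 49: after each pass the context drops to 2500 tokens (summary plus five turns) and takes 14 more turns to exceed 8000 again.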

When to enable

✅ Good fits

Long multi-step projects that span dozens of turns
Support / customer service bots with continuous conversations
Daily-standup type heartbeat agents that build up context
Research agents pulling lots of context from different sources

❌ Poor fits

Short, focused conversations (Caveman has overhead, and conversations under 20 turns don't recoup it)
High-fidelity work where every detail matters (legal review, exact code refactoring on a specific line — compression can drop specifics)
Multi-step debugging where you're re-referencing turn-1 details on turn-50 (Caveman can drop those specifics; keep it off for these)

Configuration

Global toggle

Settings → System → Enable Caveman Mode — applies to all new conversations. Off by default.

Per-team override

Team Settings → Caveman Mode — override global for this team's agents.

Per-conversation

Conversation settings gear → Caveman Mode → on/off/inherit.
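The page does not spell out precedence, but the natural reading of the three levels is: the conversation setting wins unless it is "inherit", then the team override, then the global toggle. A sketch under that assumption; `effective_caveman` and its parameters are hypothetical, and it ignores the detail that the global toggle applies only to new conversations.

```python
from typing import Literal, Optional

ConversationSetting = Literal["on", "off", "inherit"]


def effective_caveman(
    global_enabled: bool,
    team_override: Optional[bool] = None,
    conversation: ConversationSetting = "inherit",
) -> bool:
    """Resolve Caveman Mode: conversation setting, then team override, then global toggle."""
    if conversation == "on":
        return True
    if conversation == "off":
        return False
    # "inherit": fall back to the team override, if the team set one.
    if team_override is not None:
        return team_override
    return global_enabled
```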

Threshold tuning

Settings → System → Caveman Threshold — token count at which compression triggers. Default 8000.

Lower (4000) → compresses more aggressively, more savings but more info loss
Higher (16000) → compresses less, less savings but higher fidelity

Sweet spot for most teams: 8000 (default). Enterprise teams with long support sessions often prefer 12000.
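To see what the threshold means in turns, here is a rough calculation of when the first pass fires. It assumes a constant number of tokens per turn and no system prompt; `first_trigger_turn` is a hypothetical helper.

```python
def first_trigger_turn(threshold: int, tokens_per_turn: int) -> int:
    """First turn whose accumulated context strictly exceeds the threshold."""
    # Turn k carries k * tokens_per_turn tokens; find the smallest k with
    # k * tokens_per_turn > threshold.
    return threshold // tokens_per_turn + 1


for threshold in (4000, 8000, 12000, 16000):
    print(threshold, first_trigger_turn(threshold, 400))
```

At 400 tokens per turn, the first pass fires on turn 11, 21, 31 and 41 respectively, so each doubling of the threshold roughly doubles how long a conversation runs uncompressed.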

Keep-last-N tuning

Settings → System → Caveman Keep-Last-N Turns — how many recent turns stay verbatim. Default 5.

Lower (3) → aggressive compression; good for bot-style agents
Higher (10) → cautious; good for back-and-forth dialogue
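Keep-last-N also interacts with the threshold: right after a pass, the context is the summary plus N verbatim turns, and only the remaining headroom is available before the next pass. A sketch assuming constant turn size and a fixed summary size; `turns_until_next_compression` is hypothetical.

```python
def turns_until_next_compression(
    threshold: int, keep_last_n: int, tokens_per_turn: int, summary_tokens: int
) -> int:
    """Turns that can be added after a compression before the next one fires."""
    # Right after a pass, the context is the summary plus the N verbatim turns.
    after_pass = summary_tokens + keep_last_n * tokens_per_turn
    headroom = threshold - after_pass
    if headroom < 0:
        raise ValueError("summary plus the kept turns already exceed the threshold")
    # Smallest k with after_pass + k * tokens_per_turn > threshold.
    return headroom // tokens_per_turn + 1
```

With an 8000-token threshold, 400-token turns and a 500-token summary, N=3 allows 16 turns between passes while N=10 allows 9, so a higher N buys fidelity at the cost of more frequent compression calls.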

What gets summarized

The summarization prompt asks the model to capture:

Key facts: names, numbers, IDs, decisions
State changes: "agent X completed task Y", "user clarified they want Z"
Open questions: unresolved points that might matter later

Pinned context is handled separately: anything marked as pinned in the UI skips summarization and is always kept verbatim.

The output is a structured markdown block that replaces the old turns, so the LLM reads a single message instead of one message per old turn.
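The split between pinned and summarizable content can be sketched as follows. The prompt wording is hypothetical (the page does not show TitanX's actual summarization prompt), as are `Message`, `split_pinned`, and `build_summarize_prompt`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Message:
    role: str
    content: str
    pinned: bool = False


# Hypothetical wording that mirrors the three categories listed above.
SUMMARIZE_INSTRUCTIONS = (
    "Summarize the conversation below as a markdown block with these sections:\n"
    "- Key facts: names, numbers, IDs, decisions\n"
    "- State changes: completed tasks, user clarifications\n"
    "- Open questions: unresolved points that may matter later\n"
)


def split_pinned(old: List[Message]) -> Tuple[List[Message], List[Message]]:
    """Separate pinned messages (kept verbatim) from those to be summarized."""
    pinned = [m for m in old if m.pinned]
    to_summarize = [m for m in old if not m.pinned]
    return pinned, to_summarize


def build_summarize_prompt(old: List[Message]) -> str:
    """Build the summarization request; pinned messages never reach it."""
    _, to_summarize = split_pinned(old)
    transcript = "\n".join(f"{m.role}: {m.content}" for m in to_summarize)
    return SUMMARIZE_INSTRUCTIONS + "\n" + transcript
```

Where the pinned messages are placed in the compressed context (before or after the summary) is not specified on this page; either choice fits the sketch.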

Example summary block:

[Compressed summary of turns 1-45]

Context:
- User is building a Python FastAPI service for a todo app
- Chose Postgres, Redis caching
- Agent wrote 3 endpoints (list/create/delete), user approved all
- Wrote unit tests for list endpoint

Open questions:
- User hasn't decided on auth strategy yet (mentioned JWT + OAuth options in turn 30)
- Deployment target (mentioned Fly.io vs. Railway in turn 42)

Last 5 turns: [shown verbatim below]
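If the model were asked for structured fields rather than free markdown (an assumption; the page only says the output is a structured markdown block), the example above could be rendered by a helper like this hypothetical `render_summary`.

```python
from typing import List


def render_summary(
    first_turn: int,
    last_turn: int,
    context: List[str],
    open_questions: List[str],
    keep_last_n: int = 5,
) -> str:
    """Render a summary block in the same layout as the example above."""
    lines = [f"[Compressed summary of turns {first_turn}-{last_turn}]", "", "Context:"]
    lines += [f"- {item}" for item in context]
    if open_questions:  # omit the section entirely when nothing is unresolved
        lines += ["", "Open questions:"]
        lines += [f"- {q}" for q in open_questions]
    lines += ["", f"Last {keep_last_n} turns: [shown verbatim below]"]
    return "\n".join(lines)
```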

Manual invocation

/compact in any conversation → forces an immediate Caveman pass, regardless of the threshold.

/restore → un-applies the last Caveman compression, restoring full history for the next turn. (Useful if compression dropped something you need.)
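One way to support both commands is to snapshot the context before each pass and keep the snapshots on a stack, so /restore can undo the most recent pass without losing turns added since. A sketch under that assumption; `CavemanContext` and its methods are hypothetical, and turns are plain strings for brevity.

```python
from typing import Callable, List, Tuple


class CavemanContext:
    """Hypothetical in-context history supporting /compact and /restore."""

    def __init__(self, summarize: Callable[[List[str]], str], keep_last_n: int = 5) -> None:
        self.summarize = summarize
        self.keep_last_n = keep_last_n
        self.context: List[str] = []
        # One entry per pass: (context before the pass, context length right after it).
        self._undo: List[Tuple[List[str], int]] = []

    def add_turn(self, text: str) -> None:
        self.context.append(text)

    def compact(self) -> bool:
        """/compact: compress now, regardless of the token threshold."""
        split = len(self.context) - self.keep_last_n
        if split <= 0:
            return False  # nothing old enough to compress
        before = list(self.context)
        self.context = [self.summarize(before[:split])] + before[split:]
        self._undo.append((before, len(self.context)))
        return True

    def restore(self) -> bool:
        """/restore: undo the most recent pass, keeping turns added since."""
        if not self._undo:
            return False
        before, compressed_len = self._undo.pop()
        # Only appends happen between passes, so new turns sit after the compressed prefix.
        self.context = before + self.context[compressed_len:]
        return True
```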

Cost of compression itself

Caveman's compression turn costs tokens too — it's one additional LLM call per trigger. Typical cost:

Compression call: ~2000-4000 tokens input, ~500 tokens output
Savings: each subsequent turn sends (full history - compressed summary) fewer input tokens, so total savings ≈ (full history - compressed summary) × remaining turns

Break-even is usually around turn 15-20 for an 8000-threshold config. Beyond that, net savings grow.
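A back-of-the-envelope check for a single pass: divide the cost of the compression call by the tokens saved on each later request. This sketch weights input and output tokens equally (providers often price output higher) and counts turns after the pass rather than from the start of the conversation; `turns_to_break_even` is a hypothetical helper.

```python
def turns_to_break_even(
    compression_input: int,
    compression_output: int,
    old_turn_tokens: int,
    summary_tokens: int,
) -> int:
    """Subsequent turns needed before one compression pass pays for itself.

    Each later request sends `summary_tokens` instead of `old_turn_tokens`;
    the last N verbatim turns are sent either way, so they cancel out.
    """
    saved_per_turn = old_turn_tokens - summary_tokens
    if saved_per_turn <= 0:
        raise ValueError("the summary must be smaller than the turns it replaces")
    cost = compression_input + compression_output
    return -(-cost // saved_per_turn)  # integer ceiling division
```

For example, a pass with 3000 input and 500 output tokens that replaces 2800 tokens of old turns with a 500-token summary pays for itself after 2 more turns.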

Compression caveats

Information loss is real. The summary is lossy by design. If you need perfect recall of turn 2's exact wording on turn 100, don't use Caveman.

Mitigations:

Pin critical messages (pinned content survives compression)
Use /compact manually at logical checkpoints (control when it triggers)
Keep a higher keep-last-N
Turn it off for high-fidelity work

Observability

Observability → Caveman Panel shows:

Total compressions performed
Tokens saved (estimated)
Cost saved (estimated)
Per-conversation compression history

If Caveman is off, the panel shows what you'd save if you enabled it (based on retroactive analysis of your conversation lengths).

Implementation detail: conversation-type support

Caveman currently works for these conversation types:

claude (API) ✅
gemini (API) ✅
acp with claude/gemini/codex backends ✅
openclaw-gateway ✅
farm 🚧 planned for v2.6 (farm conversations will use the slave's Caveman config, not the master's)
nanobot 🚧 not yet supported

If you flip Caveman on for an unsupported conversation type, the setting is stored but no-ops.
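The "stored but no-ops" behavior amounts to gating the stored setting on the conversation type. A sketch using the support list above; `caveman_applies` is hypothetical, and treating acp with any backend other than claude, gemini, or codex as unsupported is an assumption.

```python
from typing import Optional

# Conversation types listed above as supported in v2.5.1.
SUPPORTED_TYPES = frozenset({"claude", "gemini", "openclaw-gateway"})
# acp is listed as supported only with these backends.
SUPPORTED_ACP_BACKENDS = frozenset({"claude", "gemini", "codex"})


def caveman_applies(enabled: bool, conversation_type: str, backend: Optional[str] = None) -> bool:
    """True only when the stored setting is on AND the conversation type can honor it."""
    if not enabled:
        return False
    if conversation_type == "acp":
        return backend in SUPPORTED_ACP_BACKENDS
    return conversation_type in SUPPORTED_TYPES
```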

Related pages

Conversations and Chat UI — where /compact runs
Observability — Caveman panel details
Reasoning Bank — different memory mechanism; both can coexist

TitanX · Enterprise AI Agent Orchestration · Apache-2.0


Last updated for v2.5.1
