Compress everything your AI agent reads. Same answers, fraction of the tokens.
Every tool call, DB query, file read, and RAG retrieval your agent makes is 70-95% boilerplate.
Headroom compresses it away before it hits the model.
Works with any agent — coding agents (Claude Code, Codex, Cursor, Aider), custom agents
(LangChain, LangGraph, Agno, Strands, OpenClaw), or your own Python and TypeScript code.
Your Agent / App
(coding agents, customer support bots, RAG pipelines,
data analysis agents, research agents, any LLM app)
│
│ tool calls, logs, DB reads, RAG results, file reads, API responses
▼
Headroom ← proxy, Python/TypeScript SDK, or framework integration
│
▼
LLM Provider (OpenAI, Anthropic, Google, Bedrock, 100+ via LiteLLM)
Headroom sits between your application and the LLM provider. It intercepts requests, compresses the context, and forwards an optimized prompt. Use it as a transparent proxy (zero code changes), a Python function (compress()), or a framework integration (LangChain, LiteLLM, Agno).
Headroom optimizes any data your agent injects into a prompt:
- Tool outputs — shell commands, API calls, search results
- Database queries — SQL results, key-value lookups
- RAG retrievals — document chunks, embeddings results
- File reads — code, logs, configs, CSVs
- API responses — JSON, XML, HTML
- Conversation history — long agent sessions with repetitive context
Python:
```bash
pip install "headroom-ai[all]"
```

TypeScript / Node.js:

```bash
npm install headroom-ai
```

Docker-native (no Python or Node on host):

```bash
curl -fsSL https://raw.githubusercontent.com/chopratejas/headroom/main/scripts/install.sh | bash
```

The installer requires Bash 4.3+; macOS ships an older system Bash, so run the installer with a newer Bash such as Homebrew's.

PowerShell:

```powershell
irm https://raw.githubusercontent.com/chopratejas/headroom/main/scripts/install.ps1 | iex
```

Persistent local runtime (Python-native service/task flow):

```bash
headroom install apply --preset persistent-service --providers auto
```

Persistent local runtime (Docker-native wrapper / compose flow):

```bash
headroom install apply --preset persistent-docker
```

Python:
```python
from headroom import compress

# Default (coding agents — protects user messages, compresses tool outputs)
result = compress(messages, model="claude-sonnet-4-5-20250929")
response = client.messages.create(model="claude-sonnet-4-5-20250929", messages=result.messages)
print(f"Saved {result.tokens_saved} tokens ({result.compression_ratio:.0%})")

# Document compression (financial, legal, clinical — compress everything, keep 50%)
result = compress(
    messages,
    model="claude-opus-4-20250514",
    compress_user_messages=True,  # Compress user messages too
    target_ratio=0.5,             # Keep 50% (preserves numbers/entities)
    protect_recent=0,             # Don't protect recent messages
)
```

TypeScript:

```typescript
import { compress } from 'headroom-ai';

const result = await compress(messages, { model: 'gpt-4o' });
const response = await openai.chat.completions.create({ model: 'gpt-4o', messages: result.messages });
console.log(`Saved ${result.tokensSaved} tokens`);
```

Works with any LLM client — Anthropic, OpenAI, LiteLLM, Bedrock, Vercel AI SDK, or your own code. Full options via `CompressConfig`: `compress_user_messages`, `target_ratio`, `protect_recent`, `protect_analysis_context`.
```bash
headroom proxy --port 8787

# Run mode (default: token)
headroom proxy --mode token   # maximize compression
headroom proxy --mode cache   # preserve Anthropic/OpenAI prefix cache stability

# Point any LLM client at the proxy
ANTHROPIC_BASE_URL=http://localhost:8787 your-app
OPENAI_BASE_URL=http://localhost:8787/v1 your-app
```

Use token mode for short and medium sessions where raw compression savings matter most.
Use cache mode for long-running chats where preserving prior-turn bytes improves provider cache reuse.
Works with any language, any tool, any framework. Proxy docs
Prefer Docker as the runtime provider? See Docker-native install. Want Headroom to stay up in the background? See Persistent installs.
```bash
headroom wrap claude      # Starts proxy + launches Claude Code
headroom wrap copilot -- --model claude-sonnet-4-20250514
                          # Starts proxy + launches GitHub Copilot CLI
headroom wrap codex       # Starts proxy + launches OpenAI Codex CLI
headroom wrap aider       # Starts proxy + launches Aider
headroom wrap cursor      # Starts proxy + prints Cursor config
headroom wrap openclaw    # Installs + configures OpenClaw plugin

headroom wrap claude --memory      # With persistent cross-agent memory
headroom wrap codex --memory       # Shares the same memory store
headroom wrap claude --code-graph  # With code graph intelligence (codebase-memory-mcp)
```

Headroom starts a proxy, points your tool at it, and compresses everything automatically. Add `--memory` for persistent memory that's shared across agents. Add `--code-graph` for code intelligence via codebase-memory-mcp, which indexes your codebase into a knowledge graph for call-chain traversal, impact analysis, and architectural queries. `wrap copilot` is part of the Python-native CLI; the Docker-native wrapper currently supports claude, codex, aider, cursor, and openclaw.
In Docker-native mode, Headroom still runs in Docker while wrapped tools run on the host. wrap claude, wrap codex, wrap aider, wrap cursor, and OpenClaw plugin setup (wrap openclaw / unwrap openclaw) are host-managed through the installed wrapper.
```python
from headroom import SharedContext

ctx = SharedContext()
ctx.put("research", big_agent_output)      # Agent A stores (compressed)
summary = ctx.get("research")              # Agent B reads (~80% smaller)
full = ctx.get("research", full=True)      # Agent B gets original if needed
```

Compress what moves between agents — any framework. SharedContext Guide
```bash
headroom mcp install && claude
```

Gives your AI tool three MCP tools: `headroom_compress`, `headroom_retrieve`, `headroom_stats`. MCP Guide
| Your setup | Add Headroom | One-liner |
|---|---|---|
| Any Python app | compress() | result = compress(messages, model="gpt-4o") |
| Any TypeScript app | compress() | const result = await compress(messages, { model: 'gpt-4o' }) |
| Vercel AI SDK | Middleware | wrapLanguageModel({ model, middleware: headroomMiddleware() }) |
| OpenAI Node SDK | Wrap client | const client = withHeadroom(new OpenAI()) |
| Anthropic TS SDK | Wrap client | const client = withHeadroom(new Anthropic()) |
| Multi-agent | SharedContext | ctx = SharedContext(); ctx.put("key", data) |
| LiteLLM | Callback | litellm.callbacks = [HeadroomCallback()] |
| Any Python proxy | ASGI Middleware | app.add_middleware(CompressionMiddleware) |
| Agno agents | Wrap model | HeadroomAgnoModel(your_model) |
| LangChain | Wrap model | HeadroomChatModel(your_llm) |
| OpenClaw | One-command wrap/unwrap | headroom wrap openclaw / headroom unwrap openclaw |
| Claude Code | Wrap | headroom wrap claude |
| GitHub Copilot CLI | Wrap | headroom wrap copilot -- --model claude-sonnet-4-20250514 |
| Codex / Aider | Wrap | headroom wrap codex or headroom wrap aider |
| Always-on local proxy | Persistent install | headroom install apply --preset persistent-service --providers auto |
Full Integration Guide | TypeScript SDK
100 production log entries. One critical error buried at position 67.
| Baseline | Headroom | |
|---|---|---|
| Input tokens | 10,144 | 1,260 |
| Correct answers | 4/4 | 4/4 |
Both responses: "payment-gateway, error PG-5523, fix: Increase max_connections to 500, 1,847 transactions affected."
87.6% fewer tokens. Same answer. Run it: python examples/needle_in_haystack_test.py
What Headroom kept
From 100 log entries, SmartCrusher kept 6: first 3 (boundary), the FATAL error at position 67 (anomaly detection), and last 2 (recency). The error was automatically preserved — not by keyword matching, but by statistical analysis of field variance.
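The selection idea can be sketched in a few lines. This is an illustrative approximation, not SmartCrusher's actual algorithm: keep a few boundary and recency items, and additionally keep any entry with a statistically rare field value.

```python
from collections import Counter

def sketch_select(entries, head=3, tail=2):
    """Toy variance-based selection: keep boundary items, recent items,
    and any entry whose field values are outliers relative to the array."""
    keep = set(range(head)) | set(range(len(entries) - tail, len(entries)))
    # Count how common each field value is across all entries.
    field_counts = {}
    for e in entries:
        for k, v in e.items():
            field_counts.setdefault(k, Counter())[v] += 1
    # An entry is anomalous if any of its field values is rare (<5% of entries).
    for i, e in enumerate(entries):
        if any(field_counts[k][v] / len(entries) < 0.05 for k, v in e.items()):
            keep.add(i)
    return sorted(keep)

logs = [{"level": "INFO", "service": "payment-gateway"} for _ in range(100)]
logs[66] = {"level": "FATAL", "service": "payment-gateway"}  # position 67
print(sketch_select(logs))  # → [0, 1, 2, 66, 98, 99]
```

The FATAL entry survives not because "FATAL" is a keyword but because its value is rare in the `level` column, which is the flavor of statistical selection described above.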
| Scenario | Before | After | Savings |
|---|---|---|---|
| Code search (100 results) | 17,765 | 1,408 | 92% |
| SRE incident debugging | 65,694 | 5,118 | 92% |
| Codebase exploration | 78,502 | 41,254 | 47% |
| GitHub issue triage | 54,174 | 14,761 | 73% |
Compression preserves accuracy — tested on real OSS benchmarks.
Standard Benchmarks — Baseline (direct to API) vs Headroom (through proxy):
| Benchmark | Category | N | Baseline | Headroom | Delta |
|---|---|---|---|---|---|
| GSM8K | Math | 100 | 0.870 | 0.870 | 0.000 |
| TruthfulQA | Factual | 100 | 0.530 | 0.560 | +0.030 |
Compression Benchmarks — Accuracy after full compression stack:
| Benchmark | Category | N | Accuracy | Compression | Method |
|---|---|---|---|---|---|
| SQuAD v2 | QA | 100 | 97% | 19% | Before/After |
| BFCL | Tool/Function | 100 | 97% | 32% | LLM-as-Judge |
| Tool Outputs (built-in) | Agent | 8 | 100% | 20% | Before/After |
| CCR Needle Retention | Lossless | 50 | 100% | 77% | Exact Match |
Run it yourself:
```bash
# Quick smoke test (8 cases, ~10s)
python -m headroom.evals quick -n 8 --provider openai --model gpt-4o-mini

# Full Tier 1 suite (~$3, ~15 min)
python -m headroom.evals suite --tier 1 -o eval_results/

# CI mode (exit 1 on regression)
python -m headroom.evals suite --tier 1 --ci
```

Full methodology: Benchmarks | Evals Framework
Headroom never throws data away. It compresses aggressively, stores the originals, and gives the LLM a tool to retrieve full details when needed. When it compresses 500 items to 20, it tells the model what was omitted ("87 passed, 2 failed, 1 error") so the model knows when to ask for more.
Auto-detects what's in your context — JSON arrays, code, logs, plain text — and routes each to the best compressor. JSON goes to SmartCrusher, code goes through AST-aware compression (Python, JS, Go, Rust, Java, C++), text goes to Kompress (ModernBERT-based, with [ml] extra).
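A toy version of that routing decision, with heuristics far cruder than the real router's, looks like this:

```python
import json

def route(payload: str) -> str:
    """Naive content-type routing (illustrative only; Headroom's router
    uses richer detection than these string heuristics)."""
    try:
        json.loads(payload)
        return "SmartCrusher"    # parses as JSON -> structured compressor
    except ValueError:
        pass
    if any(tok in payload for tok in ("def ", "function ", "class ", "import ")):
        return "CodeCompressor"  # looks like source code -> AST-aware path
    return "Kompress"            # fall back to ML text compression

print(route('[{"id": 1}]'))          # → SmartCrusher
print(route("def handler(event):"))  # → CodeCompressor
```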
Stabilizes message prefixes so your provider's KV cache actually works. Claude offers a 90% read discount on cached prefixes — but almost no framework takes advantage of it. Headroom does.
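The mechanics are easy to see with a simplified model: a provider's prompt cache can only reuse work up to the first byte that differs between requests, so appending turns preserves reuse while rewriting early messages destroys it. The serialization below is a stand-in for illustration, not the provider's actual cache-key logic.

```python
import json

def common_prefix_chars(a, b):
    """Length of the shared serialized prefix between two message lists
    (simplified model of what a provider prompt cache can reuse)."""
    n = 0
    for x, y in zip(json.dumps(a), json.dumps(b)):
        if x != y:
            break
        n += 1
    return n

turn1 = [{"role": "system", "content": "You are a helpful agent."},
         {"role": "user", "content": "Summarize the logs."}]
# Appending a turn keeps the prefix intact -> cache reuse.
turn2_append = turn1 + [{"role": "user", "content": "Now fix the error."}]
# Rewriting an early message invalidates the cached prefix.
turn2_rewrite = [{"role": "system", "content": "You are terse."}] + turn1[1:]

print(common_prefix_chars(turn1, turn2_append) > common_prefix_chars(turn1, turn2_rewrite))  # → True
```

Cache mode keeps prior-turn bytes stable for exactly this reason: a compressor that rewrites earlier messages every turn silently forfeits the provider's cached-prefix discount.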
```bash
headroom wrap claude --memory   # Claude with persistent memory
headroom wrap codex --memory    # Codex shares the SAME memory store
```

Claude saves a fact, Codex reads it back. All agents sharing one proxy share one memory — project-scoped, user-isolated, with agent provenance tracking and automatic deduplication. No SDK changes needed. Memory docs
```bash
headroom learn                      # Auto-detect agent (Claude, Codex, Gemini)
headroom learn --apply              # Write learnings to agent-native files
headroom learn --agent codex --all  # Analyze all Codex sessions
```

Plugin-based: reads conversation history from Claude Code, Codex, or Gemini CLI. Finds failure patterns, correlates them with successes, and writes corrections to CLAUDE.md / AGENTS.md / GEMINI.md. External plugins via entry points. Learn docs
40-90% token reduction via trained ML router. Automatically selects the right resize/quality tradeoff per image.
All features
| Feature | What it does |
|---|---|
| Content Router | Auto-detects content type, routes to optimal compressor |
| SmartCrusher | Universal JSON compression — arrays of dicts, strings, numbers, mixed types, nested objects |
| CodeCompressor | AST-aware compression for Python, JS, Go, Rust, Java, C++ |
| Kompress | ModernBERT token compression (replaces LLMLingua-2) |
| CCR | Reversible compression — LLM retrieves originals when needed |
| Compression Summaries | Tells the LLM what was omitted ("3 errors, 12 failures") |
| CacheAligner | Stabilizes prefixes for provider KV cache hits |
| IntelligentContext | Score-based context management with learned importance |
| Image Compression | 40-90% token reduction via trained ML router |
| Memory | Cross-agent persistent memory — Claude saves, Codex reads it back. Agent provenance + auto-dedup |
| Compression Hooks | Customize compression with pre/post hooks |
| Read Lifecycle | Detects stale/superseded Read outputs, replaces with CCR markers |
| headroom learn | Plugin-based failure learning for Claude Code, Codex, Gemini CLI (extensible via entry points) |
| headroom wrap | One-command setup for Claude Code, GitHub Copilot CLI, Codex, Aider, Cursor |
| SharedContext | Compressed inter-agent context sharing for multi-agent workflows |
| MCP Tools | headroom_compress, headroom_retrieve, headroom_stats for Claude Code/Cursor |
Context compression is a new space. Here's how the approaches differ:
| | Approach | Scope | Deploy as | Framework integrations | Data stays local? | Reversible? |
|---|---|---|---|---|---|---|
| Headroom | Multi-algorithm compression | All context (tool outputs, DB reads, RAG, files, logs, history) | Proxy, Python library, ASGI middleware, or callback | LangChain, LangGraph, Agno, Strands, LiteLLM, MCP | Yes (OSS) | Yes (CCR) |
| RTK | CLI command rewriter | Shell command outputs | CLI wrapper | None | Yes (OSS) | No |
| Compresr | Cloud compression API | Text sent to their API | API call | None | No | No |
| Token Company | Cloud compression API | Text sent to their API | API call | None | No | No |
Use it however you want. Headroom works as a standalone proxy (headroom proxy), a one-function Python library (compress()), ASGI middleware, or a LiteLLM callback. Already using LiteLLM, LangChain, or Agno? Drop Headroom in without replacing anything.
Headroom + RTK work well together. RTK rewrites CLI commands (git show → git show --short), Headroom compresses everything else (JSON arrays, code, logs, RAG results, conversation history). Use both.
Headroom vs cloud APIs. Compresr and Token Company are hosted services — you send your context to their servers, they compress and return it. Headroom runs locally. Your data never leaves your machine. You also get lossless compression (CCR): the LLM can retrieve the full original when it needs more detail.
Your prompt
│
▼
1. CacheAligner Stabilize prefix for KV cache
│
▼
2. ContentRouter Route each content type:
│ → SmartCrusher (JSON)
│ → CodeCompressor (code)
│ → Kompress (text, with [ml])
▼
3. IntelligentContext Score-based token fitting
│
▼
LLM Provider
Needs full details? LLM calls headroom_retrieve.
Originals are in the Compressed Store — nothing is thrown away.
Overhead: 15-200ms compression latency (net positive for Sonnet/Opus). Full data: Latency Benchmarks
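Back-of-the-envelope, the tradeoff looks like this; the per-token price below is an assumed illustrative figure, not a provider quote:

```python
# Assumed illustrative rate, not an actual provider price.
price_per_input_token = 3.0 / 1_000_000      # $3 per 1M input tokens (assumption)
tokens_before, tokens_after = 10_144, 1_260  # from the needle-in-haystack run above
saved = tokens_before - tokens_after

print(f"{saved} tokens saved ≈ ${saved * price_per_input_token:.4f} per request")
# → 8884 tokens saved ≈ $0.0267 per request
```

Over hundreds of agent requests per session, per-request savings on that order typically dwarf the added compression latency, and fewer input tokens also mean less prefill work on the provider side.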
| Integration | Status | Docs |
|---|---|---|
| headroom wrap claude/copilot/codex/aider/cursor | Stable | Proxy Docs |
| compress() — one function | Stable | Integration Guide |
| SharedContext — multi-agent | Stable | SharedContext Guide |
| LiteLLM callback | Stable | Integration Guide |
| ASGI middleware | Stable | Integration Guide |
| Proxy server | Stable | Proxy Docs |
| Agno | Stable | Agno Guide |
| MCP (Claude Code, Cursor, etc.) | Stable | MCP Guide |
| Strands | Stable | Strands Guide |
| LangChain | Stable | LangChain Guide |
| OpenClaw | Stable | OpenClaw plugin |
The @headroom-ai/openclaw plugin integrates Headroom as a ContextEngine for OpenClaw. It compresses tool outputs, code, logs, and structured data inline — 70-90% token savings with zero LLM calls. The plugin can connect to a local or remote Headroom proxy and will auto-start one locally if needed.
```bash
pip install "headroom-ai[proxy]"
openclaw plugins install --dangerously-force-unsafe-install headroom-ai/openclaw
```

Why `--dangerously-force-unsafe-install`? The plugin auto-starts `headroom proxy` as a subprocess when no running proxy is detected. OpenClaw blocks process-launching plugins by default, so this flag is required to permit that behavior.
Once installed, assign Headroom as the context engine in your OpenClaw config:
```json
{
  "plugins": {
    "entries": { "headroom": { "enabled": true } },
    "slots": { "contextEngine": "headroom" }
  }
}
```

The plugin auto-detects and auto-starts the proxy — no manual proxy management needed. See the plugin README for full configuration options, local development setup, and launcher details.
```bash
headroom proxy --backend bedrock --region us-east-1      # AWS Bedrock
headroom proxy --backend vertex_ai --region us-central1  # Google Vertex
headroom proxy --backend azure                           # Azure OpenAI
headroom proxy --backend openrouter                      # OpenRouter (400+ models)
```

```bash
pip install headroom-ai               # Core library
pip install "headroom-ai[all]"        # Everything including evals (recommended)
pip install "headroom-ai[proxy]"      # Proxy server + MCP tools
pip install "headroom-ai[mcp]"        # MCP tools only (no proxy)
pip install "headroom-ai[ml]"         # ML compression (Kompress, requires torch)
pip install "headroom-ai[agno]"       # Agno integration
pip install "headroom-ai[langchain]"  # LangChain (experimental)
pip install "headroom-ai[evals]"      # Evaluation framework only
```

- Supported platforms: `linux/amd64`, `linux/arm64`
- `:code` tags — image with Code-Aware Compression (AST-based), i.e. `pip install "headroom-ai[proxy,code]"`
- `:slim` tags — image with distroless base
| Tag | Image | Extras | Docker Bake target |
|---|---|---|---|
| `<version>` | ghcr.io/chopratejas/headroom:`<version>` | proxy | runtime |
| latest | ghcr.io/chopratejas/headroom:latest | proxy | runtime |
| nonroot | ghcr.io/chopratejas/headroom:nonroot | proxy | runtime-nonroot |
| code | ghcr.io/chopratejas/headroom:code | proxy,code | runtime-code |
| code-nonroot | ghcr.io/chopratejas/headroom:code-nonroot | proxy,code | runtime-code-nonroot |
| slim | ghcr.io/chopratejas/headroom:slim | proxy | runtime-slim |
| slim-nonroot | ghcr.io/chopratejas/headroom:slim-nonroot | proxy | runtime-slim-nonroot |
| code-slim | ghcr.io/chopratejas/headroom:code-slim | proxy,code | runtime-code-slim |
| code-slim-nonroot | ghcr.io/chopratejas/headroom:code-slim-nonroot | proxy,code | runtime-code-slim-nonroot |
```bash
# List all available build targets
docker buildx bake --list targets

# Build default image locally (proxy + nonroot)
docker buildx bake runtime-default

# Build one variant and load it into the local Docker image store
docker buildx bake runtime-code-slim-nonroot \
  --set runtime-code-slim-nonroot.platform=linux/amd64 \
  --set runtime-code-slim-nonroot.tags=headroom:local \
  --load
```

Python 3.10+
| Integration Guide | LiteLLM, ASGI, compress(), proxy |
| Proxy Docs | Proxy server configuration |
| Architecture | How the pipeline works |
| CCR Guide | Reversible compression |
| Benchmarks | Accuracy validation |
| Latency Benchmarks | Compression overhead & cost-benefit analysis |
| Limitations | When compression helps, when it doesn't |
| Evals Framework | Prove compression preserves accuracy |
| Memory | Cross-agent persistent memory with provenance + dedup |
| Agno | Agno agent framework |
| MCP | Context engineering toolkit (compress, retrieve, stats) |
| SharedContext | Compressed inter-agent context sharing |
| Learn | Plugin-based failure learning (Claude, Codex, Gemini, extensible) |
| CLI Reference | Complete command surface, help output, and Docker parity matrix |
| Docker-Native Install | Host wrapper install, compose support, and Docker runtime behavior |
| Persistent Installs | Service/task/docker deployment models and provider scopes |
| Configuration | All options |
Questions, feedback, or just want to follow along? Join us on Discord
```bash
git clone https://github.com/chopratejas/headroom.git && cd headroom
pip install -e ".[dev]" && pytest
```

Prefer a containerized setup? Open the repo in `.devcontainer/devcontainer.json` for the default Python/uv workflow, or `.devcontainer/memory-stack/devcontainer.json` when you need local Qdrant + Neo4j services and the locked memory-stack extra for the qdrant-neo4j memory backend. Inside that container, use `qdrant:6333` and `neo4j://neo4j:7687` instead of localhost.
Apache License 2.0 — see LICENSE.