| title | Introduction |
|---|---|
| description | The compression layer between your app and any LLM. Save 30-40% on every API call. |
OpenCompress is a drop-in middleware that sits between your application and any LLM provider. It compresses your prompts before they reach the model, reducing token usage by 30-40% while preserving output quality.
Get running in under 2 minutes. Change two lines of code. Understand the five-layer compression pipeline. Full OpenAI-compatible endpoint documentation. Pay-for-savings model. No savings = no charge.Every LLM call you make contains token waste — filler words, redundant context, verbose formatting that models don't need. OpenCompress removes this waste before the request hits your provider.
Fully compatible with OpenAI's Chat Completions API. Works with any SDK. GPT-4o, Claude, Gemini, Llama, DeepSeek — we compress for all of them. We charge 20% of what we save you. If we don't save you money, you pay nothing extra. Change `base_url` and `api_key`. Everything else stays the same.| Use Case | Typical Compression | Monthly Savings (at $10K spend) |
|---|---|---|
| RAG / retrieval-augmented generation | 40-55% input reduction | $2,400 - $3,300 |
| Agent tool calls | 30-45% input reduction | $1,800 - $2,700 |
| Chat with long context | 35-50% input reduction | $2,100 - $3,000 |
| Code generation | 25-35% input reduction | $1,500 - $2,100 |