opencompress-docs/introduction.mdx at main · open-compress/opencompress-docs

title	Introduction
description	The compression layer between your app and any LLM. Save 30-40% on every API call.

What is OpenCompress?

OpenCompress is a drop-in middleware that sits between your application and any LLM provider. It compresses your prompts before they reach the model, reducing token usage by 30-40% while preserving output quality.

Get running in under 2 minutes. Change two lines of code. Understand the five-layer compression pipeline. Full OpenAI-compatible endpoint documentation. Pay-for-savings model. No savings = no charge.

Why OpenCompress?

Every LLM call you make contains token waste — filler words, redundant context, verbose formatting that models don't need. OpenCompress removes this waste before the request hits your provider.

Fully compatible with OpenAI's Chat Completions API. Works with any SDK. GPT-4o, Claude, Gemini, Llama, DeepSeek — we compress for all of them. We charge 20% of what we save you. If we don't save you money, you pay nothing extra. Change `base_url` and `api_key`. Everything else stays the same.

How much can you save?

Use Case	Typical Compression	Monthly Savings (at $10K spend)
RAG / retrieval-augmented generation	40-55% input reduction	$2,400 - $3,300
Agent tool calls	30-45% input reduction	$1,800 - $2,700
Chat with long context	35-50% input reduction	$2,100 - $3,000
Code generation	25-35% input reduction	$1,500 - $2,100

Savings vary by prompt structure. Prompts with more natural language and repeated patterns compress best. Try it in the [Playground](https://www.opencompress.ai/playground) with your actual prompts.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What is OpenCompress?

Why OpenCompress?

How much can you save?

FilesExpand file tree

introduction.mdx

Latest commit

History

introduction.mdx

File metadata and controls

What is OpenCompress?

Why OpenCompress?

How much can you save?