Skip to content

Latest commit

 

History

History
55 lines (46 loc) · 2.21 KB

File metadata and controls

55 lines (46 loc) · 2.21 KB
title Introduction
description The compression layer between your app and any LLM. Save 30-40% on every API call.

What is OpenCompress?

OpenCompress is a drop-in middleware that sits between your application and any LLM provider. It compresses your prompts before they reach the model, reducing token usage by 30-40% while preserving output quality.

Get running in under 2 minutes. Change two lines of code. Understand the five-layer compression pipeline. Full OpenAI-compatible endpoint documentation. Pay-for-savings model. No savings = no charge.

Why OpenCompress?

Every LLM call you make contains token waste — filler words, redundant context, verbose formatting that models don't need. OpenCompress removes this waste before the request hits your provider.

Fully compatible with OpenAI's Chat Completions API. Works with any SDK. GPT-4o, Claude, Gemini, Llama, DeepSeek — we compress for all of them. We charge 20% of what we save you. If we don't save you money, you pay nothing extra. Change `base_url` and `api_key`. Everything else stays the same.

How much can you save?

Use Case Typical Compression Monthly Savings (at $10K spend)
RAG / retrieval-augmented generation 40-55% input reduction $2,400 - $3,300
Agent tool calls 30-45% input reduction $1,800 - $2,700
Chat with long context 35-50% input reduction $2,100 - $3,000
Code generation 25-35% input reduction $1,500 - $2,100
Savings vary by prompt structure. Prompts with more natural language and repeated patterns compress best. Try it in the [Playground](https://www.opencompress.ai/playground) with your actual prompts.