Skip to content

Latest commit

 

History

History
94 lines (64 loc) · 9.15 KB

File metadata and controls

94 lines (64 loc) · 9.15 KB

Awesome LLM Token Reduction Awesome PRs Welcome License: CC0-1.0

A curated list of techniques, tools, and research for reducing LLM token usage — with a focus on AI coding assistants like Claude Code, OpenAI Codex, and GitHub Copilot.

Every prompt and response costs tokens, and coding agents burn through them fast: large files, tool output, logs, and long sessions all inflate the context window. This list collects the drop-in tools, libraries, data formats, and papers that cut tokens while keeping answers intact.

Contents

Surveys & Background

Start here for the lay of the land before picking a technique.

Coding-Assistant Token Savers

Drop-in proxies, plugins, hooks, and MCP servers that cut tokens for Claude Code, Codex, Copilot, Cursor, and Aider.

  • claude-rolling-context - Claude Code plugin that compresses old messages while keeping recent context verbatim. Stars
  • claude-shorthand - LLMLingua-2 prompt-compression hook for Claude Code. Stars
  • ClaudeShrink - Claude Code skill that shrinks large prompts and files with LLMLingua to save tokens. Stars
  • engram - Local-first context compression for AI coding tools, deduping redundant tokens across calls. Stars
  • entroly - Local proxy that compresses context for Claude Code, Codex, Cursor, and Aider. Stars
  • headroom - Compresses tool output, logs, files, and RAG chunks before they reach the LLM. Stars
  • llmtrim - Provider-agnostic Rust proxy that compresses input, output, and cache with no extra model calls. Stars
  • rtk - CLI proxy that cuts LLM token use 60-90% on common dev commands, single Rust binary. Stars
  • sigmap - Zero-dependency MCP server for AST-based code context reduction across 31 languages. Stars
  • token-optimizer-mcp - Claude Code MCP server reaching 95%+ token reduction through caching and optimization. Stars
  • token-reducer - Local-first Claude Code context compression using hybrid RAG and AST chunking. Stars
  • TokenTamer - Drop-in proxy that compresses bloated code context in real time to cut API costs. Stars
  • tokless - Unified CLI to install and update token-saving plugins for Claude Code, Codex, and OpenCode. Stars

Prompt Compression Libraries

General-purpose SDKs you call directly to compress prompts in any LLM app.

  • claw-compactor - 14-stage reversible, AST-aware pipeline for LLM token compression with zero inference cost. Stars
  • leanctx - Drop-in prompt-compression SDK for production LLM apps, built on LLMLingua-2. Stars
  • LLMLingua - Microsoft toolkit compressing prompts and KV-cache up to 20x with minimal quality loss. Stars
  • llmlingua-2-js - JavaScript/TypeScript implementation of LLMLingua-2 for browser and Node. Stars

Token-Efficient Data Formats

Compact, LLM-friendly encodings that pass the same data in fewer tokens than JSON.

  • TOON - Token-Oriented Object Notation, a lossless JSON encoding that cuts tokens ~30-60% for uniform data. Stars
  • Tooner - MCP proxy that converts JSON tool responses to TOON before they reach the model. Stars

Context & Memory Management

Persist and retrieve only what matters, so sessions stay short instead of replaying everything.

  • codex-agent-mem - Local-first MCP memory layer for Codex and Claude with compact, token-saving context packs. Stars
  • mnemosyne - Zero-dependency knowledge compression, ingestion, and hybrid retrieval engine. Stars
  • Zep - Context engineering platform that assembles relationship-aware context from a temporal knowledge graph. Stars

Output Compression

Reduce generation tokens — the part you pay the most for — without losing the answer.

  • caveman - Claude Code skill that rewrites output in terse "caveman speak" to cut ~65% of tokens. Stars
  • scrooge-mode - Output-compression skill for Claude Code and Codex measured on real session output tokens. Stars
  • squeez - Squeezes verbose LLM agent tool output down to only the relevant lines. Stars

Research & Methods

Foundational papers behind the tools above.

Contributing

Contributions are welcome! Please read the contribution guidelines first. In short: one entry per pull request, one entry per line, keep descriptions concise and present tense (ending with a period), verify the link resolves, and place the entry alphabetically within its section.


Star History

Star History Chart