congvmit/awesome-llm-token-reduction

Awesome LLM Token Reduction

License: CC0-1.0. PRs welcome.

A curated list of techniques, tools, and research for reducing LLM token usage — with a focus on AI coding assistants like Claude Code, OpenAI Codex, and GitHub Copilot.

Every prompt and response costs tokens, and coding agents burn through them fast: large files, tool output, logs, and long sessions all inflate the context window. This list collects the drop-in tools, libraries, data formats, and papers that cut tokens while keeping answers intact.

Surveys & Background

Start here for the lay of the land before picking a technique.

Coding-Assistant Token Savers

Drop-in proxies, plugins, hooks, and MCP servers that cut tokens for Claude Code, Codex, Copilot, Cursor, and Aider.

  • claude-rolling-context - Claude Code plugin that compresses old messages while keeping recent context verbatim.
  • claude-shorthand - LLMLingua-2 prompt-compression hook for Claude Code.
  • ClaudeShrink - Claude Code skill that shrinks large prompts and files with LLMLingua to save tokens.
  • engram - Local-first context compression for AI coding tools, deduping redundant tokens across calls.
  • entroly - Local proxy that compresses context for Claude Code, Codex, Cursor, and Aider.
  • headroom - Compresses tool output, logs, files, and RAG chunks before they reach the LLM.
  • llmtrim - Provider-agnostic Rust proxy that compresses input, output, and cache with no extra model calls.
  • rtk - CLI proxy that cuts LLM token use 60-90% on common dev commands, single Rust binary.
  • sigmap - Zero-dependency MCP server for AST-based code context reduction across 31 languages.
  • token-optimizer-mcp - Claude Code MCP server reaching 95%+ token reduction through caching and optimization.
  • token-reducer - Local-first Claude Code context compression using hybrid RAG and AST chunking.
  • TokenTamer - Drop-in proxy that compresses bloated code context in real time to cut API costs.
  • tokless - Unified CLI to install and update token-saving plugins for Claude Code, Codex, and OpenCode.
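Several entries above share one idea: keep the most recent turns verbatim and shrink everything older. The sketch below illustrates that idea only. It is not the implementation of any listed tool; the function name `roll_context`, its parameters, and the truncation step (a stand-in for a real summarizer or compressor) are assumptions made for illustration.

```python
from typing import Dict, List

Message = Dict[str, str]  # hypothetical shape: {"role": ..., "content": ...}


def roll_context(messages: List[Message], keep_recent: int = 4,
                 max_chars: int = 80) -> List[Message]:
    """Keep the last `keep_recent` messages verbatim and shorten older ones."""
    if keep_recent <= 0:
        old, recent = messages, []
    else:
        old, recent = messages[:-keep_recent], messages[-keep_recent:]

    compressed = []
    for msg in old:
        text = msg["content"]
        if len(text) > max_chars:
            # Stand-in for a real summarizer: truncate and record how much was cut.
            omitted = len(text) - max_chars
            text = text[:max_chars] + f" [... {omitted} chars omitted]"
        compressed.append({"role": msg["role"], "content": text})

    # Older messages come back shortened; recent ones are returned untouched.
    return compressed + recent
```

Truncation is lossy and naive. A production tool would more likely summarize or compress the older turns, but the split between "old" and "recent" is the part this sketch is meant to show.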

Prompt Compression Libraries

General-purpose SDKs you call directly to compress prompts in any LLM app.

  • claw-compactor - 14-stage reversible, AST-aware pipeline for LLM token compression with zero inference cost.
  • leanctx - Drop-in prompt-compression SDK for production LLM apps, built on LLMLingua-2.
  • LLMLingua - Microsoft toolkit compressing prompts and KV-cache up to 20x with minimal quality loss.
  • llmlingua-2-js - JavaScript/TypeScript implementation of LLMLingua-2 for browser and Node.

Token-Efficient Data Formats

Compact, LLM-friendly encodings that pass the same data in fewer tokens than JSON.

  • TOON - Token-Oriented Object Notation, a lossless JSON encoding that cuts tokens ~30-60% for uniform data.
  • Tooner - MCP proxy that converts JSON tool responses to TOON before they reach the model.
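To see why uniform data compresses well, consider an array of objects that all share the same keys: JSON repeats every key in every object, while a tabular layout states the keys once. The sketch below is a simplified illustration of that idea, loosely modeled on a header-plus-rows layout. It is not the TOON specification or its reference encoder: the function name `encode_uniform_rows` is hypothetical, and quoting, escaping, and nested values are deliberately ignored.

```python
import json
from typing import Any, Dict, List


def encode_uniform_rows(name: str, rows: List[Dict[str, Any]]) -> str:
    """Encode a list of same-shaped dicts as one header line plus one line per row.

    Assumes scalar values that contain no commas or newlines (no escaping is done).
    """
    if not rows:
        return f"{name}[0]:"
    fields = list(rows[0].keys())
    for row in rows:
        if list(row.keys()) != fields:
            raise ValueError("rows must share the same keys in the same order")

    # The keys are written once in the header instead of once per row.
    header = name + "[" + str(len(rows)) + "]{" + ",".join(fields) + "}:"
    lines = [header]
    for row in rows:
        lines.append("  " + ",".join(str(row[f]) for f in fields))
    return "\n".join(lines)


if __name__ == "__main__":
    users = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
    compact = encode_uniform_rows("users", users)
    print(compact)
    # Character counts are only a rough proxy for token counts.
    print(len(compact), "chars vs", len(json.dumps({"users": users})), "chars as JSON")
```

The saving grows with the number of rows, since each extra row adds only values. For irregular or deeply nested data the tabular layout does not apply, which is consistent with the "for uniform data" qualifier above.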

Context & Memory Management

Persist and retrieve only what matters, so sessions stay short instead of replaying everything.

  • codex-agent-mem - Local-first MCP memory layer for Codex and Claude with compact, token-saving context packs.
  • mnemosyne - Zero-dependency knowledge compression, ingestion, and hybrid retrieval engine.
  • Zep - Context engineering platform that assembles relationship-aware context from a temporal knowledge graph.

Output Compression

Reduce generation tokens, which providers typically price higher per token than input tokens, without losing the answer.

  • caveman - Claude Code skill that rewrites output in terse "caveman speak" to cut ~65% of tokens.
  • scrooge-mode - Output-compression skill for Claude Code and Codex measured on real session output tokens.
  • squeez - Squeezes verbose LLM agent tool output down to only the relevant lines.
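Trimming tool output to the relevant lines can be approximated with a plain pattern filter. The sketch below keeps lines that match a pattern plus a little surrounding context, and replaces each skipped run with a one-line marker. It is an illustration under stated assumptions, not how any listed tool works: the function name `relevant_lines`, the default pattern, and the marker format are all hypothetical.

```python
import re


def relevant_lines(output: str, pattern: str = r"error|fail|warning",
                   context: int = 1) -> str:
    """Keep lines matching `pattern` (case-insensitive) plus `context` neighbors."""
    lines = output.splitlines()
    matcher = re.compile(pattern, re.IGNORECASE)

    # Collect the indices of matching lines and their neighbors.
    keep = set()
    for i, line in enumerate(lines):
        if matcher.search(line):
            lo = max(0, i - context)
            hi = min(len(lines), i + context + 1)
            keep.update(range(lo, hi))

    result = []
    prev = -1
    for i in sorted(keep):
        if i != prev + 1:
            # Mark the gap so the model knows lines were dropped here.
            result.append(f"[... {i - prev - 1} lines omitted ...]")
        result.append(lines[i])
        prev = i
    if prev < len(lines) - 1:
        result.append(f"[... {len(lines) - 1 - prev} lines omitted ...]")
    return "\n".join(result)
```

A regex filter is cheap and needs no model call, but it only finds what the pattern anticipates. Output whose important lines do not contain an obvious keyword would need a smarter relevance signal.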

Research & Methods

Foundational papers behind the tools above.

Contributing

Contributions are welcome! Please read the contribution guidelines first. In short: one entry per pull request, one entry per line, keep descriptions concise and present tense (ending with a period), verify the link resolves, and place the entry alphabetically within its section.

