A Claude Code skill that shrinks massive prompts and files using LLMLingua to save tokens.
This skill lets Claude Code handle massive files (logs, documentation, long traces) by compressing them with LLMLingua. It runs the `gpt2` model locally on your CPU to shrink the text while preserving most of its meaning, which saves a significant share of context-window tokens.
ClaudeShrink uses an adaptive sizing algorithm to decide how heavily to compress each input. By default it targets a ~70% reduction, bounded by a 512-token floor and a 4,096-token ceiling.
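The sizing rule can be sketched as a simple clamp. This is an illustration reconstructed from the numbers in the table that follows, not the code in `scripts/compressor.py`. The names `estimate_tokens` and `target_tokens` are hypothetical, and the 4-characters-per-token estimate is an assumption implied by the table (15,000 chars ≈ 3,750 tokens).

```python
FLOOR_TOKENS = 512      # inputs at or below this size stay uncompressed
CEILING_TOKENS = 4096   # hard cap on the compressed output
TARGET_PERCENT = 30     # keep ~30% of the original tokens


def estimate_tokens(text: str) -> int:
    """Rough token estimate, assuming ~4 characters per token."""
    return len(text) // 4


def target_tokens(original_tokens: int) -> int:
    """Return the token budget for the compressed output."""
    if original_tokens <= FLOOR_TOKENS:
        return original_tokens  # too small to be worth compressing
    scaled = original_tokens * TARGET_PERCENT // 100
    # Clamp the 30% target between the floor and the ceiling.
    return max(FLOOR_TOKENS, min(CEILING_TOKENS, scaled))


for n in (500, 3750, 20000):
    print(n, "->", target_tokens(n))
# 500 -> 500
# 3750 -> 1125
# 20000 -> 4096
```

Under these assumptions the ceiling starts to bind once 30% of the input exceeds 4,096 tokens, which is why the reduction keeps growing for very large files.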
| Input Type | Text Size | Original Tokens | Compressed Tokens | Reduction | Notes |
|---|---|---|---|---|---|
| Small Prompts | < 2,000 chars | ~500 | ~500 | 0% | Meets 512-token safety floor; stays uncompressed |
| Large Texts | ~15,000 chars | ~3,750 | ~1,125 | ~70% | Targets 30% of original tokens |
| Massive Files | > 50,000 chars | > 12,500 | 4,096 | > 70-90% | Hits the 4096-token hard safety cap |
Run this one-liner in your terminal:
```bash
curl -fsSL https://raw.githubusercontent.com/g-akshay/ClaudeShrink/main/install.sh | bash
```

This will:

- Clone the repo into `~/.claude/skills/ClaudeShrink`
- Create an isolated Python venv (no system pollution)
- Install `llmlingua`, `torch`, `transformers`, and `accelerate`

Requirements: Python 3.9+ and `git` must be on your PATH.

- macOS: install Python via `brew install python` or python.org
- Linux (Ubuntu/Debian): `sudo apt install python3 git`; `python3-venv` is auto-handled by the installer

Note: The `gpt2` model (~500 MB) is downloaded on first use, not at install time.
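If you want to confirm the requirements before running the installer, a check along these lines may help. It is a minimal sketch using only the standard library; `missing_prerequisites` is a hypothetical helper and is not part of ClaudeShrink or its install script.

```python
import shutil
import sys

MIN_PYTHON = (3, 9)


def missing_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of problems; an empty list means ready to install."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append("Python 3.9+ is required")
    if which("git") is None:
        problems.append("git is not on PATH")
    return problems


if __name__ == "__main__":
    issues = missing_prerequisites()
    print("OK" if not issues else "\n".join(issues))
```

The version and lookup function are parameters only so the check can be exercised without changing the real environment.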
To update to the latest version, run the same command again. The installer is designed to be safe and additive: it pulls the latest code without destroying your environment.

```bash
curl -fsSL https://raw.githubusercontent.com/g-akshay/ClaudeShrink/main/install.sh | bash
```

Because ClaudeShrink uses GPT-2 instead of a massive local LLM, it is lightweight and runs smoothly in the background while you code.
| Component | Requirement | Explanation |
|---|---|---|
| Memory (RAM) | 2GB+ (4GB recommended) | The local GPT-2 model is tiny. Standard developer machines will not notice it running. |
| Storage (Disk) | ~4GB | torch and dependencies take ~3.5GB; the model weights are ~500MB. |
| Compute | Basic CPU | No GPU required. Runs lightning-fast on Intel, AMD, and Apple Silicon CPUs. |
| Software | Python 3.9+, Git | Standard environment prerequisites. |
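The storage row is the one most likely to cause trouble on a nearly full disk. As a rough sketch, free space in your home directory can be compared against the ~4GB figure before installing. `has_enough_disk` is a hypothetical helper, and treating 4GB as 4 GiB is an approximation.

```python
import shutil
from pathlib import Path

REQUIRED_BYTES = 4 * 1024**3  # ~4 GB: torch and dependencies plus model weights


def has_enough_disk(free_bytes: int, required: int = REQUIRED_BYTES) -> bool:
    """Return True when the free space covers the estimated install size."""
    return free_bytes >= required


# The skill installs under the home directory, so measure that filesystem.
free = shutil.disk_usage(Path.home()).free
print(f"{free / 1024**3:.1f} GiB free; enough: {has_enough_disk(free)}")
```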
Claude will automatically use this skill when it detects a request to process a large file. You can also explicitly trigger it by asking:
"Use the ClaudeShrink skill to read ./very_large_log.log and summarize the errors."
Defaults in scripts/compressor.py:
| Setting | Default | Notes |
|---|---|---|
| Model | `gpt2` | Lightweight, CPU-friendly |
| Device | `cpu` | Change to `cuda` if you have a GPU |
| Target Tokens | Auto (30% of input, 512–4096) | Adaptive — no manual tuning needed |
Edit ~/.claude/skills/ClaudeShrink/scripts/compressor.py to override.
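For orientation, here is roughly how these defaults could map onto LLMLingua's `PromptCompressor`. This is a sketch under assumptions, not the contents of `scripts/compressor.py`: `DEFAULTS`, `compressor_kwargs`, and `compress` are hypothetical names, and the real script may structure things differently.

```python
DEFAULTS = {"model_name": "gpt2", "device_map": "cpu"}


def compressor_kwargs(use_gpu: bool = False) -> dict:
    """Return PromptCompressor arguments, switching to CUDA when requested."""
    kwargs = dict(DEFAULTS)
    if use_gpu:
        kwargs["device_map"] = "cuda"
    return kwargs


def compress(text: str, target_token: int, use_gpu: bool = False) -> str:
    """Compress text to roughly target_token tokens with LLMLingua."""
    # Imported lazily: loading llmlingua pulls in torch and the gpt2 weights.
    from llmlingua import PromptCompressor

    compressor = PromptCompressor(**compressor_kwargs(use_gpu))
    result = compressor.compress_prompt(text, target_token=target_token)
    return result["compressed_prompt"]
```

Calling `compress(log_text, target_token=4096)` would then return the shortened text. The first call is slow because the model weights are downloaded and loaded.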
As agentic AI and long context windows become the norm, token consumption has skyrocketed. Sending massive trace logs or full code repositories to LLMs on every query often leads to cost overruns that go unnoticed until the bill arrives.
Recently, companies like Uber have made headlines for burning through their entire annual AI budgets in mere months. Giving engineers tools to push massive payloads to AI is powerful, but doing so without a compression layer is financially dangerous.
ClaudeShrink prevents this by:
- Stripping redundant or low-information tokens that humans tend to overlook.
- Handling the compression entirely on your local CPU for free before the payload touches the paid cloud API.
- Giving AI assistants a standardized, enforced cap on how many tokens they consume per request.
| Platform | Supported | Notes |
|---|---|---|
| macOS | ✅ | Intel + Apple Silicon |
| Linux | ✅ | Ubuntu, Debian (auto-installs python3-venv), Fedora, Arch |
| Windows | WSL only | Ubuntu recommended |