Glyphdown is a Claude Code plugin that lowers token cost across a session without changing what the agent can see. It compacts tool-result output, removes repeated content, and steers compaction toward a dense form — all lossless by meaning, fail-open (any error passes the original through untouched), and fast (a prebuilt native binary on the hot path, Python as the portable fallback).
It is dogfooded in production daily — the author runs it on their own Claude Code traffic — and only recently open-sourced. Lossless-by-meaning and fail-open are not aspirations; they are how it has had to behave to stay on every session.
It is free for noncommercial use (PolyForm Noncommercial 1.0.0); commercial use needs a paid license — see COMMERCIAL.md.
Your agent's tool output is mostly noise — ANSI codes, repeated reads, machine chatter, verbose compaction summaries. Glyphdown strips it losslessly, on-device, before it bills — and stacks on top of Anthropic's prompt cache.
| Measured | Reduction | On |
|---|---|---|
| Tool-heavy session corpus (52 real fixtures) | −31.7% | total tokens 85,405 → 58,347 (chars/4 proxy) |
| Large Bash dumps | −71.1% | the noisiest payloads |
| Instruction prose in the GLYPHDOWN-L1 dialect | −44.6% | every cached system-prompt call (Claude Opus tokens) |
| Network calls · API keys · data leaving your machine | 0 | 100% local, fail-open |
Char reduction is general; exact token % is model-specific — see General vs model-specific savings. Figures are measured, not asserted; the codec is fully open (glyphdown-core/) so the lossless behavior is verifiable, and the project does not publish numbers it has not measured.
Glyphdown hooks the request lifecycle at six points; every one fails open
({"continue": true} on any error — it can never block your input):
```mermaid
flowchart LR
U([Your prompt]) --> UPS[UserPromptSubmit<br/>mode detector]
UPS --> PRE[PreToolUse<br/>history dedup]
PRE --> TOOL[(Tool runs)]
TOOL --> POST[PostToolUse<br/>codec + session dedup]
POST --> Q{compaction?}
Q -- yes --> PC[PreCompact<br/>dense-form mandate]
Q -- no --> R([Reply — never compressed])
PC --> R
```
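The fail-open contract can be sketched as a wrapper around each hook handler. This is a minimal illustration, assuming a hook receives its event as JSON text and answers with one JSON object; run_hook and the handler shape are hypothetical names, and only the {"continue": true} fallback comes from the text above.

```python
import json
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_hook(handler: Handler, raw_event: str) -> str:
    """Run one hook invocation; any failure degrades to {"continue": true}."""
    try:
        event = json.loads(raw_event)  # hook payload, assumed to be a JSON object
        response = handler(event)
        if not isinstance(response, dict):
            raise TypeError("handler must return a JSON object")
    except Exception:
        # Fail open: bad input or a bug in the handler never blocks the user.
        response = {"continue": True}
    return json.dumps(response)
```

In a hook script this would be called as print(run_hook(my_handler, sys.stdin.read())). Catching the broad Exception is deliberate here: for a fail-open hook, any unexpected error should degrade to a pass-through rather than an interruption.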
It does not fight Anthropic's cache; it works on a different token bucket. Native caching discounts what is already cached. Glyphdown shrinks the turn-to-turn traffic that changes every call and therefore never caches, and its dense dialect also shrinks the stable prefix that does get cached. The two stack:
```mermaid
flowchart TB
subgraph B[What you are billed per call]
P[Stable prefix<br/>system prompt + tools]
T[Turn-to-turn traffic<br/>tool results, history, compaction]
end
P -->|Anthropic cache: up to −90%| C[cheap]
P -->|Glyphdown dialect: −44.6% of what caches| C
T -->|Glyphdown codec: −31.7% corpus| S[shrunk every call]
C --> L([Lower total])
S --> L
```
The hot path is a prebuilt native binary; Python is the portable fallback:
```mermaid
flowchart LR
H[PostToolUse hook] --> Qb{native binary<br/>available?}
Qb -- yes --> Rb[Rust codec ~5 ms]
Qb -- "no / GLYPHDOWN_RUST=0" --> Py[Python codec ~170 ms]
Rb --> O([compacted output<br/>identical, fail-open])
Py --> O
```
Glyphdown is named for its CoS — the symbolic notation it started as. GLYPHDOWN-L1
is a lossless prose↔dense transcoder: it rewrites verbose, repetitive
instruction-style prose (system prompts, CLAUDE.md, skill and agent files) into a
compact symbolic dialect the same model decodes natively, then expands it back
byte-for-byte.
```
expand(compress(x)) == x   # byte-identical for dialect content;
                           # unrecognized text passes through untouched
```
Why it matters: the system prompt ships on every request, and dense instructions
cost far fewer tokens while the model reads them just as well — so this is the only
always-on, every-call saving. Measured −44.6% token reduction on dialect content
(opus-dialect-validate-2026-05-31). It is the language behind the PreCompact
dense-form mandate and the compress-config command — and it is where the whole
project began. Every other mechanism (tool-result codec, dedup, the state-aware
gate) stacks on top of it.
Model-specific dialects. Tokenizers differ per model, so the dialect is a data
file the binary loads at runtime (GLYPHDOWN_DIALECT) — tune or ship a dialect for
your model with no rebuild (lossless self-check on load). glyphdown-core dialect-export dumps the default to edit; glyphdown-core compress / expand run it
directly; compress-config applies it to your config files (dry-run + backup + lossless gate).
```bash
claude plugin marketplace add MikkoParkkola/glyphdown
claude plugin install glyphdown
```

- Restart your session. Hooks fire automatically; nothing to configure.
- Use Claude normally for a few tool-heavy turns (reads, greps, bash).
- Check what it saved: run `glyphdown-stats`. It reads the append-only audit log and prints per-tool savings (so the effect is measured, not asserted).
- Tune aggressiveness with `glyphdown-set-level` if you want more or less.
- Verify losslessness yourself: `echo "<dense prose>" | glyphdown-core compress | glyphdown-core expand` round-trips byte-for-byte.
Nothing leaves your machine; if anything errors, the original output passes
through untouched. To pause, set GLYPHDOWN_DISABLE=1.
Token reduction is a crowded, genuinely diverse space. Glyphdown is a lossless, in-process, agent-tool-result compressor that stacks on native caching — it is not trying to be a learned compressor or a hosted proxy. The honest landscape (★ = GitHub stars, a maturity signal, not a quality verdict):
| Approach | Examples | Reversible? | Where it runs | Best at |
|---|---|---|---|---|
| Lossless in-process codec (this) | Glyphdown | yes (by meaning) | your machine, no hop | agent tool-results + dense compaction, stacked on cache |
| Lossless reversible pipeline | claw-compactor (2.2k★) | yes | varies | multi-stage general-text compression |
| Drop-in prompt compression | leanctx (300★+) | varies | library | production-app prompt text |
| Learned / lossy compression | LLMLingua / LLMLingua-2 (6k★) | no (drops tokens) | local model | aggressive prompt-text reduction where some loss is acceptable |
| Tool-call interceptor | Crucible | varies | in-process | nearest architecture to Glyphdown |
| Hosted edge proxy | Edgee, rtk-based | partial | network hop | one layer across multiple agents / clients |
| Output brevity ("caveman") | caveman, eridani-speak | no | varies | shrinking the model's output register |
| Spend visibility (not compression) | claude-usage (1.7k★), ai-token-monitor | n/a | dashboard | seeing cost, not reducing it |
| Task offload | houtini-lm | n/a | local LLM | delegating bounded subtasks off the paid model |
Two honest notes:
- Glyphdown is small by star count next to LLMLingua or claw-compactor, but it
  is dogfooded in production daily on real Claude Code traffic, not a research
  demo: recently open-sourced, long battle-tested. It competes on being
  **lossless + in-process + zero-setup + cache-stacking**.
- These mostly compose. A lossless codec, a learned compressor, and a spend dashboard solve different parts of the bill — running more than one is normal, not redundant. Glyphdown deliberately occupies the lossless-in-process slot and leaves the others to their strengths.
Be precise about what is universal and what depends on the model:
- The character reduction is general. ANSI-strip, JSON-minify, dedup, and blank-collapse remove characters losslessly — that holds for any tokenizer, any model, any provider.
- The exact token percentage is model-specific. Tokens are not characters;
  every model tokenizes differently. The headline figures were measured on
  specific tokenizers: the −44.6% dialect figure is in Claude Opus tokens
  (opus-dialect-validate-2026-05-31); the −31.7% corpus figure uses a
  tokenizer-free `chars/4` proxy. Your model's real saving will differ, sometimes higher, sometimes lower.
- Opus 4.7 / 4.8 ship their own tokenizers (as do GPT's o200k, Gemini, etc.).
  The bundled `calibration/` snapshot fits a per-model `tokens-per-char` value so the keep-vs-compact decision matches the actual tokenizer rather than a fixed 4-char assumption, and it is refreshed as model tokenizers move (they can change silently on a model update).
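The corpus headline can be re-derived from the token totals in the table. This small sketch assumes only what the text states (a chars/4 proxy and one-decimal rounding); the function names are illustrative.

```python
def proxy_tokens(chars: int) -> float:
    """Tokenizer-free proxy: roughly 4 characters per token."""
    return chars / 4


def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction, rounded to one decimal like the headline table."""
    return round(100 * (before - after) / before, 1)


# Token totals from the corpus row of the headline table (chars/4 proxy).
corpus_reduction = reduction_pct(85_405, 58_347)  # 27,058 fewer tokens
```

Because the proxy divides every count by the same constant, its token percentage equals the character percentage. A real tokenizer does not spend tokens uniformly across ANSI codes, whitespace, and words, which is why a model's measured saving can land above or below the proxy figure.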
Glyphdown is a token-cost-reduction system for LLM coding agents. The pieces have distinct roles:
- Glyphdown Plugin (this repo): the free, client-side codec for Claude Code.
- Glyphdown Verify (glyphdown-verify): an MIT tool so savings can be verified independently, without trusting the provider.
- A managed offering (spend visibility and prompt-cache protection for teams running Claude Code at scale) is available on inquiry (see COMMERCIAL.md).
The plugin is free and complete on its own; the rest is optional.
See Quickstart above — two commands, then restart
your session. glyphdown-stats shows savings; glyphdown-set-level tunes aggressiveness.
Glyphdown registers six hook points; every one fails open.
| Hook | What it does |
|---|---|
| PostToolUse — codec | Compacts each tool result: ANSI strip, JSON minify, blank-collapse, shape-aware compaction (JSON / YAML / TOML / code / filesystem path-lists), oversize truncation, schema-tag. Runs as the native binary, with a Python fallback. |
| PostToolUse — session dedup | A repeated Read/Grep/Glob/Monitor result is replaced with a short reference to its earlier occurrence in the session. |
| PreToolUse — history dedup | Collapses duplicate context already carried in earlier turns before a tool runs. |
| PreCompact — summary-form mandate | When Claude Code compacts, Glyphdown injects an instruction to summarize in a dense, structured form. |
| UserPromptSubmit — mode detector + stats | Detects the active aggressiveness level and serves the glyphdown-stats view. |
| SessionStart — skill loader | Loads the Glyphdown mode skill so the agent understands the dense conventions. |
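Three of the codec's lossless steps (ANSI strip, JSON minify, blank-collapse) can be illustrated with generic stand-ins. These are not the shipped Rust or Python functions: the names are hypothetical, and shape-aware compaction, truncation, and schema tags are left out.

```python
import json
import re
from typing import Optional

# CSI escape sequences such as "\x1b[31m" (colour) or "\x1b[2K" (erase line).
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    return ANSI_CSI.sub("", text)


def minify_json(text: str) -> Optional[str]:
    """Re-serialise valid JSON without insignificant whitespace; None if not JSON."""
    try:
        value = json.loads(text)
    except ValueError:
        return None
    return json.dumps(value, separators=(",", ":"), ensure_ascii=False)


def collapse_blanks(text: str) -> str:
    """Keep at most one empty line between non-blank lines."""
    out = []
    for line in text.split("\n"):
        blank = line.strip() == ""
        if blank and out and out[-1] == "":
            continue  # drop the repeated blank line
        out.append("" if blank else line)
    return "\n".join(out)


def compact(text: str) -> str:
    text = strip_ansi(text)
    minified = minify_json(text)
    return minified if minified is not None else collapse_blanks(text)
```

A production version needs more care than this sketch: json.loads silently merges duplicate object keys and normalises number spellings, so a real codec has to detect those cases before claiming the minified form means the same thing.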
Two guards make this safe:
- Break-even gating: a transform is applied only when it saves enough tokens to be worth its schema tag. Below that, the original passes through verbatim.
- Anchor-survival guard: a compaction that would drop a load-bearing anchor
  (the file:line, error code, or identifier that made the output useful) is automatically reverted. Truncation is the only lossy step, and it is anchor-guarded.
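Both guards can be sketched as one wrapper around any transform. This assumes regex-detectable anchors (path:line references and codes like E0425; identifier anchors are omitted), the 4-chars-per-token fallback estimate, and an 8-token margin. The patterns, the margin, the tag format, and apply_guarded are all hypothetical.

```python
import re
from typing import Callable, Set

# Illustrative anchor patterns: path:line references and codes like E0425 or TS2322.
FILE_LINE = re.compile(r"[\w./-]+\.\w+:\d+")
ERROR_CODE = re.compile(r"\b[A-Z]{1,4}\d{3,5}\b")


def anchors(text: str) -> Set[str]:
    return set(FILE_LINE.findall(text)) | set(ERROR_CODE.findall(text))


def estimate_tokens(text: str) -> float:
    return len(text) / 4  # the 4-characters-per-token fallback estimate


def apply_guarded(original: str, transform: Callable[[str], str], tag: str,
                  min_saving_tokens: float = 8.0) -> str:
    try:
        compacted = transform(original)
    except Exception:
        return original  # fail open
    # Anchor-survival guard: revert if any load-bearing anchor would disappear.
    if not anchors(original) <= anchors(compacted):
        return original
    # Break-even gate: the saving must exceed the schema tag's own cost by a margin.
    saved = estimate_tokens(original) - estimate_tokens(tag + compacted)
    if saved < min_saving_tokens:
        return original
    return tag + compacted
```

For example, truncating a 100-line log down to its first line keeps "src/app.py:42" and "E0425" and passes both checks, while a transform that loses either anchor, or one that saves less than the tag costs, returns the input byte-for-byte.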
The same binary speaks the Model Context Protocol over stdio, so any MCP client (Cursor, Cline, Zed, Continue, claude.ai connectors) can call the codec on demand — the codec is text→text and language-agnostic, so this, not a per-language SDK, is the portable reach surface.
The plugin registers it automatically (mcpServers in plugin.json). To wire it
into another client, point that client's MCP config at the launcher:
```json
{
  "mcpServers": {
    "glyphdown": { "command": "sh", "args": ["<plugin>/bin/glyphdown-mcp.sh"] }
  }
}
```

Five tools, matching the CLI:
| Tool | Does |
|---|---|
| `glyphdown_compress` | prose → GLYPHDOWN-L1 dense (lossless; text not in the dialect passes through untouched) |
| `glyphdown_expand` | dense → prose (exact inverse) |
| `glyphdown_compress_config` | preview-compress a config/system-prompt file; returns compressed text + token savings + a lossless flag (read-only, never writes) |
| `glyphdown_extract` | shrink a large tool result: keep head + structural landmarks + load-bearing anchors, collapse the uniform middle into retrieve markers, and stash the original (the producer in the extract→retrieve loop; targets the fat tail where a few huge results dominate volume) |
| `glyphdown_retrieve` | recover a rewind-stashed original (or a line range) by id; the consumer half of `glyphdown_extract` (resolves when co-located with the store that stashed it) |
Transport is newline-delimited JSON-RPC 2.0; stdout carries the protocol only,
every diagnostic goes to stderr, and the loop is fail-open (a malformed line
never kills the session). Run it directly with glyphdown-core mcp.
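Newline-delimited framing means every message is exactly one line of JSON. The sketch below builds a standard MCP opening sequence (initialize, the initialized notification, then tools/call) that could be written to the stdin of glyphdown-core mcp. The protocol version string is one published MCP revision; the client name and, in particular, the "text" argument name for glyphdown_compress are assumptions, so check the server's tools/list response for the real input schema.

```python
import json
from itertools import count

_next_id = count(1)


def request(method: str, params: dict) -> str:
    """One JSON-RPC 2.0 request, encoded as exactly one newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": next(_next_id), "method": method, "params": params}
    # json.dumps escapes embedded newlines, so the message can never span two lines.
    return json.dumps(message, separators=(",", ":")) + "\n"


def notification(method: str) -> str:
    """A JSON-RPC notification (no id, so no response is expected)."""
    return json.dumps({"jsonrpc": "2.0", "method": method}, separators=(",", ":")) + "\n"


session = [
    request("initialize", {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "sketch-client", "version": "0.0.0"},
    }),
    notification("notifications/initialized"),
    # The argument name "text" is an assumption; read the real schema from tools/list.
    request("tools/call", {"name": "glyphdown_compress",
                           "arguments": {"text": "Always run the tests.\nDo not skip them."}}),
]
```

Responses come back the same way, one JSON object per stdout line, matched to requests by id.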
Glyphdown is Rust-first, Python-fallback:
- The hot-path codec ships as prebuilt native binaries under
  `bin/<triple>/` (macOS and Linux, arm64 and x86_64). The PostToolUse hook runs the binary by default: roughly 5 ms per call versus ~170 ms to launch a Python interpreter, with identical output.
- The Python codec is the portable fallback, used on an unsupported platform,
  a missing binary, an `exec` denied by policy, or `GLYPHDOWN_RUST=0`. That is why `hooks/PostToolUse/glyphdown_codec.py` and the modules it imports (cache, dedup, anchor-guard, tokenizer, paths) exist: the plugin still works where the binary cannot run. Every path is fail-open.
- The lightweight glue hooks (skill loader, mode detector, stats handler, history dedup, PreCompact mandate) are Python because they are trivial and not on the per-tool-result hot path; a native port would buy nothing.
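The dispatch order above can be sketched in a few lines: prefer an executable native binary unless GLYPHDOWN_RUST=0, drop to the Python codec when exec fails, and return the payload untouched if everything fails. run_codec is a hypothetical name, and how the real hook invokes the binary (no arguments, payload on stdin, a 5-second timeout) is an assumption made for illustration.

```python
import os
import subprocess
from typing import Callable, Mapping, Optional


def run_codec(payload: str, binary: Optional[str], python_codec: Callable[[str], str],
              env: Optional[Mapping[str, str]] = None) -> str:
    env = os.environ if env is None else env
    try:
        use_native = (
            binary is not None
            and env.get("GLYPHDOWN_RUST", "1") != "0"
            and os.access(binary, os.X_OK)  # missing file or no exec permission -> False
        )
        if use_native:
            try:
                done = subprocess.run([binary], input=payload, capture_output=True,
                                      text=True, timeout=5, check=True)
                return done.stdout
            except (OSError, subprocess.SubprocessError):
                pass  # exec denied, crash, non-zero exit, or timeout: use Python
        return python_codec(payload)
    except Exception:
        return payload  # fail open: the original output passes through
```

Checking os.access first avoids paying for a failed process launch on platforms without a binary, while the inner try still covers policies that deny exec at launch time.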
Binaries are reproducible from the in-repo source via bin/build.sh
and verified by bin/SHA256SUMS. The codec source is fully open —
glyphdown-core/ — read every line.
Anthropic's prompt cache is the biggest token-cost lever there is, and mutating an already-cached prefix invalidates it: the whole prefix is billed again at cache-creation price, the worst failure mode in this space. Glyphdown is cache-safe by construction, not by a flag:
- The codec acts only on fresh tool-result output (the appended tail), never on the system-prompt / tool-definition prefix that gets cached.
- Compaction is deterministic — the same payload always compacts to the same bytes, so a tool result that later becomes part of a cached prefix stays stable.
- Session-dedup back-references the repeat ([seen earlier this session]) and never rewrites the earlier occurrence, so the potentially-cached copy is untouched.
- History-dedup only appends an advisory note; it does not rewrite prior turns.
- The user-visible reply is never compressed.
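Session dedup and determinism fit together in a small sketch: only the repeat is replaced, the first copy is returned unchanged, and the same repeat always produces the same bytes. The SHA-256 keying, the turn counter, and the exact back-reference wording after "[seen earlier this session" are assumptions; the tool set comes from the hook table.

```python
import hashlib
from typing import Dict

DEDUP_TOOLS = {"Read", "Grep", "Glob", "Monitor"}


class SessionDedup:
    """Replace a repeated tool result with a back-reference; never edit the first copy."""

    def __init__(self) -> None:
        self._first_turn: Dict[str, int] = {}  # digest of (tool, result) -> first turn

    def process(self, turn: int, tool: str, result: str) -> str:
        if tool not in DEDUP_TOOLS:
            return result
        digest = hashlib.sha256(f"{tool}\0{result}".encode("utf-8")).hexdigest()
        if digest in self._first_turn:
            # Deterministic: the same repeat always yields the same bytes.
            return f"[seen earlier this session: turn {self._first_turn[digest]}]"
        self._first_turn[digest] = turn
        return result
```

Keying on the tool name as well as the content keeps a Grep result that happens to equal an earlier Read from being collapsed into a reference the agent might misread.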
GLYPHDOWN_CACHE_AWARE (below) is optional defense-in-depth — a heuristic that backs
off on prefixes it infers are cache-hot. It is off by default because the
structural guarantees above already protect the cache, and the heuristic cannot
see Anthropic's real cache state (it awaits usage.cache_read_input_tokens). Turn it
on for belt-and-suspenders; you do not need it for correctness.
| Env var | Default | Effect |
|---|---|---|
| `GLYPHDOWN_RUST` | on | Set `0` to force the Python codec. |
| `GLYPHDOWN_ANCHOR_GUARD` | on | Set `0`/`off` to disable the anchor-survival revert (not recommended). |
| `GLYPHDOWN_CACHE_AWARE` | off | Optional defense-in-depth: backs off compaction on inferred cache-hot prefixes. Off because the codec is already cache-safe by construction (see Cache safety); turning it on adds per-call disk I/O. |
| `GLYPHDOWN_DATA_DIR` | `~/.ultracos` | Where the audit log and state live. The default keeps the legacy `~/.ultracos` name across the brand rename so existing audit/cache state is not orphaned. |
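A sketch of how these settings might be read, with the defaults from the table. The table documents 0 (and off for the anchor guard) as the off switch; treating false and no as off too, and any other non-empty value as on, is an assumption, as are the Settings and load_settings names.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

OFF_VALUES = {"0", "off", "false", "no"}  # "false"/"no" are assumed extras


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() not in OFF_VALUES


@dataclass(frozen=True)
class Settings:
    rust: bool
    anchor_guard: bool
    cache_aware: bool
    data_dir: Path


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    env = os.environ if env is None else env
    return Settings(
        rust=_flag(env, "GLYPHDOWN_RUST", True),
        anchor_guard=_flag(env, "GLYPHDOWN_ANCHOR_GUARD", True),
        cache_aware=_flag(env, "GLYPHDOWN_CACHE_AWARE", False),
        data_dir=Path(env.get("GLYPHDOWN_DATA_DIR", "~/.ultracos")).expanduser(),
    )
```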
The codec's keep-vs-compact boundary uses a token estimate. Glyphdown ships a
calibration snapshot (calibration/): per-model
tokens-per-char values fitted from real, model-billed token counts, so the
estimate matches a model's actual tokenizer rather than a fixed assumption. The
fallback, when no snapshot value applies, is the classic 4-characters-per-token
estimate.
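The keep-vs-compact boundary can be sketched as a lookup with a fallback. The snapshot entry below is invented for illustration and is not a published calibration value; the 8-token threshold and the rounding up with math.ceil are also assumptions.

```python
import math
from typing import Dict

FALLBACK_TOKENS_PER_CHAR = 0.25  # classic 4 characters per token

# Hypothetical snapshot entry, not a published calibration value.
SNAPSHOT: Dict[str, float] = {"example-model": 0.3125}


def estimate_tokens(text: str, model: str, snapshot: Dict[str, float]) -> int:
    tokens_per_char = snapshot.get(model, FALLBACK_TOKENS_PER_CHAR)
    return math.ceil(len(text) * tokens_per_char)


def worth_compacting(original: str, compacted: str, model: str,
                     snapshot: Dict[str, float], min_saving: int = 8) -> bool:
    saved = estimate_tokens(original, model, snapshot) - estimate_tokens(compacted, model, snapshot)
    return saved >= min_saving
```

Under the fallback, cutting 100 characters to 70 saves 7 estimated tokens and stays below the threshold; under the denser example model the same cut saves 10 and crosses it. That is the sense in which a calibrated value moves the boundary.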
Public vs private. The codec source, the published snapshot (numbers, schema, version), this methodology, and the fallback are all here and inspectable. The data, the fitting method, and the pipeline that produce the snapshot are not — that is what makes a snapshot a result you can use but not regenerate. See METHODOLOGY.md.
It is a service. A model's tokenizer can change with a model update, with no changelog. The snapshot is therefore refreshed as model tokenizers change. A frozen copy keeps working under the license; a refreshed one tracks the change.
Every published value is fitted from measured counts. The project does not publish performance figures it has not measured.
Glyphdown writes an append-only audit row per compaction event (savings per tool,
shape, version) so its effect is measurable, not asserted. glyphdown-stats reads it.
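An append-only audit log is easy to sketch as JSON Lines. The field names and the per-tool summary below are hypothetical and are not the real log schema or the glyphdown-stats implementation.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable


def audit_row(tool: str, shape: str, before: int, after: int, version: str) -> str:
    """One JSON line per compaction event (field names are illustrative)."""
    row = {"tool": tool, "shape": shape, "before": before, "after": after, "version": version}
    return json.dumps(row, sort_keys=True) + "\n"


def append_audit(log_path: Path, line: str) -> None:
    with log_path.open("a", encoding="utf-8") as fh:  # append mode: old rows are never rewritten
        fh.write(line)


def savings_by_tool(lines: Iterable[str]) -> Dict[str, int]:
    saved: Dict[str, int] = defaultdict(int)
    for line in lines:
        try:
            row = json.loads(line)
            tool, delta = row["tool"], row["before"] - row["after"]
        except (ValueError, KeyError, TypeError):
            continue  # skip a torn or unrelated line instead of failing
        saved[tool] += delta
    return dict(saved)
```

Reading is then a single pass over the file, for example savings_by_tool(open(path, encoding="utf-8")), and a half-written last line from an interrupted session is skipped rather than breaking the report.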
expand(compress(x)) == x is a hard invariant in the shipped, open codec:
Dialect::is_lossless() in glyphdown-core/src/codec.rs
self-checks every dialect on load and falls back to the bundled default on any
collision, and the Rust unit tests assert the round-trip for all compiled-in dialect
pairs. A release gate runs that same proof before every publish, so a dialect change
can never silently break the round-trip. The codec source is fully open — verify it
yourself with glyphdown-core compress | glyphdown-core expand.
How do I reduce Claude Code token costs?
Install Glyphdown (claude plugin install glyphdown). It losslessly compacts
tool-result output, deduplicates repeated context, and compresses the system
prompt — stacking on top of Anthropic's prompt cache. No API key, no signup.
Is it lossless? Does it change what the agent sees?
The codec is lossless by meaning: expand(compress(x)) == x for dialect content,
and unrecognized text passes through untouched. Truncation is the only lossy step
and it is anchor-guarded (it reverts if it would drop a file:line or error
code). The user-visible reply is never compressed.
Does it work with models other than Claude?
The character-level reductions (ANSI-strip, dedup, JSON-minify) are general and work with any model and provider. The exact token percentage is model-specific because each model tokenizes differently; a per-model calibration snapshot keeps the estimate honest. See General vs model-specific savings.
How much does it actually save?
−31.7% on a 52-fixture tool-heavy corpus, −71.1% on large Bash dumps, −44.6% on
instruction prose in the dialect (Claude Opus tokens). Your numbers depend on
your workload and model — run glyphdown-stats to measure your own.
How is it different from LLMLingua, claw-compactor, or a proxy like Edgee?
LLMLingua is a learned, lossy prompt compressor (it drops tokens); Glyphdown is lossless and in-process. claw-compactor is a lossless multi-stage pipeline; Glyphdown targets agent tool-results + compaction specifically and stacks on the cache. Edgee is a hosted edge proxy (network hop); Glyphdown runs on your machine with no hop. They compose; see How it compares.
Does it send my data anywhere?
No. 100% local, zero network calls, zero API keys. Tool output never leaves your machine. Every path is fail-open: on any error the original passes through.
Can I compress my CLAUDE.md / skills / agent files?
Yes — glyphdown-core compress-config <file> previews savings (dry-run by
default), and --apply writes them behind a lossless gate with an automatic
.glyphdown.bak backup. The system prompt ships on every request, so this is the
only always-on saving.
PolyForm Noncommercial License 1.0.0 — free for any noncommercial use. Commercial use requires a paid license: see COMMERCIAL.md or contact mikko.parkkola@iki.fi. Full text in LICENSE.