ankit481/tokensieve

tokensieve

Every token counts.

Cloud CLIs were built for humans to skim. Your AI agent has to read every character.


aws ec2 describe-instances returns 14,766 tokens. Your agent needed 8,594. You paid for 14,766.

Nulls. Empty arrays. Base64 blobs. Epoch timestamps. Duplicate IDs repeated across every object. None of it matters to your agent. All of it costs tokens.

tokensieve strips it out before your agent reads it.

[TokenSieve] Original: 14766 tok | Compressed: 8594 tok | Saved: 6172 (41.8%)
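
The receipt arithmetic is easy to check by hand. A minimal sketch in Rust; `receipt` is a hypothetical helper name used here for illustration, not necessarily how tokensieve builds the line internally:

```rust
/// Format a savings receipt from two token counts.
/// Hypothetical helper for illustration, not tokensieve's actual API.
fn receipt(original: u64, compressed: u64) -> String {
    // Signed, so a payload that grew shows up as negative savings.
    let saved = original as i64 - compressed as i64;
    let pct = if original == 0 {
        0.0
    } else {
        saved as f64 * 100.0 / original as f64
    };
    format!(
        "[TokenSieve] Original: {} tok | Compressed: {} tok | Saved: {} ({:.1}%)",
        original, compressed, saved, pct
    )
}
```

Calling `receipt(14766, 8594)` reproduces the line above.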

No changes to your agent. No config files. Installs with a handful of shell commands.


Results

17 real AWS API calls. No cherry-picking.

| Command | Original | Compressed | Saved |
|---|---|---|---|
| eks describe-cluster | 1,785 tok | 599 tok | 66.4% |
| ec2 describe-security-groups (5) | 3,108 tok | 1,410 tok | 54.6% |
| ec2 describe-subnets (8) | 2,639 tok | 1,265 tok | 52.1% |
| ec2 describe-vpcs | 2,714 tok | 1,375 tok | 49.3% |
| ec2 describe-instances (6) | 14,766 tok | 8,594 tok | 41.8% |
| logs describe-log-groups (10) | 1,053 tok | 738 tok | 29.9% |
| lambda list-functions (5) | 2,125 tok | 1,592 tok | 25.1% |

Across all 17 calls: 40,483 tokens in → 21,487 out → 46.9% savings. The table lists seven of them.

The EKS number (66%) is mostly one thing. Every EKS cluster response embeds a PEM certificate as a JSON string. ~800 tokens of base64. tokensieve detects it by content — no field-name hints, no per-tool config — and replaces it with <base64 1476 chars>. Four tokens.
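
Content-based detection can be sketched in a few lines of Rust. This is a heuristic written for illustration, not tokensieve's actual detector: the 64-character minimum, the length-multiple-of-4 check, and the function names are all assumptions. A long hex digest would also match, so a real detector likely needs more signals.

```rust
/// Minimum length before a string is treated as a blob (assumed threshold).
const MIN_BLOB_LEN: usize = 64;

/// Heuristic: long, length a multiple of 4, only base64 alphabet
/// characters, and at most two trailing '=' padding characters.
fn looks_like_base64(s: &str) -> bool {
    if s.len() < MIN_BLOB_LEN || s.len() % 4 != 0 {
        return false;
    }
    let body = s.trim_end_matches('=');
    let padding = s.len() - body.len();
    padding <= 2
        && body
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'+' || b == b'/')
}

/// Replace a detected blob with a short placeholder; keep anything else.
fn sieve_string(s: &str) -> String {
    if looks_like_base64(s) {
        format!("<base64 {} chars>", s.len())
    } else {
        s.to_string()
    }
}
```

No field name is consulted anywhere; the decision depends only on the string's content.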

Full per-stage breakdowns: docs/stress-tests.md


Install

Requires Rust (stable).

git clone https://github.com/YOUR_USERNAME/tokensieve
cd tokensieve
cargo build --release

Register the tools you want intercepted:

mkdir -p ~/.tokensieve/bin

# Quote the path so a checkout directory containing spaces still works.
ln -sf "$(pwd)/target/release/tokensieve" ~/.tokensieve/bin/aws
ln -sf "$(pwd)/target/release/tokensieve" ~/.tokensieve/bin/kubectl
ln -sf "$(pwd)/target/release/tokensieve" ~/.tokensieve/bin/databricks

export PATH="$HOME/.tokensieve/bin:$PATH"   # add to ~/.zshrc or ~/.bashrc

Verify:

which aws               # → ~/.tokensieve/bin/aws
aws ec2 describe-vpcs   # compressed output + receipt on stderr

Usage

Proxy mode (default)

Your agent calls aws ... as usual. The symlink intercepts the call. tokensieve finds the real binary further down $PATH, runs it, compresses the output, and returns it.

agent → aws (symlink) → tokensieve → real aws → compressed output → agent

Non-JSON output passes through untouched. Exit codes are preserved. The agent cannot tell it's there.
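
Finding the real binary "further down $PATH" can be sketched as below. This is an assumption about the approach, not the project's code: `find_real_binary` and its parameters are hypothetical, the directory comparison is purely textual (no symlink resolution), and the executable bit is not checked.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Return the first file called `name` on `path_var`, ignoring `skip_dir`
/// (the directory that holds the tokensieve symlinks).
fn find_real_binary(name: &str, path_var: &str, skip_dir: &Path) -> Option<PathBuf> {
    env::split_paths(path_var)
        // Skip our own shim directory so we never re-invoke ourselves.
        .filter(|dir| dir.as_path() != skip_dir)
        .map(|dir| dir.join(name))
        // Sketch only: a real lookup would also check execute permission.
        .find(|candidate| candidate.is_file())
}
```

Skipping the shim directory is what prevents the proxy from calling itself in a loop.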

Fetch mode

Run multiple commands concurrently, compress the merged result once:

printf "databricks grants get catalog prod\ndatabricks grants get catalog staging\n" \
  | tokensieve fetch
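
The concurrency pattern behind this can be sketched with scoped threads (Rust 1.63+). `fetch_all` and the injected `run` callback are hypothetical names; a real runner would spawn each line with `std::process::Command`, and how tokensieve merges the outputs is not shown here.

```rust
use std::thread;

/// Run every command line on its own thread; return outputs in input order.
/// `run` is injected so the sketch stays independent of any real CLI.
fn fetch_all<F>(commands: &[&str], run: F) -> Vec<String>
where
    F: Fn(&str) -> String + Sync,
{
    let run = &run;
    thread::scope(|scope| {
        // Spawn all workers first so they execute concurrently.
        let handles: Vec<_> = commands
            .iter()
            .map(|&cmd| scope.spawn(move || run(cmd)))
            .collect();
        // Join in spawn order to keep the results deterministic.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Joining in spawn order keeps the merged result stable regardless of which command finishes first.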

Manual pipeline

cat response.json | tokensieve

How it works

Six stages on every response:

| Stage | What it does |
|---|---|
| Scrub | Strip ANSI escape codes |
| Gate | Non-JSON passes through at zero cost |
| Sieve | Remove nulls, empty values, base64 blobs |
| Dedupe | Drop epoch timestamps; first-seen-wins scalar deduplication |
| Route | Schema-YAML for dense arrays; PVFN for everything else |
| Emit | Compressed payload to stdout, receipt to stderr |
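
The null and empty-value part of the Sieve stage can be sketched as a recursive prune. The `Value` enum below is a stand-in defined only for this example (the real tool presumably uses a JSON library), and dropping empty elements from inside arrays is an assumption.

```rust
/// Minimal JSON-like value, defined here only for illustration.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>),
}

/// Recursively drop nulls, empty strings, empty arrays and empty objects.
/// Returns None when nothing worth keeping is left.
fn sieve(value: Value) -> Option<Value> {
    match value {
        Value::Null => None,
        Value::Str(ref s) if s.is_empty() => None,
        Value::Array(items) => {
            let kept: Vec<Value> = items.into_iter().filter_map(sieve).collect();
            if kept.is_empty() { None } else { Some(Value::Array(kept)) }
        }
        Value::Object(fields) => {
            let kept: Vec<(String, Value)> = fields
                .into_iter()
                .filter_map(|(key, v)| sieve(v).map(|v| (key, v)))
                .collect();
            if kept.is_empty() { None } else { Some(Value::Object(kept)) }
        }
        // Booleans, numbers and non-empty strings pass through unchanged.
        other => Some(other),
    }
}
```

Pruning bottom-up means an object whose only fields were empty disappears as well.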

Schema-YAML emits keys once, values as compact rows. No pipes, no separator lines.
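
The "keys once, values as rows" idea, sketched in Rust. The exact layout (`keys:` header, `- [..]` rows) is illustrative only and may differ from tokensieve's real Schema-YAML output; `schema_rows` is a hypothetical name.

```rust
/// Render uniform records as one key header plus one compact row per record.
/// Layout is an assumption made for this example.
fn schema_rows(keys: &[&str], rows: &[Vec<&str>]) -> String {
    let mut out = format!("keys: [{}]\nrows:\n", keys.join(", "));
    for row in rows {
        // Values only: the key names are not repeated per record.
        out.push_str(&format!("- [{}]\n", row.join(", ")));
    }
    out
}
```

With N records and K keys, key names are emitted K times instead of N × K times, which is where the savings on dense arrays come from.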

PVFN (Path-Value Flattened Notation) flattens nested JSON to dot-notation paths, abbreviates long repeated key names, and inlines Schema-YAML blocks for dense sub-arrays.
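
The flattening step can be sketched as a tree walk that emits one line per leaf. The `Node` type, the `=` separator, and numeric array indices are assumptions for this example; key abbreviation and inlined Schema-YAML blocks are left out.

```rust
/// Minimal JSON-like tree for illustration (string leaves only).
enum Node {
    Leaf(String),
    List(Vec<Node>),
    Map(Vec<(String, Node)>),
}

/// Join a path prefix and a segment with a dot, skipping the empty root.
fn join(prefix: &str, segment: &str) -> String {
    if prefix.is_empty() {
        segment.to_string()
    } else {
        format!("{}.{}", prefix, segment)
    }
}

/// Walk the tree and push one `path=value` line per leaf into `out`.
fn flatten(node: &Node, path: &str, out: &mut Vec<String>) {
    match node {
        Node::Leaf(value) => out.push(format!("{}={}", path, value)),
        Node::List(items) => {
            for (index, item) in items.iter().enumerate() {
                flatten(item, &join(path, &index.to_string()), out);
            }
        }
        Node::Map(fields) => {
            for (key, child) in fields {
                flatten(child, &join(path, key), out);
            }
        }
    }
}
```

Starting from an empty path, a nested `cluster` object with a `name` field comes out as the single line `cluster.name=...`.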

The router picks based on fill ratio. No configuration.
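
One plausible reading of "fill ratio" is the fraction of (row, key) cells that actually hold a value. The sketch below uses that definition with an assumed 0.8 cutoff; both the definition and the threshold are guesses for illustration, and it assumes no key repeats within a row. See docs/ARCHITECTURE.md for the real rule.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
enum Format {
    SchemaYaml,
    Pvfn,
}

/// Assumed cutoff; the project's real threshold is not stated in this README.
const DENSE_THRESHOLD: f64 = 0.8;

/// Fraction of (row, key) cells that are present across all rows.
fn fill_ratio(rows: &[Vec<(&str, &str)>]) -> f64 {
    let keys: BTreeSet<&str> = rows.iter().flatten().map(|(key, _)| *key).collect();
    let cells = rows.len() * keys.len();
    if cells == 0 {
        return 0.0;
    }
    let filled: usize = rows.iter().map(|row| row.len()).sum();
    filled as f64 / cells as f64
}

/// Dense arrays go to Schema-YAML; everything else goes to PVFN.
fn route(rows: &[Vec<(&str, &str)>]) -> Format {
    if fill_ratio(rows) >= DENSE_THRESHOLD {
        Format::SchemaYaml
    } else {
        Format::Pvfn
    }
}
```

Rows that all share the same keys score 1.0 and take the row format; rows with mostly disjoint keys score low and fall back to PVFN.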

Full design doc: docs/ARCHITECTURE.md


Contributing

Issues, PRs, and compression reports welcome.

Where the headroom is:

  • New CLI coverage: Tested against GCP, Azure, Terraform, gh, docker? Open an issue with a sanitized sample and measured savings.
  • Compression improvements: two changes would push EC2 savings from ~40% to 65% or more. One is auto-unwrapping nested wrappers (Reservations → Instances); the other is recursive compression of embedded JSON strings. Details in docs/stress-tests.md.
  • Bug reports: Output garbled, or savings negative on a real payload? File an issue with a sanitized sample.

To run the tests:

git clone https://github.com/YOUR_USERNAME/tokensieve
cd tokensieve
cargo test

License: MIT

About

CLI output compression for AI agents. Every token counts.
