Token Optimization — Context Reduction and LLM-Assisted Compression#12259

Closed
JamesRobert20 wants to merge 47 commits into RooCodeInc:main from Zoo-Code-Org:feat/token-optimization

JamesRobert20 (Contributor) commented May 3, 2026

What it does

Every time a tool like read_file or search_files runs, its full output goes into conversation history and gets re-sent to the primary model on every subsequent API call. A single read_file on a 1,000-line file can add 8,000–15,000 tokens that stay in context for the entire task.
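To make the re-send cost concrete, here is a rough back-of-the-envelope sketch. The helper name and the figure of 20 follow-up calls are illustrative assumptions, and the estimate ignores any provider-side prompt caching.

```typescript
// Hypothetical helper: estimates how many input tokens one tool result
// contributes over the rest of a task if it is re-sent on every API call.
function cumulativeResendTokens(resultTokens: number, remainingCalls: number): number {
  return resultTokens * remainingCalls
}

// Illustrative numbers only: a 10,000-token result that stays in context
// for 20 further API calls, versus a 200-token summary of the same result.
const uncompressed = cumulativeResendTokens(10_000, 20) // 200,000 input tokens
const compressed = cumulativeResendTokens(200, 20) // 4,000 input tokens
console.log({ uncompressed, compressed, saved: uncompressed - compressed })
```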

This PR adds an invisible compression layer between tool execution and conversation history. Before a large tool result is stored, a cheap secondary model compresses it into a focused summary. The primary model sees less noise, the task runs cheaper, and the user sees nothing different in the UI.

All users benefit from env details diffing, old tool result truncation, and parallel tool calls. LLM-assisted compression is subscription-only.

What changed

New files:

  • src/core/tools/ToolResultProcessor.ts: shouldCompress() + compress() with configurable per-tool thresholds
  • src/core/tools/compressAndPush.ts: wrapper that replaces direct pushToolResult calls in tool handlers
  • src/core/tools/resolveCompressionHandler.ts: async subscription check via the Zoo Code API (1-hour cache per key); returns a ZooGatewayApiHandler for subscribers or null for free users
  • src/core/tools/ToolResultProcessorConfig.ts: config interface + defaults
  • src/core/tools/CompletionPostProcessor.ts: optional reformatting of attempt_completion result text
  • src/api/providers/zoo-gateway.ts: routes compression calls to /api/proxy/internal/compress using the user's Zoo Code API key
  • src/core/context-management/compressToolResults.ts: truncates old tool results in long conversations
  • src/core/environment/environmentDiff.ts: only sends changed env detail sections on turns 2+
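
As a rough illustration of the env details diffing idea, the sketch below returns only the sections that changed since the previous turn. The `EnvSections` type, the function name, and the section names are assumptions made for this example, not the contents of environmentDiff.ts.

```typescript
// Hypothetical shape: environment details keyed by section name.
type EnvSections = Record<string, string>

// Returns only the sections whose text changed since the previous turn.
// On the first turn (no previous snapshot) every section is returned.
function diffEnvSections(previous: EnvSections | undefined, current: EnvSections): EnvSections {
  if (!previous) return { ...current }
  const changed: EnvSections = {}
  for (const [name, text] of Object.entries(current)) {
    if (previous[name] !== text) changed[name] = text
  }
  return changed
}

// Section names below are made up for the example.
const turn1: EnvSections = { time: "10:00", openTabs: "a.ts", mode: "code" }
const turn2: EnvSections = { time: "10:01", openTabs: "a.ts", mode: "code" }
console.log(diffEnvSections(undefined, turn1)) // all three sections
console.log(diffEnvSections(turn1, turn2)) // only the "time" section
```

This sketch does not signal sections that disappeared between turns; how the actual implementation handles removals is not stated here.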

Modified files:

  • src/core/tools/{ReadFileTool,SearchFilesTool,ListFilesTool,CodebaseSearchTool,ExecuteCommandTool}.ts: use compressAndPushToolResult instead of raw pushToolResult
  • src/core/task/Task.ts: async handler init on construction, isSubscriber flag, toolResultProcessorSettings from global state, compressOldToolResults in main loop, env diff tracking
  • src/core/webview/webviewMessageHandler.ts: clears subscription cache when zooCodeApiKey changes
  • webview-ui/src/components/settings/SettingsView.tsx: zooCodeApiKey input field
  • packages/types/src/global-settings.ts: zooCodeApiKey + zooCodeBaseUrl in schema, toolResultProcessorSettings
  • src/api/index.ts: taskId optional, toolName added to ApiHandlerCreateMessageMetadata
  • src/package.json: zooCodeApiKey/zooCodeBaseUrl VS Code settings contributions
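
The subscription check is described as cached for one hour per key and cleared when zooCodeApiKey changes. A minimal sketch of that caching behavior, assuming an in-memory map and an injected `check` function (both hypothetical, as are all names below), might look like this:

```typescript
const ONE_HOUR_MS = 60 * 60 * 1000

// Hypothetical in-memory cache of subscription status, keyed by API key.
class SubscriptionCache {
  private entries = new Map<string, { isSubscriber: boolean; expiresAt: number }>()

  // Returns the cached status, or undefined if missing or expired.
  get(apiKey: string, now: number = Date.now()): boolean | undefined {
    const entry = this.entries.get(apiKey)
    if (!entry || entry.expiresAt <= now) {
      this.entries.delete(apiKey)
      return undefined
    }
    return entry.isSubscriber
  }

  set(apiKey: string, isSubscriber: boolean, now: number = Date.now()): void {
    this.entries.set(apiKey, { isSubscriber, expiresAt: now + ONE_HOUR_MS })
  }

  // Would be called when the stored API key changes.
  clear(): void {
    this.entries.clear()
  }
}

// Resolves subscriber status, calling the injected `check` only on a cache miss.
async function resolveIsSubscriber(
  apiKey: string,
  cache: SubscriptionCache,
  check: (apiKey: string) => Promise<boolean>,
): Promise<boolean> {
  const cached = cache.get(apiKey)
  if (cached !== undefined) return cached
  const isSubscriber = await check(apiKey)
  cache.set(apiKey, isSubscriber)
  return isSubscriber
}
```

Passing `check` in as a parameter keeps the network call out of the cache logic, which makes the expiry behavior easy to test with a fixed clock.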

Test coverage:

  • resolveCompressionHandler.spec.ts — 11 tests
  • ToolResultProcessor.spec.ts — 27 tests
  • compressAndPush.spec.ts — 6 tests
  • CompletionPostProcessor.spec.ts — 7 tests
  • compressToolResults.spec.ts — context compression tests
  • environmentDiff.spec.ts — env diff tests

How it works

User generates a Zoo Code API key at zoocode.dev/dashboard/api-tokens
  → pastes it in Settings → Zoo Code API Key
  → stored in VS Code SecretStorage

Task starts →
  resolveCompressionHandler(apiKey) called async
    → GET zoocode.dev/api/subscription/status (Bearer: apiKey)
    → returns { isSubscriber: true } for paid plans, false for free
  → ZooGatewayApiHandler instantiated for subscribers
  → ToolResultProcessor(handler) + isSubscriber: true set on config

Tool executes (e.g. read_file returns 10,000 chars) →
  compressAndPushToolResult("read_file", rawResult, context, task, pushToolResult)
    → shouldCompress() checks: enabled + isSubscriber + threshold exceeded
    → compress() calls ZooGatewayApiHandler.createMessage(systemPrompt, [rawResult], { toolName })
    → POST zoocode.dev/api/proxy/internal/compress (Bearer: apiKey)
      → returns { compressed: "focused 200-token summary" }
    → compressed result pushed to conversation history

Primary model receives focused summary instead of full 10,000-char blob
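
The tool-execution part of the flow above can be sketched roughly as follows. This is a simplified illustration, not the code in compressAndPush.ts: the `Processor` interface and the reduced parameter list are assumptions, and the fallback to the raw result when compression fails is a design choice assumed for the example.

```typescript
// Hypothetical minimal interface; the PR's ToolResultProcessor has more to it.
interface Processor {
  shouldCompress(toolName: string, result: string): boolean
  compress(toolName: string, result: string): Promise<string>
}

// Pushes a compressed result when possible, otherwise the raw result.
// Assumption: a failed compression call falls back to the raw output.
async function compressAndPushSketch(
  toolName: string,
  rawResult: string,
  processor: Processor | null, // null models a free user with no handler
  pushToolResult: (content: string) => void,
): Promise<void> {
  if (!processor || !processor.shouldCompress(toolName, rawResult)) {
    pushToolResult(rawResult)
    return
  }
  try {
    pushToolResult(await processor.compress(toolName, rawResult))
  } catch {
    pushToolResult(rawResult)
  }
}
```

Falling back to the raw result keeps the tool call usable even if the secondary model or the gateway is unavailable.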

Compression triggers by tool

| Tool | Compresses when |
| --- | --- |
| read_file | result > 1,500 chars |
| search_files / codebase_search | > 20 non-empty lines |
| list_files | > 100 paths |
| execute_command | result > 1,500 chars |

All thresholds are user-configurable via toolResultProcessorSettings in extension settings.
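
A minimal sketch of the threshold check, using the defaults listed above. The config field names, and the assumption that list_files output has one path per non-empty line, are illustrative and not taken from ToolResultProcessorConfig.ts.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface ThresholdConfig {
  enabled: boolean
  isSubscriber: boolean
  maxChars: number // read_file, execute_command
  maxSearchLines: number // search_files, codebase_search
  maxPaths: number // list_files
}

const defaultThresholds: ThresholdConfig = {
  enabled: true,
  isSubscriber: true,
  maxChars: 1500,
  maxSearchLines: 20,
  maxPaths: 100,
}

const nonEmptyLineCount = (text: string): number =>
  text.split("\n").filter((line) => line.trim() !== "").length

function shouldCompressSketch(
  toolName: string,
  result: string,
  config: ThresholdConfig = defaultThresholds,
): boolean {
  if (!config.enabled || !config.isSubscriber) return false
  switch (toolName) {
    case "read_file":
    case "execute_command":
      return result.length > config.maxChars
    case "search_files":
    case "codebase_search":
      return nonEmptyLineCount(result) > config.maxSearchLines
    case "list_files":
      // Assumption: one path per non-empty line of output.
      return nonEmptyLineCount(result) > config.maxPaths
    default:
      return false
  }
}
```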

Optimizations that apply to everyone

  • Env details diffing: only changed sections sent on turns 2+, saves 150–600 tokens/turn
  • Old tool result truncation: read_file/list_files results older than N turns replaced with [content omitted]
  • Parallel tool calls: already enabled via parallelToolCalls: true in metadata
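
Old tool result truncation could be sketched along these lines. The `ToolResultEntry` shape and the function name are assumptions for illustration; the actual conversation history format in compressToolResults.ts is likely richer.

```typescript
// Hypothetical record of a stored tool result.
interface ToolResultEntry {
  toolName: string
  turn: number
  content: string
}

const OMITTED = "[content omitted]"
const TRUNCATABLE = new Set(["read_file", "list_files"])

// Returns a new history in which read_file/list_files results older than
// `maxAgeTurns` turns are replaced by the placeholder. Input is not mutated.
function truncateOldToolResults(
  history: ToolResultEntry[],
  currentTurn: number,
  maxAgeTurns: number,
): ToolResultEntry[] {
  return history.map((entry) =>
    TRUNCATABLE.has(entry.toolName) && currentTurn - entry.turn > maxAgeTurns
      ? { ...entry, content: OMITTED }
      : entry,
  )
}
```

Returning a new array rather than editing entries in place keeps the original history available, for example for display in the UI.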

Testing this manually

  1. Subscribe to Zoo Code Pro at zoocode.dev
  2. Go to Dashboard → API Tokens → generate a key (zoo_sk_...)
  3. Open VS Code → open the Zoo Code sidebar → click the settings icon → go to the About tab
  4. Under Zoo Code Subscription, paste the key into the API Key field
  5. Start a task, run read_file on a large file (> 1,500 chars)
  6. Verify the compressed summary — not the full raw output — appears in conversation history

Notes

  • zoo-gateway.ts has no unit test yet — toolName wiring via metadata is the one gap in test coverage
  • Evals A/B comparison (token cost with/without compression) is the next validation step before production rollout

edelauna and others added 30 commits April 23, 2026 22:05

4 participants