Replies: 2 comments
Hi there, @doobidoo! Thanks a ton for the deep dive and for sharing code, measurements, and trade-offs. This is exactly the kind of real-world feedback that helps. 🙏 A couple of clarifications that might explain the discrepancy you're seeing:

Why the library didn't emit tabular rows

TOON's tabular format applies only when every field in each object is a primitive (string/number/bool/null). In your memories, tags is an array. That disqualifies the array from tabular encoding, so the encoder correctly falls back to list-of-objects form. The "manual tabular" you built places a JSON array inside a cell; that isn't spec-compliant, because tabular cells must be primitives to preserve lossless round-trips.

Spec vs implementation

On this point the library matches the spec. If you did see arrays of primitives rendered as bullet lists instead of inline (tags[4]: …), that would be a bug – please share a repro. The intended encoding for primitive arrays is inline.

Token efficiency claims

The 30–60% savings are measured against pretty-printed JSON on uniform, primitive tabular datasets. For mixed/nested structures or already hand-optimized text (like your markdown), the savings shrink – sometimes to near zero – exactly as you observed. Tokenization also varies by model; using the model's tokenizer will give more reliable numbers than a 4-chars-per-token heuristic.

Fit for your use case

"You're absolutely right" – as Claude would say. 😄 TOON shines with large, uniform, primitive tables. For content-heavy, human-facing summaries, your markdown is hard to beat. If you want to push TOON further here, two options:

In either approach, consider the tab delimiter for the main table to reduce quoting of commas in content. If you can share a minimal JSON sample, I'm happy to run it through the reference encoder with different delimiter options and the model-specific tokenizer to see what the ceiling looks like for your data. And if you have a repro where arrays of primitives aren't encoded inline (tags[4]: …), I'll investigate that right away. Really appreciate the thoughtful write-up – thanks again!
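As a rough illustration of that delimiter comparison, here is a minimal sketch. It assumes the package exposes an `encode(data, options)` function with a `delimiter` option (check the library's documentation for the actual API), and it only counts characters; a model-specific tokenizer would give more reliable numbers.

```js
// Minimal sketch: compare TOON output sizes across delimiters.
import { encode } from '@toon-format/toon';

const memories = [
  { date: '2025-11-20', type: 'note', content: 'Refactored session hooks, added tests' },
  { date: '2025-11-21', type: 'decision', content: 'Adopted compact date formatting' },
];

for (const delimiter of [',', '\t']) {
  // Assumption: encode(data, options) accepts a `delimiter` option;
  // verify against the library's documentation before relying on it.
  const toon = encode({ memories }, { delimiter });
  // Character counts only; a model-specific tokenizer gives more reliable numbers.
  console.log(JSON.stringify(delimiter), '->', toon.length, 'chars');
}
```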
Thank you for the detailed clarification! This explains perfectly why the library fell back to list-of-objects format. You're absolutely right about the primitive requirement – our tags are arrays, so tabular encoding isn't possible. For the manual implementation, I did place JSON arrays in the cells (acknowledging it's not spec-compliant), but even that only achieved 1.1% token savings vs our optimized markdown (431 vs 436 tokens for 8 memories).

Manual Tabular Results

I implemented a spec-inspired tabular formatter. Token comparison: markdown 436 tokens, manual TOON 431 tokens. The manual version is 28.4% more efficient than the library, confirming the tabular approach has merit, but against our already-optimized markdown with adaptive truncation and compact formatting, it's marginal at best.

POC Documentation

I've documented both experiments extensively:

- Library TOON PoC Results – Why it failed (38% worse)
TOON Format Library vs Specification: Token Efficiency Analysis
Hi TOON team! 👋
I recently conducted a comprehensive evaluation of TOON format for optimizing memory context injection in Claude Code session hooks. I wanted to share my findings as they revealed some interesting discrepancies between the library implementation and the specification, and might provide valuable feedback for the project.
Use Case Context
Project: MCP Memory Service - Claude Code integration
Goal: Optimize token usage when injecting 8 recent memories into Claude Code sessions
Baseline: Already-optimized markdown format (436 tokens)
Our session hooks inject project-relevant memories at the start of each Claude Code session. With context windows being precious, we're always looking for ways to reduce token consumption without sacrificing readability.
Test Results Summary
I tested three formats with identical data (8 memories with realistic content):
- Optimized markdown (baseline)
- Library TOON (@toon-format/toon v0.7.3)
- Manual tabular TOON (spec-inspired)

Key Findings
1. Library Output vs Specification
The @toon-format/toon library produces YAML-style output rather than the tabular format described in the specification.

Library Output (YAML-style):
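For illustration, a minimal sketch of that list-of-objects form with made-up memory objects (field names and values are invented, and the exact syntax should be checked against the spec, which encodes primitive arrays inline as tags[N]: …):

```
memories[2]:
  - date: 2025-11-20
    type: note
    content: Refactored session hooks
    tags[2]: hooks,refactor
  - date: 2025-11-21
    type: decision
    content: Adopted compact date formatting
    tags[1]: hooks
```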
Expected Output (per spec):
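And a sketch of the tabular form the spec describes, which requires every field in every row to be a primitive (an array field like tags has to be dropped or serialized into a single string, which is exactly what makes the manual version non-compliant):

```
memories[2]{date,type,content}:
  2025-11-20,note,Refactored session hooks
  2025-11-21,decision,Adopted compact date formatting
```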
2. Manual Implementation Results
To verify whether the spec's claims about token efficiency were accurate, I manually implemented the tabular format described in the specification:
Implementation: ~180 lines with proper CSV escaping, compact date formatting, and field name deduplication (a condensed sketch follows at the end of this subsection)
Results: 431 tokens for 8 memories, vs. 436 tokens for our optimized markdown baseline (a 1.1% saving)
This confirms the TOON spec's approach has merit, but the library implementation doesn't deliver the promised token savings.
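For reference, here is a condensed sketch of this kind of formatter. It is not the actual ~180-line PoC code; the field names, escaping details, and example values are assumptions.

```js
// Condensed sketch of a spec-inspired tabular formatter (illustrative only).
function toTabularToon(memories) {
  const fields = ['date', 'type', 'content', 'tags'];

  // CSV-style escaping: quote cells containing commas, quotes, or newlines.
  const escapeCell = (value) => {
    const s = String(value);
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };

  const header = `memories[${memories.length}]{${fields.join(',')}}:`;
  const rows = memories.map((m) =>
    '  ' +
      fields
        .map((f) => {
          const v = m[f];
          // Serializing the tags array into a single cell mirrors the PoC's
          // approach, but as clarified above it is not spec-compliant:
          // tabular cells must hold primitives.
          return escapeCell(Array.isArray(v) ? JSON.stringify(v) : v ?? '');
        })
        .join(',')
  );

  return [header, ...rows].join('\n');
}

console.log(
  toTabularToon([
    { date: '2025-11-20', type: 'note', content: 'Refactored session hooks', tags: ['hooks', 'refactor'] },
  ])
);
```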
3. Real-world Comparison vs Optimized Markdown
Our markdown format is already highly optimized (v8.4.0+), using adaptive truncation and compact formatting.
Finding: Manual TOON saves only 5 tokens (1.1%) vs our optimized markdown.
This suggests that for well-optimized baseline formats, TOON's tabular approach provides marginal gains - though every token counts!
Questions for the Maintainers
I'm genuinely curious about the design decisions here:
- Why YAML-style instead of tabular?
- Specification vs implementation
- Token efficiency claims
- Use case fit
What I Learned
Despite the token efficiency not meeting our needs, this was a valuable exercise:
Documentation
I've documented the full evaluation process with:
PoC Results: test-toon-formatter.js and compare-formats.js with measurements
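A simplified sketch of what such a comparison script can look like (not the actual compare-formats.js; the 4-chars-per-token estimate is only a rough heuristic, as noted in the reply above):

```js
// Simplified format-comparison sketch (illustrative, not the PoC script).
const estimateTokens = (text) => Math.ceil(text.length / 4);

function compareFormats(formats) {
  const rows = Object.entries(formats).map(([name, text]) => ({
    format: name,
    chars: text.length,
    estTokens: estimateTokens(text),
  }));
  console.table(rows);
}

// Placeholder renderings; the real scripts render the 8-memory test set.
compareFormats({
  markdown: '# Recent Memories\n- 2025-11-20 note: Refactored session hooks\n',
  toon: 'memories[1]{date,type,content}:\n  2025-11-20,note,Refactored session hooks\n',
});
```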
Constructive Feedback
I think TOON has interesting potential for specific use cases, especially large, uniform datasets where every field is a primitive.
For our use case (human+AI collaboration contexts), the 1.1% savings didn't justify the readability trade-off, but I can see TOON shining in other scenarios.
Thank You!
Thanks for creating an interesting approach to token optimization! I hope this real-world evaluation provides useful data points for the project's development. Happy to discuss further or provide additional testing data if helpful.
Test Environment:
@toon-format/toon v0.7.3