Replies: 2 comments
Hi there, @doobidoo! Thanks a ton for the deep dive and for sharing code, measurements, and trade-offs. This is exactly the kind of real-world feedback that helps. 🙏 A couple of clarifications that might explain the discrepancy you're seeing:

Why the library didn't emit tabular rows

TOON's tabular format applies only when every field in each object is a primitive (string/number/bool/null). In your memories, tags is an array. That disqualifies the array from tabular encoding, so the encoder correctly falls back to list-of-objects form. The "manual tabular" you built places a JSON array inside a cell; that isn't spec-compliant, because tabular cells must be primitives to preserve lossless round-trips.

Spec vs implementation

On this point the library matches the spec. If you did see arrays of primitives rendered as bullet lists instead of inline (tags[4]: …), that would be a bug – please share a repro. The intended encoding for primitive arrays is inline.

Token efficiency claims

The 30–60% savings are measured against pretty-printed JSON on uniform, primitive tabular datasets. For mixed/nested structures or already hand-optimized text (like your markdown), the savings shrink – sometimes to near zero – exactly as you observed. Tokenization also varies by model; using the model's tokenizer will give more reliable numbers than a 4-chars-per-token heuristic.

Fit for your use case

"You're absolutely right" – as Claude would say. 😄 TOON shines with large, uniform, primitive tables. For content-heavy, human-facing summaries, your markdown is hard to beat. If you want to push TOON further here, two options:

In either approach, consider the tab delimiter for the main table to reduce quoting of commas in content. If you can share a minimal JSON sample, I'm happy to run it through the reference encoder with different delimiter options and the model-specific tokenizer to see what the ceiling looks like for your data. And if you have a repro where arrays of primitives aren't encoded inline (tags[4]: …), I'll investigate that right away. Really appreciate the thoughtful write-up – thanks again!
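As a rough illustration of that delimiter comparison, here is a minimal sketch. It assumes the package exposes an `encode(data, options)` function with a `delimiter` option (check the library's documentation for the actual API), and it only counts characters; a model-specific tokenizer would give more reliable numbers.

```js
// Minimal sketch: compare TOON output sizes across delimiters.
import { encode } from '@toon-format/toon';

const memories = [
  { date: '2025-11-20', type: 'note', content: 'Refactored session hooks, added tests' },
  { date: '2025-11-21', type: 'decision', content: 'Adopted compact date formatting' },
];

for (const delimiter of [',', '\t']) {
  // Assumption: encode(data, options) accepts a `delimiter` option;
  // verify against the library's documentation before relying on it.
  const toon = encode({ memories }, { delimiter });
  // Character counts only; a model-specific tokenizer gives more reliable numbers.
  console.log(JSON.stringify(delimiter), '->', toon.length, 'chars');
}
```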
Thank you for the detailed clarification! This explains perfectly why the library fell back to list-of-objects format. You're absolutely right about the primitive requirement – our tags are arrays, so tabular encoding isn't possible. For the manual implementation, I did place JSON arrays in the cells (acknowledging it's not spec-compliant), but even that only achieved 1.1% token savings vs our optimized markdown (431 vs 436 tokens for 8 memories).

Manual Tabular Results

I implemented a spec-inspired tabular formatter. Token comparison: markdown 436 tokens, manual TOON 431 tokens. The manual version is 28.4% more efficient than the library, confirming the tabular approach has merit, but against our already-optimized markdown with adaptive truncation and compact formatting, it's marginal at best.

POC Documentation

I've documented both experiments extensively:

- Library TOON PoC Results – Why it failed (38% worse)
TOON Format Library vs Specification: Token Efficiency Analysis
Hi TOON team! 👋
I recently conducted a comprehensive evaluation of TOON format for optimizing memory context injection in Claude Code session hooks. I wanted to share my findings as they revealed some interesting discrepancies between the library implementation and the specification, and might provide valuable feedback for the project.
Use Case Context
Project: MCP Memory Service - Claude Code integration
Goal: Optimize token usage when injecting 8 recent memories into Claude Code sessions
Baseline: Already-optimized markdown format (436 tokens)
Our session hooks inject project-relevant memories at the start of each Claude Code session. With context windows being precious, we're always looking for ways to reduce token consumption without sacrificing readability.
Test Results Summary
I tested three formats with identical data (8 memories with realistic content):
- Optimized markdown (baseline)
- Library TOON (@toon-format/toon v0.7.3)
- Manual tabular TOON (spec-inspired)

Key Findings
1. Library Output vs Specification
The @toon-format/toon library produces YAML-style output rather than the tabular format described in the specification.

Library Output (YAML-style):
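For illustration, a minimal sketch of that list-of-objects form with made-up memory objects (field names and values are invented, and the exact syntax should be checked against the spec, which encodes primitive arrays inline as tags[N]: …):

```
memories[2]:
  - date: 2025-11-20
    type: note
    content: Refactored session hooks
    tags[2]: hooks,refactor
  - date: 2025-11-21
    type: decision
    content: Adopted compact date formatting
    tags[1]: hooks
```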
Expected Output (per spec):
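And a sketch of the tabular form the spec describes, which requires every field in every row to be a primitive (an array field like tags has to be dropped or serialized into a single string, which is exactly what makes the manual version non-compliant):

```
memories[2]{date,type,content}:
  2025-11-20,note,Refactored session hooks
  2025-11-21,decision,Adopted compact date formatting
```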
2. Manual Implementation Results
To verify whether the spec's claims about token efficiency were accurate, I manually implemented the tabular format described in the specification:
Implementation: ~180 lines with proper CSV escaping, compact date formatting, and field name deduplication (a condensed sketch follows at the end of this subsection)
Results: 431 tokens for 8 memories, vs. 436 tokens for our optimized markdown baseline (a 1.1% saving)
This confirms the TOON spec's approach has merit, but the library implementation doesn't deliver the promised token savings.
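For reference, here is a condensed sketch of this kind of formatter. It is not the actual ~180-line PoC code; the field names, escaping details, and example values are assumptions.

```js
// Condensed sketch of a spec-inspired tabular formatter (illustrative only).
function toTabularToon(memories) {
  const fields = ['date', 'type', 'content', 'tags'];

  // CSV-style escaping: quote cells containing commas, quotes, or newlines.
  const escapeCell = (value) => {
    const s = String(value);
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };

  const header = `memories[${memories.length}]{${fields.join(',')}}:`;
  const rows = memories.map((m) =>
    '  ' +
      fields
        .map((f) => {
          const v = m[f];
          // Serializing the tags array into a single cell mirrors the PoC's
          // approach, but as clarified above it is not spec-compliant:
          // tabular cells must hold primitives.
          return escapeCell(Array.isArray(v) ? JSON.stringify(v) : v ?? '');
        })
        .join(',')
  );

  return [header, ...rows].join('\n');
}

console.log(
  toTabularToon([
    { date: '2025-11-20', type: 'note', content: 'Refactored session hooks', tags: ['hooks', 'refactor'] },
  ])
);
```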
3. Real-world Comparison vs Optimized Markdown
Our markdown format is already highly optimized (v8.4.0+), using adaptive truncation and compact formatting.
Finding: Manual TOON saves only 5 tokens (1.1%) vs our optimized markdown.
This suggests that for well-optimized baseline formats, TOON's tabular approach provides marginal gains - though every token counts!
Questions for the Maintainers
I'm genuinely curious about the design decisions here:
- Why YAML-style instead of tabular?
- Specification vs implementation
- Token efficiency claims
- Use case fit
What I Learned
Despite the token efficiency not meeting our needs, this was a valuable exercise:
Documentation
I've documented the full evaluation process with:
PoC Results: test-toon-formatter.js and compare-formats.js with measurements
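A simplified sketch of what such a comparison script can look like (not the actual compare-formats.js; the 4-chars-per-token estimate is only a rough heuristic, as noted in the reply above):

```js
// Simplified format-comparison sketch (illustrative, not the PoC script).
const estimateTokens = (text) => Math.ceil(text.length / 4);

function compareFormats(formats) {
  const rows = Object.entries(formats).map(([name, text]) => ({
    format: name,
    chars: text.length,
    estTokens: estimateTokens(text),
  }));
  console.table(rows);
}

// Placeholder renderings; the real scripts render the 8-memory test set.
compareFormats({
  markdown: '# Recent Memories\n- 2025-11-20 note: Refactored session hooks\n',
  toon: 'memories[1]{date,type,content}:\n  2025-11-20,note,Refactored session hooks\n',
});
```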
Constructive Feedback
I think TOON has interesting potential for specific use cases, especially large, uniform datasets where every field is a primitive.
For our use case (human+AI collaboration contexts), the 1.1% savings didn't justify the readability trade-off, but I can see TOON shining in other scenarios.
Thank You!
Thanks for creating an interesting approach to token optimization! I hope this real-world evaluation provides useful data points for the project's development. Happy to discuss further or provide additional testing data if helpful.
Test Environment:
@toon-format/toon v0.7.3