v1.0.3

ronibhakta1 released this 28 Nov 22:45

597cf69

[1.0.3] - 2025-11-28

🎯 100% LLM Retrieval Accuracy Achieved

Major Achievement: ZON now reaches 100% LLM retrieval accuracy (24/24 questions) while still using fewer tokens than TOON!

Changed

Explicit Sequential Columns: Disabled automatic sequential column omission ([id] notation)
- All columns now explicitly listed in table headers for better LLM comprehension
- Example: users:@(5):active,id,lastLogin,name,role (was users:@(5)[id]:active,lastLogin,name,role)
- Trade-off: +1.7% token increase for 100% LLM accuracy
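
To make the new header shape concrete, here is a minimal Python sketch that builds a header with every column listed. The helper name `table_header` is hypothetical, and the alphabetical column order is an assumption inferred from the example above; this is not ZON's actual API.

```python
def table_header(name, rows):
    """Build a header such as 'users:@(5):active,id,lastLogin,name,role'.

    Hypothetical helper: column order is assumed to be alphabetical,
    matching the example in these notes.
    """
    # Collect every key that appears in any row, then sort for a stable order.
    columns = sorted({key for row in rows for key in row})
    return f"{name}:@({len(rows)}):{','.join(columns)}"


users = [
    {"id": i, "name": f"user{i}", "role": "admin", "active": True, "lastLogin": "2025-11-28"}
    for i in range(1, 6)
]
print(table_header("users", users))
# prints: users:@(5):active,id,lastLogin,name,role
```

Note that `id` stays in the column list even though its values run 1..5, which is the behaviour this release switches to.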

Performance

LLM Accuracy: 100% (24/24 questions) vs TOON 100%, JSON 91.7%
Token Efficiency: 19,995 tokens (5.0% fewer than TOON's 20,988)
Overall Savings vs TOON: 4.6% (Claude) to 17.6% (GPT-4o)

Quality

✅ All unit tests pass (28/28)
✅ All roundtrip tests pass (27/27 datasets)
✅ No data loss or corruption
✅ Production ready

[1.0.3] - 2025-11-27

ACHIEVEMENT: 8/8 Perfect Sweep vs All Competitors!

Breaking Changes:

Compact header syntax: @count: instead of @data(count):
Sequential ID auto-omission: [id] notation for 1..N sequences
Adaptive format selection based on data complexity
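
The check behind the [id] notation can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `is_sequential` is hypothetical, and the real encoder may apply further conditions before omitting a column.

```python
def is_sequential(values):
    """Return True when values are exactly 1, 2, ..., N in that order."""
    # `type(v) is int` rejects booleans, which Python would otherwise
    # treat as 1 and 0.
    return len(values) > 0 and all(
        type(v) is int and v == i for i, v in enumerate(values, start=1)
    )


print(is_sequential([1, 2, 3, 4, 5]))  # True
print(is_sequential([1, 3, 2]))        # False
```

A column that passes this check carries no information beyond its row position, which is why it could be dropped and rebuilt on decode.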

Added

Sparse Table Encoding: Automatically detects semi-uniform data and uses key:value notation for optional fields
Irregularity Score Calculation: Jaccard similarity-based scoring to choose optimal table format
Sequential Column Detection: Identifies and omits columns with sequential values (1, 2, 3, ..., N)
Smart Date Detection: ISO 8601 dates output unquoted for token efficiency
Context-Aware String Quoting: Only quotes strings when necessary to preserve type semantics
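
The date detection and quoting rules above could be combined roughly as follows. This Python sketch is an assumption-heavy illustration: the name `encode_string`, the reserved literals, the set of delimiter characters, and the simplified ISO 8601 pattern are all hypothetical and not taken from ZON's source.

```python
import re

# Date or date-time shaped like ISO 8601, e.g. 2025-11-28 or 2025-11-28T22:45:00Z.
# Simplified on purpose: it checks the shape only, not calendar validity.
ISO_DATE = re.compile(
    r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})?)?"
)
NUMBER = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")
RESERVED = {"true", "false", "null"}  # assumed literal spellings
DELIMITERS = set(',:"\n')             # assumed structural characters


def encode_string(value):
    """Quote a string only when leaving it bare would change how it is read."""
    if ISO_DATE.fullmatch(value):
        return value  # dates stay unquoted to save tokens
    needs_quotes = (
        value == ""
        or value != value.strip()
        or value.lower() in RESERVED
        or NUMBER.fullmatch(value) is not None
        or any(ch in DELIMITERS for ch in value)
    )
    if not needs_quotes:
        return value
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


for sample in ["2025-11-28", "alice", "42", "true", "a,b"]:
    print(encode_string(sample))
```

The string "42" is quoted so a decoder does not read it back as a number, while "alice" and the date are emitted bare.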

Performance

Total Tokens: 1,945 (down from 2,081 in v1.0.2)
136 tokens saved (6.5% reduction)
8/8 wins vs CSV (previously 4/8 tied)
8/8 wins vs TOON (24.4% fewer tokens)
57.2% fewer tokens than formatted JSON
27.0% fewer tokens than compact JSON

Benchmark Results (8 datasets)

Employees: 132 tokens (CSV: 138) - ZON WINS -4.3%
Time-Series: 245 tokens (CSV: 247) - ZON WINS -0.8%
GitHub Repos: 148 tokens (CSV: 164) - ZON WINS -9.8%
Event Logs: 220 tokens (CSV: 231) - ZON WINS -4.8% ← Sparse tables!
E-commerce: 193 tokens (CSV: 313) - ZON WINS -38.3%
Hike Data: 62 tokens (CSV: 85) - ZON WINS -27.1%
Deep Config: 111 tokens (CSV: 182) - ZON WINS -39.0%
Heavily Nested: 764 tokens (CSV: 1,044) - ZON WINS -26.8%
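
The per-dataset percentages above are the change in token count relative to CSV. A short Python check, using only the figures listed in these notes (the function name `reduction_pct` is just for illustration):

```python
# (ZON tokens, CSV tokens) per dataset, copied from the list above.
results = {
    "Employees": (132, 138),
    "Time-Series": (245, 247),
    "GitHub Repos": (148, 164),
    "Event Logs": (220, 231),
    "E-commerce": (193, 313),
    "Hike Data": (62, 85),
    "Deep Config": (111, 182),
    "Heavily Nested": (764, 1044),
}


def reduction_pct(zon, csv):
    """Percentage change in tokens relative to CSV (negative means fewer)."""
    return round((zon - csv) / csv * 100, 1)


for name, (zon, csv) in results.items():
    print(f"{name}: {reduction_pct(zon, csv):+.1f}%")
```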

Competitive Analysis

vs CSV: -20.1% tokens overall
vs TOON: -24.4% tokens overall (beats on ALL datasets)
vs JSON: -57.2% formatted, -27.0% compact
Real Cost Savings: $4,890/month vs CSV at 1M API calls (GPT-4)

Fixed

Improved irregular schema detection to enable sparse tables for Event Logs
Enhanced sparse encoding threshold to support up to 5 optional columns
Better handling of undefined/null values in standard tables
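
One way to picture the irregular schema detection and the sparse threshold is the Python sketch below. Only the use of Jaccard similarity and the limit of 5 optional columns come from these notes; the pairwise-mean definition, the 0.5 cutoff, and all function names are assumptions made for illustration.

```python
from itertools import combinations

MAX_OPTIONAL = 5        # limit stated in the notes above
MAX_IRREGULARITY = 0.5  # hypothetical cutoff, not from the source


def jaccard(a, b):
    """Jaccard similarity of two key sets: size of intersection over union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def irregularity(rows):
    """1 minus the mean pairwise Jaccard similarity of the rows' key sets."""
    key_sets = [set(row) for row in rows]
    pairs = list(combinations(key_sets, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def optional_columns(rows):
    """Keys present in some rows but not in all of them."""
    if not rows:
        return set()
    key_sets = [set(row) for row in rows]
    return set.union(*key_sets) - set.intersection(*key_sets)


def use_sparse_table(rows):
    """Choose sparse encoding when rows differ only by a few optional keys."""
    optional = optional_columns(rows)
    return 0 < len(optional) <= MAX_OPTIONAL and irregularity(rows) <= MAX_IRREGULARITY


events = [
    {"ts": 1, "level": "info", "msg": "start"},
    {"ts": 2, "level": "error", "msg": "fail", "code": 500},
    {"ts": 3, "level": "info", "msg": "done"},
]
print(optional_columns(events))   # {'code'}
print(use_sparse_table(events))   # True
```

Here `code` appears in only one row, so it would be written as a key:value pair on that row instead of as an empty cell in every other row.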

Documentation

Added comprehensive competitive analysis vs TOON, CSV, JSON, YAML, XML
Documented sparse table encoding mechanism
Added real-world cost savings calculations
Updated benchmarks with CSV comparison
