-
Notifications
You must be signed in to change notification settings - Fork 796
Description
Hey Team!
I have been following the development of TOON and recently published a preliminary article exploring the efficiency gains of this format compared to JSON. While my initial article touches on the subject, I believe TOON would benefit significantly from a rigorous, mathematically formal proof of its superiority in terms of information density and structural overhead.
The Proposal
I would like to propose (and contribute to) a formal mathematical framework that proves TOON is strictly more efficient than JSON for specific classes of data structures (particularly uniform arrays and nested objects).
Unlike current benchmarks that focus on token counts, which are highly dependent on the specific tokenizer, I propose strictly formalizing this comparison based on Character Count. Tokenization is variable; Character length is an absolute metric of data transport and storage efficiency. If we prove
Before I proceed with a full formal paper or a documentation contribution:
- Is this type of formalization something you would be interested in including in the official TOON repository (README or /docs)?
Looking forward to your thoughts !