[Proposal] Mathematical Formalization of TOON's Efficiency vs JSON (Character-based Approach) #187

@mateolafalce

Hey Team!

I have been following the development of TOON and recently published a preliminary article exploring the efficiency gains of this format compared to JSON. While my initial article touches on the subject, I believe TOON would benefit significantly from a rigorous, mathematically formal proof of its superiority in terms of information density and structural overhead.

The Proposal

I would like to propose (and contribute to) a formal mathematical framework that proves TOON is strictly more efficient than JSON for specific classes of data structures (particularly uniform arrays and nested objects).

Unlike current benchmarks that focus on token counts, which depend heavily on the specific tokenizer, I propose formalizing this comparison strictly in terms of character count. Tokenization varies from model to model, whereas character length is an absolute metric of data transport and storage efficiency. If we prove $L_{\text{char}}(\text{TOON}) < L_{\text{char}}(\text{JSON})$, the token savings should follow as a corollary in practice, while the proof itself remains tokenizer-agnostic.
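As a rough illustration of the character-count metric, here is a minimal Python sketch for one uniform array. The sample data, the helper names `to_json_compact` and `to_toon_tabular`, and the hand-rolled tabular encoder are all assumptions made for this example: the encoder only covers flat rows with simple values that need no quoting, and it is not the official TOON implementation.

```python
import json

# Hypothetical sample data: a uniform array (every row has the same keys).
rows = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Carol"},
]


def to_json_compact(key, rows):
    """Minified JSON, so whitespace does not inflate the JSON side."""
    return json.dumps({key: rows}, separators=(",", ":"))


def to_toon_tabular(key, rows):
    """Hand-rolled tabular form: key[N]{fields}: followed by one line per row.

    Assumption: values are simple scalars that need no quoting or escaping.
    """
    fields = list(rows[0].keys())
    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + lines)


json_text = to_json_compact("users", rows)
toon_text = to_toon_tabular("users", rows)

# L_char is simply the number of characters in each serialization.
print("L_char(JSON) =", len(json_text))
print("L_char(TOON) =", len(toon_text))
```

The point of the sketch is that JSON repeats every key in every row, while the tabular form states the keys once in the header, so the gap should grow with the number of rows. A formal treatment would express both lengths as functions of row count, key lengths, and value lengths.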

Before I proceed with a full formal paper or a documentation contribution:

  • Is this type of formalization something you would be interested in including in the official TOON repository (README or /docs)?

Looking forward to your thoughts!
