Commit b8f4fa6

authored
Update README.md
1 parent 685c1ed commit b8f4fa6

1 file changed

+135
-142
lines changed

zon-format/README.md

Lines changed: 135 additions & 142 deletions
@@ -4,7 +4,7 @@
44
[![PyPI downloads](https://img.shields.io/pypi/dm/zon-format?color=red)](https://pypi.org/project/zon-format/)
55
[![PyPI version](https://img.shields.io/pypi/v/zon-format.svg)](https://pypi.org/project/zon-format/)
66
[![Python](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
7-
[![Tests](https://img.shields.io/badge/tests-94%2F94%20passing-brightgreen.svg)](#quality--testing)
7+
[![Tests](https://img.shields.io/badge/tests-121%2F121%20passing-brightgreen.svg)](#quality--testing)
88
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
99

1010
# ZON → JSON is dead. TOON was cute. ZON just won. (Now in Python)
@@ -45,154 +45,13 @@ pip install zon-format
4545

4646
## Why ZON?
4747

48-
### Yes, we actually ran the numbers (Dec 2025, fresh data)
49-
| Model | Dataset | ZON tokens | TOON | JSON | ZON vs TOON | ZON vs JSON |
50-
|---------------------|--------------------------|------------|--------|--------|-------------|-------------|
51-
| GPT-5-nano | Unified | **19,995** | 20,988 | 28,041 | **-5.0%** | **-28.6%** |
52-
| GPT-4o (o200k) | 50-level nested | **147,267**|225,510|285,131| **-34.7%** | **-48.3%** |
53-
| Claude 3.5 Sonnet | Mixed agent data | **149,281**|197,463|274,149| **-24.4%** | **-45.5%** |
54-
| Llama 3.1 405B | Everything | **234,623**|315,608|407,488| **-25.7%** | **-42.4%** |
55-
5648
AI is becoming cheaper and more accessible, but larger context windows allow for larger data inputs as well. **LLM tokens still cost money** – and standard JSON is verbose and token-expensive:
5749

5850
> "Dropped ZON into my LangChain agent loop and my monthly bill dropped $400 overnight"
5951
> — every Python dev who tried it this week
6052
6153
**ZON is the only format that wins (or ties for first) on every single LLM.**
6254

63-
```json
64-
{
65-
"context": {
66-
"task": "Our favorite hikes together",
67-
"location": "Boulder",
68-
"season": "spring_2025"
69-
},
70-
"friends": ["ana", "luis", "sam"],
71-
"hikes": [
72-
{
73-
"id": 1,
74-
"name": "Blue Lake Trail",
75-
"distanceKm": 7.5,
76-
"elevationGain": 320,
77-
"companion": "ana",
78-
"wasSunny": true
79-
},
80-
{
81-
"id": 2,
82-
"name": "Ridge Overlook",
83-
"distanceKm": 9.2,
84-
"elevationGain": 540,
85-
"companion": "luis",
86-
"wasSunny": false
87-
},
88-
{
89-
"id": 3,
90-
"name": "Wildflower Loop",
91-
"distanceKm": 5.1,
92-
"elevationGain": 180,
93-
"companion": "sam",
94-
"wasSunny": true
95-
}
96-
]
97-
}
98-
```
99-
100-
<details>
101-
<summary>TOON already conveys the same information with <strong>fewer tokens</strong>.</summary>
102-
103-
```yaml
104-
context:
105-
task: Our favorite hikes together
106-
location: Boulder
107-
season: spring_2025
108-
friends[3]: ana,luis,sam
109-
hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
110-
1,Blue Lake Trail,7.5,320,ana,true
111-
2,Ridge Overlook,9.2,540,luis,false
112-
3,Wildflower Loop,5.1,180,sam,true
113-
```
114-
115-
</details>
116-
117-
ZON conveys the same information with **even fewer tokens** than TOON – using compact table format with explicit headers:
118-
119-
```
120-
context.task:Our favorite hikes together
121-
context.location:Boulder
122-
context.season:spring_2025
123-
friends:ana,luis,sam
124-
hikes:@(3):companion,distanceKm,elevationGain,id,name,wasSunny
125-
ana,7.5,320,1,Blue Lake Trail,T
126-
luis,9.2,540,2,Ridge Overlook,F
127-
sam,5.1,180,3,Wildflower Loop,T
128-
```
129-
130-
### 🛡️ Validation + 📉 Compression
131-
132-
Building reliable LLM apps requires two things:
133-
1. **Safety:** You need to validate outputs (like you do with Zod/Pydantic).
134-
2. **Efficiency:** You need to compress inputs to save money.
135-
136-
ZON is the only library that gives you **both in one package**.
137-
138-
| Feature | Traditional Validation (e.g. Pydantic) | ZON |
139-
| :--- | :--- | :--- |
140-
| **Type Safety** | ✅ Yes | ✅ Yes |
141-
| **Runtime Validation** | ✅ Yes | ✅ Yes |
142-
| **Input Compression** | ❌ No |**Yes (Saves ~50%)** |
143-
| **Prompt Generation** | ❌ Plugins needed |**Built-in** |
144-
| **Bundle Size** | ~Large |**~5kb** |
145-
146-
**The Sweet Spot:** Use ZON to **save money on Input Tokens** while keeping the strict safety you expect.
147-
148-
---
149-
150-
## Key Features
151-
152-
- 🎯 **100% LLM Accuracy**: Achieves perfect retrieval (24/24 questions) with self-explanatory structure – no hints needed
153-
154-
### 3. Smart Flattening (Dot Notation)
155-
ZON automatically flattens top-level nested objects to reduce indentation.
156-
**JSON:**
157-
```json
158-
{
159-
"config": {
160-
"database": {
161-
"host": "localhost"
162-
}
163-
}
164-
}
165-
```
166-
**ZON:**
167-
```
168-
config.database{host:localhost}
169-
```
170-
171-
### 4. Colon-less Structure
172-
For nested objects and arrays, ZON omits the redundant colon, creating a cleaner, block-like structure.
173-
**JSON:**
174-
```json
175-
{
176-
"user": {
177-
"name": "Alice",
178-
"roles": ["admin", "dev"]
179-
}
180-
}
181-
```
182-
**ZON:**
183-
```
184-
user{name:Alice,roles[admin,dev]}
185-
```
186-
(Note: `user{...}` instead of `user:{...}`)
187-
- 💾 **Most Token-Efficient**: 4-15% fewer tokens than TOON across all tokenizers
188-
- 🎯 **JSON Data Model**: Encodes the same objects, arrays, and primitives as JSON with deterministic, lossless round-trips
189-
- 📐 **Minimal Syntax**: Explicit headers (`@(N)` for count, column list) eliminate ambiguity for LLMs
190-
- 🧺 **Tabular Arrays**: Uniform arrays collapse into tables that declare fields once and stream row values
191-
- 🔢 **Canonical Numbers**: No scientific notation (1000000, not 1e6), NaN/Infinity → null
192-
- 🌳 **Deep Nesting**: Handles complex nested structures efficiently (91% compression on 50-level deep objects)
193-
- 🔒 **Security Limits**: Automatic DOS prevention (100MB docs, 1M arrays, 100K keys)
194-
-**Production Ready**: 94/94 tests pass, 27/27 datasets verified, zero data loss
195-
19655
---
19756

19857
## Benchmarks
@@ -394,6 +253,140 @@ Llama 3 (Meta):
394253

395254
---
396255

256+
```json
257+
{
258+
"context": {
259+
"task": "Our favorite hikes together",
260+
"location": "Boulder",
261+
"season": "spring_2025"
262+
},
263+
"friends": ["ana", "luis", "sam"],
264+
"hikes": [
265+
{
266+
"id": 1,
267+
"name": "Blue Lake Trail",
268+
"distanceKm": 7.5,
269+
"elevationGain": 320,
270+
"companion": "ana",
271+
"wasSunny": true
272+
},
273+
{
274+
"id": 2,
275+
"name": "Ridge Overlook",
276+
"distanceKm": 9.2,
277+
"elevationGain": 540,
278+
"companion": "luis",
279+
"wasSunny": false
280+
},
281+
{
282+
"id": 3,
283+
"name": "Wildflower Loop",
284+
"distanceKm": 5.1,
285+
"elevationGain": 180,
286+
"companion": "sam",
287+
"wasSunny": true
288+
}
289+
]
290+
}
291+
```
292+
293+
<details>
294+
<summary>TOON already conveys the same information with <strong>fewer tokens</strong>.</summary>
295+
296+
```yaml
297+
context:
298+
task: Our favorite hikes together
299+
location: Boulder
300+
season: spring_2025
301+
friends[3]: ana,luis,sam
302+
hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
303+
1,Blue Lake Trail,7.5,320,ana,true
304+
2,Ridge Overlook,9.2,540,luis,false
305+
3,Wildflower Loop,5.1,180,sam,true
306+
```
307+
308+
</details>
309+
310+
ZON conveys the same information with **even fewer tokens** than TOON, using a compact table format with explicit headers:
311+
312+
```
313+
context.task:Our favorite hikes together
314+
context.location:Boulder
315+
context.season:spring_2025
316+
friends:ana,luis,sam
317+
hikes:@(3):companion,distanceKm,elevationGain,id,name,wasSunny
318+
ana,7.5,320,1,Blue Lake Trail,T
319+
luis,9.2,540,2,Ridge Overlook,F
320+
sam,5.1,180,3,Wildflower Loop,T
321+
```
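
As a rough illustration of the table layout above, the following sketch renders a uniform list of records into the `key:@(N):columns` header followed by one comma-separated row per record. The helper names `encode_cell` and `encode_table` are hypothetical and are not part of the `zon-format` API; the alphabetical column order and the `T`/`F` booleans are assumptions read off the example, and values containing commas are not handled.

```python
from typing import Any, Dict, List


def encode_cell(value: Any) -> str:
    """Render one primitive as a table cell (booleans become T/F, as in the example)."""
    if isinstance(value, bool):
        return "T" if value else "F"
    return str(value)


def encode_table(key: str, rows: List[Dict[str, Any]]) -> str:
    """Hypothetical helper: emit `key:@(N):col1,col2,...` plus one line per row."""
    if not rows:
        raise ValueError("a table needs at least one row")
    columns = sorted(rows[0].keys())  # assumption: columns are listed alphabetically
    if any(sorted(row.keys()) != columns for row in rows):
        raise ValueError("rows are not uniform; a table layout does not apply")
    header = f"{key}:@({len(rows)}):{','.join(columns)}"
    body = [",".join(encode_cell(row[c]) for c in columns) for row in rows]
    return "\n".join([header, *body])


hikes = [
    {"id": 1, "name": "Blue Lake Trail", "distanceKm": 7.5,
     "elevationGain": 320, "companion": "ana", "wasSunny": True},
    {"id": 2, "name": "Ridge Overlook", "distanceKm": 9.2,
     "elevationGain": 540, "companion": "luis", "wasSunny": False},
]
print(encode_table("hikes", hikes))
```

Declaring the field names once in the header is what saves tokens relative to JSON, where every key is repeated in every object.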
322+
323+
### 🛡️ Validation + 📉 Compression
324+
325+
Building reliable LLM apps requires two things:
326+
1. **Safety:** You need to validate outputs (like you do with Zod/Pydantic).
327+
2. **Efficiency:** You need to compress inputs to save money.
328+
329+
ZON is the only library that gives you **both in one package**.
330+
331+
| Feature | Traditional Validation (e.g. Pydantic) | ZON |
332+
| :--- | :--- | :--- |
333+
| **Type Safety** | ✅ Yes | ✅ Yes |
334+
| **Runtime Validation** | ✅ Yes | ✅ Yes |
335+
| **Input Compression** | ❌ No | **Yes (Saves ~50%)** |
336+
| **Prompt Generation** | ❌ Plugins needed | **Built-in** |
337+
| **Bundle Size** | ~Large | **~5 KB** |
338+
339+
**The Sweet Spot:** Use ZON to **save money on Input Tokens** while keeping the strict safety you expect.
340+
341+
---
342+
343+
## Key Features
344+
345+
- 🎯 **100% LLM Accuracy**: Achieves perfect retrieval (24/24 questions) with self-explanatory structure – no hints needed
346+
347+
### 3. Smart Flattening (Dot Notation)
348+
ZON automatically flattens top-level nested objects to reduce indentation.
349+
**JSON:**
350+
```json
351+
{
352+
"config": {
353+
"database": {
354+
"host": "localhost"
355+
}
356+
}
357+
}
358+
```
359+
**ZON:**
360+
```
361+
config.database{host:localhost}
362+
```
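
The idea behind dot notation can be sketched in a few lines of Python. This is a minimal illustration and not the library's implementation: the function name `flatten` is hypothetical, and it collapses every level of nesting into a dotted path (as in the `context.task:...` lines of the hikes example), whereas the ZON output shown above keeps the innermost object in braces.

```python
from typing import Any, Dict


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Collapse nested dicts into dotted key paths; non-dict values stay as leaves."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty objects, extending the dotted path.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


config = {"config": {"database": {"host": "localhost"}}}
print(flatten(config))
```

Note that this naive version is ambiguous for keys that already contain a dot, so a real encoder would need an escaping rule.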
363+
364+
### 4. Colon-less Structure
365+
For nested objects and arrays, ZON omits the redundant colon, creating a cleaner, block-like structure.
366+
**JSON:**
367+
```json
368+
{
369+
"user": {
370+
"name": "Alice",
371+
"roles": ["admin", "dev"]
372+
}
373+
}
374+
```
375+
**ZON:**
376+
```
377+
user{name:Alice,roles[admin,dev]}
378+
```
379+
(Note: `user{...}` instead of `user:{...}`)
380+
- 💾 **Most Token-Efficient**: 4-15% fewer tokens than TOON across all tokenizers
381+
- 🎯 **JSON Data Model**: Encodes the same objects, arrays, and primitives as JSON with deterministic, lossless round-trips
382+
- 📐 **Minimal Syntax**: Explicit headers (`@(N)` for count, column list) eliminate ambiguity for LLMs
383+
- 🧺 **Tabular Arrays**: Uniform arrays collapse into tables that declare fields once and stream row values
384+
- 🔢 **Canonical Numbers**: No scientific notation (1000000, not 1e6), NaN/Infinity → null
385+
- 🌳 **Deep Nesting**: Handles complex nested structures efficiently (91% compression on 50-level deep objects)
386+
- 🔒 **Security Limits**: Automatic DoS prevention (100MB docs, 1M arrays, 100K keys)
387+
- **Production Ready**: 94/94 tests pass, 27/27 datasets verified, zero data loss
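
The "Canonical Numbers" rule in the list above (no scientific notation, NaN/Infinity mapped to null) could look roughly like the sketch below. The function name `canonical_number` is hypothetical and this is not taken from the `zon-format` source; it only illustrates one way to get plain decimal output from Python floats.

```python
import math
from decimal import Decimal
from typing import Union


def canonical_number(value: Union[int, float]) -> str:
    """Format a number without an exponent; non-finite floats become 'null'."""
    if isinstance(value, bool):
        raise TypeError("booleans are not treated as numbers in this sketch")
    if isinstance(value, int):
        return str(value)
    if math.isnan(value) or math.isinf(value):
        return "null"
    if value.is_integer():
        # 1e6 -> "1000000" rather than "1000000.0" or "1e+06"
        return str(int(value))
    # repr() gives the shortest round-trip digits; Decimal expands any exponent.
    return format(Decimal(repr(value)), "f")


for sample in (1e6, 7.5, 1e-7, float("nan"), float("inf")):
    print(canonical_number(sample))
```

Going through `repr()` and `Decimal` avoids the fixed six decimal places that `format(value, "f")` would produce on a float.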
388+
389+
397390
## Security & Data Types
398391

399392
### Eval-Safe Design
