```diff
@@ -4,7 +4,7 @@
 [](https://pypi.org/project/zon-format/)
 [](https://pypi.org/project/zon-format/)
 [](https://www.python.org/downloads/)
-[](#quality--testing)
+[](#quality--testing)
 [](LICENSE)
 
 # ZON → JSON is dead. TOON was cute. ZON just won. (Now in Python)
@@ -45,154 +45,13 @@ pip install zon-format
 
 ## Why ZON?
 
-### Yes, we actually ran the numbers (Dec 2025, fresh data)
-| Model | Dataset | ZON tokens | TOON | JSON | ZON vs TOON | ZON vs JSON |
-|---------------------|--------------------------|------------|--------|--------|-------------|-------------|
-| GPT-5-nano | Unified | **19,995** | 20,988 | 28,041 | **-5.0%** | **-28.6%** |
-| GPT-4o (o200k) | 50-level nested | **147,267**|225,510|285,131| **-34.7%** | **-48.3%** |
-| Claude 3.5 Sonnet | Mixed agent data | **149,281**|197,463|274,149| **-24.4%** | **-45.5%** |
-| Llama 3.1 405B | Everything | **234,623**|315,608|407,488| **-25.7%** | **-42.4%** |
-
 AI is becoming cheaper and more accessible, but larger context windows allow for larger data inputs as well. **LLM tokens still cost money** – and standard JSON is verbose and token-expensive:
 
 > "Dropped ZON into my LangChain agent loop and my monthly bill dropped $400 overnight"
 > — every Python dev who tried it this week
 
 **ZON is the only format that wins (or ties for first) on every single LLM.**
 
-```json
-{
-  "context": {
-    "task": "Our favorite hikes together",
-    "location": "Boulder",
-    "season": "spring_2025"
-  },
-  "friends": ["ana", "luis", "sam"],
-  "hikes": [
-    {
-      "id": 1,
-      "name": "Blue Lake Trail",
-      "distanceKm": 7.5,
-      "elevationGain": 320,
-      "companion": "ana",
-      "wasSunny": true
-    },
-    {
-      "id": 2,
-      "name": "Ridge Overlook",
-      "distanceKm": 9.2,
-      "elevationGain": 540,
-      "companion": "luis",
-      "wasSunny": false
-    },
-    {
-      "id": 3,
-      "name": "Wildflower Loop",
-      "distanceKm": 5.1,
-      "elevationGain": 180,
-      "companion": "sam",
-      "wasSunny": true
-    }
-  ]
-}
-```
-
-<details>
-<summary>TOON already conveys the same information with <strong>fewer tokens</strong>.</summary>
-
-```yaml
-context:
-  task: Our favorite hikes together
-  location: Boulder
-  season: spring_2025
-friends[3]: ana,luis,sam
-hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
-  1,Blue Lake Trail,7.5,320,ana,true
-  2,Ridge Overlook,9.2,540,luis,false
-  3,Wildflower Loop,5.1,180,sam,true
-```
-
-</details>
-
-ZON conveys the same information with **even fewer tokens** than TOON – using compact table format with explicit headers:
-
-```
-context.task:Our favorite hikes together
-context.location:Boulder
-context.season:spring_2025
-friends:ana,luis,sam
-hikes:@(3):companion,distanceKm,elevationGain,id,name,wasSunny
-ana,7.5,320,1,Blue Lake Trail,T
-luis,9.2,540,2,Ridge Overlook,F
-sam,5.1,180,3,Wildflower Loop,T
-```
-
-### 🛡️ Validation + 📉 Compression
-
-Building reliable LLM apps requires two things:
-1. **Safety:** You need to validate outputs (like you do with Zod/Pydantic).
-2. **Efficiency:** You need to compress inputs to save money.
-
-ZON is the only library that gives you **both in one package**.
-
-| Feature | Traditional Validation (e.g. Pydantic) | ZON |
-| :--- | :--- | :--- |
-| **Type Safety** | ✅ Yes | ✅ Yes |
-| **Runtime Validation** | ✅ Yes | ✅ Yes |
-| **Input Compression** | ❌ No | ✅ **Yes (Saves ~50%)** |
-| **Prompt Generation** | ❌ Plugins needed | ✅ **Built-in** |
-| **Bundle Size** | ~Large | ⚡ **~5kb** |
-
-**The Sweet Spot:** Use ZON to **save money on Input Tokens** while keeping the strict safety you expect.
-
----
-
-## Key Features
-
-- 🎯 **100% LLM Accuracy**: Achieves perfect retrieval (24/24 questions) with self-explanatory structure – no hints needed
-
-### 3. Smart Flattening (Dot Notation)
-ZON automatically flattens top-level nested objects to reduce indentation.
-**JSON:**
-```json
-{
-  "config": {
-    "database": {
-      "host": "localhost"
-    }
-  }
-}
-```
-**ZON:**
-```
-config.database{host:localhost}
-```
-
-### 4. Colon-less Structure
-For nested objects and arrays, ZON omits the redundant colon, creating a cleaner, block-like structure.
-**JSON:**
-```json
-{
-  "user": {
-    "name": "Alice",
-    "roles": ["admin", "dev"]
-  }
-}
-```
-**ZON:**
-```
-user{name:Alice,roles[admin,dev]}
-```
-(Note: `user{...}` instead of `user:{...}`)
-- 💾 **Most Token-Efficient**: 4-15% fewer tokens than TOON across all tokenizers
-- 🎯 **JSON Data Model**: Encodes the same objects, arrays, and primitives as JSON with deterministic, lossless round-trips
-- 📐 **Minimal Syntax**: Explicit headers (`@(N)` for count, column list) eliminate ambiguity for LLMs
-- 🧺 **Tabular Arrays**: Uniform arrays collapse into tables that declare fields once and stream row values
-- 🔢 **Canonical Numbers**: No scientific notation (1000000, not 1e6), NaN/Infinity → null
-- 🌳 **Deep Nesting**: Handles complex nested structures efficiently (91% compression on 50-level deep objects)
-- 🔒 **Security Limits**: Automatic DOS prevention (100MB docs, 1M arrays, 100K keys)
-- ✅ **Production Ready**: 94/94 tests pass, 27/27 datasets verified, zero data loss
-
 ---
 
 ## Benchmarks
@@ -394,6 +253,140 @@ Llama 3 (Meta):
 
 ---
 
+```json
+{
+  "context": {
+    "task": "Our favorite hikes together",
+    "location": "Boulder",
+    "season": "spring_2025"
+  },
+  "friends": ["ana", "luis", "sam"],
+  "hikes": [
+    {
+      "id": 1,
+      "name": "Blue Lake Trail",
+      "distanceKm": 7.5,
+      "elevationGain": 320,
+      "companion": "ana",
+      "wasSunny": true
+    },
+    {
+      "id": 2,
+      "name": "Ridge Overlook",
+      "distanceKm": 9.2,
+      "elevationGain": 540,
+      "companion": "luis",
+      "wasSunny": false
+    },
+    {
+      "id": 3,
+      "name": "Wildflower Loop",
+      "distanceKm": 5.1,
+      "elevationGain": 180,
+      "companion": "sam",
+      "wasSunny": true
+    }
+  ]
+}
+```
+
+<details>
+<summary>TOON already conveys the same information with <strong>fewer tokens</strong>.</summary>
+
+```yaml
+context:
+  task: Our favorite hikes together
+  location: Boulder
+  season: spring_2025
+friends[3]: ana,luis,sam
+hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
+  1,Blue Lake Trail,7.5,320,ana,true
+  2,Ridge Overlook,9.2,540,luis,false
+  3,Wildflower Loop,5.1,180,sam,true
+```
+
+</details>
+
+ZON conveys the same information with **even fewer tokens** than TOON – using compact table format with explicit headers:
+
+```
+context.task:Our favorite hikes together
+context.location:Boulder
+context.season:spring_2025
+friends:ana,luis,sam
+hikes:@(3):companion,distanceKm,elevationGain,id,name,wasSunny
+ana,7.5,320,1,Blue Lake Trail,T
+luis,9.2,540,2,Ridge Overlook,F
+sam,5.1,180,3,Wildflower Loop,T
+```
+
+### 🛡️ Validation + 📉 Compression
+
+Building reliable LLM apps requires two things:
+1. **Safety:** You need to validate outputs (like you do with Zod/Pydantic).
+2. **Efficiency:** You need to compress inputs to save money.
+
+ZON is the only library that gives you **both in one package**.
+
+| Feature | Traditional Validation (e.g. Pydantic) | ZON |
+| :--- | :--- | :--- |
+| **Type Safety** | ✅ Yes | ✅ Yes |
+| **Runtime Validation** | ✅ Yes | ✅ Yes |
+| **Input Compression** | ❌ No | ✅ **Yes (Saves ~50%)** |
+| **Prompt Generation** | ❌ Plugins needed | ✅ **Built-in** |
+| **Bundle Size** | ~Large | ⚡ **~5kb** |
+
+**The Sweet Spot:** Use ZON to **save money on Input Tokens** while keeping the strict safety you expect.
+
+---
+
+## Key Features
+
+- 🎯 **100% LLM Accuracy**: Achieves perfect retrieval (24/24 questions) with self-explanatory structure – no hints needed
+
+### 3. Smart Flattening (Dot Notation)
+ZON automatically flattens top-level nested objects to reduce indentation.
+**JSON:**
+```json
+{
+  "config": {
+    "database": {
+      "host": "localhost"
+    }
+  }
+}
+```
+**ZON:**
+```
+config.database{host:localhost}
+```
+
+### 4. Colon-less Structure
+For nested objects and arrays, ZON omits the redundant colon, creating a cleaner, block-like structure.
+**JSON:**
+```json
+{
+  "user": {
+    "name": "Alice",
+    "roles": ["admin", "dev"]
+  }
+}
+```
+**ZON:**
+```
+user{name:Alice,roles[admin,dev]}
+```
+(Note: `user{...}` instead of `user:{...}`)
+- 💾 **Most Token-Efficient**: 4-15% fewer tokens than TOON across all tokenizers
+- 🎯 **JSON Data Model**: Encodes the same objects, arrays, and primitives as JSON with deterministic, lossless round-trips
+- 📐 **Minimal Syntax**: Explicit headers (`@(N)` for count, column list) eliminate ambiguity for LLMs
+- 🧺 **Tabular Arrays**: Uniform arrays collapse into tables that declare fields once and stream row values
+- 🔢 **Canonical Numbers**: No scientific notation (1000000, not 1e6), NaN/Infinity → null
+- 🌳 **Deep Nesting**: Handles complex nested structures efficiently (91% compression on 50-level deep objects)
+- 🔒 **Security Limits**: Automatic DOS prevention (100MB docs, 1M arrays, 100K keys)
+- ✅ **Production Ready**: 94/94 tests pass, 27/27 datasets verified, zero data loss
+
+
 ## Security & Data Types
 
 ### Eval-Safe Design
```