Add Automatic Delimiter Selection (auto) to TOON Encoder #134
+204
−32
This PR introduces a new auto delimiter mode to the TOON encoder.
When enabled, the encoder analyzes the contents of each array and selects the delimiter that produces the fewest collisions, reducing the need for quoting and improving readability.
What this feature does
Automatically chooses the safest delimiter for:
inline primitive arrays
arrays of arrays (list-item formatting)
tabular arrays (uniform objects)
Scores potential delimiters (tab, pipe, comma) based on how often they appear in the data.
Picks the delimiter with the lowest collision count, breaking ties in this priority order:
\t (tab)
| (pipe)
, (comma)
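The scoring and selection described above could be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the names `Delimiter`, `countCollisions`, and `selectDelimiter` are hypothetical, and it assumes that a "collision" means one occurrence of the delimiter character inside a string value and that the priority order is used only to break ties.

```typescript
// Hypothetical names; the helpers in the PR may be named and shaped differently.
type Delimiter = '\t' | '|' | ','

// Candidates in the priority order listed above: tab, pipe, comma.
const CANDIDATES: readonly Delimiter[] = ['\t', '|', ',']

// Assumption: a collision is one occurrence of the delimiter inside a string.
function countCollisions(values: readonly string[], delimiter: Delimiter): number {
  let count = 0
  for (const value of values) {
    // split() yields (occurrences + 1) pieces for a single-character separator.
    count += value.split(delimiter).length - 1
  }
  return count
}

// Pick the candidate with the fewest collisions. A later candidate replaces
// the current best only when its count is strictly lower, so earlier
// candidates win ties.
function selectDelimiter(values: readonly string[]): Delimiter {
  let best: Delimiter = '\t'
  let bestCount = Number.POSITIVE_INFINITY
  for (const candidate of CANDIDATES) {
    const count = countCollisions(values, candidate)
    if (count < bestCount) {
      best = candidate
      bestCount = count
    }
  }
  return best
}
```

Under these assumptions, data made up of comma-separated tags such as `'red,green'` would resolve to tab, since tab and pipe both score zero and tab comes first in the priority order.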
Key changes
CLI updated to accept --delimiter auto.
New option field: delimiterStrategy: 'fixed' | 'auto'.
Added helper functions to:
collect relevant strings
count delimiter collisions
perform automatic selection
Encoder logic updated to use the resolved delimiter per array type.
Unit tests added for the new behavior.
Documentation updated to mention the new flag and behavior.
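To illustrate how the new option and the string-collection helper might fit together, here is a hedged sketch. Only the `delimiterStrategy: 'fixed' | 'auto'` field comes from this description. The names `EncodeOptionsSketch`, `collectStrings`, and `resolveDelimiter` are hypothetical, the comma fallback for the fixed strategy is an assumption, and the sketch inspects only string values (whether the PR also inspects object keys is not stated here). The scoring function is passed in as a parameter so the sketch does not depend on any particular implementation of it.

```typescript
// Hypothetical types and names, for illustration only.
type Delimiter = '\t' | '|' | ','
type Primitive = string | number | boolean | null
type JsonValue = Primitive | JsonValue[] | { [key: string]: JsonValue }

interface EncodeOptionsSketch {
  delimiter?: Delimiter
  delimiterStrategy?: 'fixed' | 'auto' // field named in this PR
}

// Gather the string values that would share a delimited line, covering the
// three array shapes: inline primitives, arrays of arrays, uniform objects.
function collectStrings(array: readonly JsonValue[]): string[] {
  const out: string[] = []
  for (const item of array) {
    if (typeof item === 'string') {
      out.push(item)
    } else if (Array.isArray(item)) {
      for (const inner of item) {
        if (typeof inner === 'string') out.push(inner)
      }
    } else if (item !== null && typeof item === 'object') {
      for (const value of Object.values(item)) {
        if (typeof value === 'string') out.push(value)
      }
    }
  }
  return out
}

// Resolve the delimiter for one array. With the 'auto' strategy the supplied
// `pick` function decides; otherwise the fixed delimiter is used.
function resolveDelimiter(
  array: readonly JsonValue[],
  options: EncodeOptionsSketch,
  pick: (strings: string[]) => Delimiter,
): Delimiter {
  if (options.delimiterStrategy === 'auto') {
    return pick(collectStrings(array))
  }
  return options.delimiter ?? ',' // assumption: comma when nothing is set
}
```

Resolving the delimiter once per array, rather than once per document, matches the "per array type" wording above: arrays with different contents in the same document can end up with different delimiters.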
Why this matters
Avoids unnecessary quoting in TOON output.
Produces cleaner, smaller encoded files.
Handles real-world data gracefully (tags, names, CSV-like strings).
Fully backward-compatible: existing workflows using fixed delimiters are unaffected.