@federicoart

This PR introduces a new auto delimiter mode to the TOON encoder.
When enabled, the encoder analyzes the content of arrays and selects the delimiter that produces the fewest collisions, reducing the need for quoting and improving readability.

What this feature does

Automatically chooses the safest delimiter for:

inline primitive arrays

arrays of arrays (list-item formatting)

tabular arrays (uniform objects)

Scores potential delimiters (tab, pipe, comma) based on how often they appear in the data.

Picks the delimiter with the lowest collision count, breaking ties in this priority order:

\t (tab)

| (pipe)

, (comma)
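
As a rough illustration of the scoring and priority rule described above, here is a minimal sketch. The names `CANDIDATES`, `countCollisions`, and `pickDelimiter` are hypothetical and are not taken from the PR's diff. The sketch assumes a "collision" means a string containing the delimiter at least once, which matches the `countDelimiterCollisions` helper under review.

```typescript
// Hypothetical names throughout; the PR's actual helpers may differ.
type Delimiter = '\t' | '|' | ','

// Candidates in the priority order listed above: tab, pipe, comma.
const CANDIDATES: readonly Delimiter[] = ['\t', '|', ',']

// Count how many strings contain the delimiter at least once.
function countCollisions(strings: readonly string[], delimiter: Delimiter): number {
  return strings.filter(s => s.includes(delimiter)).length
}

// Pick the candidate with the fewest collisions.
// The strict `<` comparison means a tie goes to the earlier (higher-priority) candidate.
function pickDelimiter(strings: readonly string[]): Delimiter {
  let best: Delimiter = '\t'
  let bestCount = countCollisions(strings, best)
  for (const candidate of CANDIDATES.slice(1)) {
    const count = countCollisions(strings, candidate)
    if (count < bestCount) {
      best = candidate
      bestCount = count
    }
  }
  return best
}
```

Under these assumptions, `pickDelimiter(['a,b', 'c|d'])` yields the tab, since no string contains one, while `pickDelimiter(['a\tb', 'c,d'])` falls through to the pipe.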

Key changes

CLI updated to accept --delimiter auto.

New option field: delimiterStrategy: 'fixed' | 'auto'.

Added helper functions to:

collect relevant strings

count delimiter collisions

perform automatic selection

Encoder logic updated to use the resolved delimiter per array type.

Unit tests added for the new behavior.

Documentation updated to mention the new flag and behavior.
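
The "collect relevant strings" helper mentioned in the key changes is not shown in this conversation. A minimal sketch of what such a helper might look like for the three array shapes follows. The name `collectStrings` is hypothetical, and limiting the scan to string values (ignoring object keys and non-string primitives) is an assumption rather than the PR's confirmed behavior.

```typescript
// Hypothetical helper; the PR's real implementation is not shown here.
// Gathers the string values that would be joined by a delimiter when
// an array is encoded inline, as a list of arrays, or as a table.
function collectStrings(value: unknown): string[] {
  const out: string[] = []
  if (!Array.isArray(value)) return out

  for (const item of value) {
    if (typeof item === 'string') {
      // Inline primitive array: the element itself is a cell.
      out.push(item)
    } else if (Array.isArray(item)) {
      // Array of arrays: each inner string element is a cell.
      for (const inner of item) {
        if (typeof inner === 'string') out.push(inner)
      }
    } else if (item !== null && typeof item === 'object') {
      // Tabular array of objects: each string field value is a cell.
      // Assumption: keys are not scanned in this sketch.
      for (const field of Object.values(item)) {
        if (typeof field === 'string') out.push(field)
      }
    }
  }
  return out
}
```

The result can then be passed to a collision counter such as the `countDelimiterCollisions` function discussed in the review comment.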

Why this matters

Avoids unnecessary quoting in TOON output.

Produces cleaner, smaller encoded files.

Handles real-world data gracefully (tags, names, CSV-like strings).

Fully backward-compatible: existing workflows using fixed delimiters are unaffected.

Comment on lines +442 to +452
function countDelimiterCollisions(strings: readonly string[], delimiter: Delimiter): number {
  let collisions = 0

  for (const value of strings) {
    if (value.includes(delimiter)) {
      collisions++
    }
  }

  return collisions
}

You can simplify this part by using filter to count the items that contain the delimiter.

Suggested change
- function countDelimiterCollisions(strings: readonly string[], delimiter: Delimiter): number {
-   let collisions = 0
-   for (const value of strings) {
-     if (value.includes(delimiter)) {
-       collisions++
-     }
-   }
-   return collisions
- }
+ function countDelimiterCollisions(strings: readonly string[], delimiter: Delimiter): number {
+   return strings.filter(s => s.includes(delimiter)).length;
+ }

@johannschopplich
Collaborator

Hey there, love the idea so far! I haven't had a chance to dig in deeper. Just wanted to leave a comment for now.
