fix(normalise): sort map keys before joining for deterministic CanonicalText #49
fix(normalise): sort map keys before joining for deterministic CanonicalText #49RachanaB5 wants to merge 1 commit intoc2siorg:mainfrom
CanonicalText #49Conversation
…calText
payloadText iterated map[string]any in random Go map order, producing
non-deterministic CanonicalText for on_tool_call and on_context payloads.
The same payload could produce different field orderings across runs,
causing flaky Aho-Corasick matches when a jailbreak pattern spans a
field boundary.
docs/pipeline.md shows a defined concatenation order:
{"name": "shell", "args": "rm -rf /"} → "shell rm -rf /"
Fix: collect map keys, sort with sort.Strings, then iterate in sorted
order before joining values. CanonicalText is now stable and reproducible
for the same input on every run.
Adds:
- TestNormalise_MapPayloadDeterministic: 10 runs produce identical output
- TestNormalise_MapPayloadSortedKeyOrder: asserts exact sorted key order
|
Nice catch on normalising the map ordering - makes sense for consistency. Could this affect how patterns match in the scan stage? For tool call payloads specifically, a fixed field priority might be more predictable - something like: fieldOrder := []string{"name", "tool_name", "query", "args", "params"}Then fall back to alphabetical for any keys not in the priority list. That way tool name always leads and the canonical text reads more naturally. |
|
Good point on the semantic ordering, that’s a valid concern. From the current pipeline design, normalise is intended to be a field-agnostic transform, and Aho-Corasick matching operates on substrings, so ordering doesn’t affect detection as long as all tokens are present. The main goal of this change was to ensure determinism rather than enforce semantic structure. That said, a priority-based ordering for tool payloads could improve readability and might be useful if future rules depend on ordering. Maybe that’s something we could consider separately or discuss for scan/policy stages? |
|
Yeah, makes sense to keep this PR focused. We can explore priority-based ordering separately if it starts affecting scan/policy behavior. |
|
Sounds good, thanks! Agreed, keeping this focused on determinism makes sense. We can revisit priority-based ordering separately if needed. |
Problem
In
normalise.go,payloadTextprocesses structured payloads (map[string]any) by iterating over the map and concatenating string values. Go map iteration order is non-deterministic, meaning the same input can produce differentCanonicalTextacross runs.Example —
{"name": "shell", "args": "rm -rf /"}may produce:or:
depending on iteration order. If a jailbreak pattern spans a field boundary, whether Aho-Corasick matches it depends on which order the fields are joined, leading to inconsistent scan results for identical input.
Fix
Collect map keys, sort them using
sort.Strings, then iterate in sorted order before joining:{"name": "shell", "args": "rm -rf /"}now always produces:(alphabetical key order:
args < name)Tests Added
TestNormalise_MapPayloadDeterministicCanonicalTextTestNormalise_MapPayloadSortedKeyOrderTest Results
All tests pass. No regressions observed.
closes #48