Skip to content

fix(normalise): sort map keys before joining for deterministic CanonicalText #49

Open
RachanaB5 wants to merge 1 commit intoc2siorg:mainfrom
RachanaB5:fix/normalise-map-nondeterministic-order
Open

fix(normalise): sort map keys before joining for deterministic CanonicalText #49
RachanaB5 wants to merge 1 commit intoc2siorg:mainfrom
RachanaB5:fix/normalise-map-nondeterministic-order

Conversation

@RachanaB5
Copy link
Copy Markdown
Contributor

Problem

In normalise.go, payloadText processes structured payloads (map[string]any) by iterating over the map and concatenating string values. Go map iteration order is non-deterministic, meaning the same input can produce different CanonicalText across runs.

Example{"name": "shell", "args": "rm -rf /"} may produce:

"shell rm -rf /"

or:

"rm -rf / shell"

depending on iteration order. If a jailbreak pattern spans a field boundary, whether Aho-Corasick matches it depends on which order the fields are joined, leading to inconsistent scan results for identical input.


Fix

Collect map keys, sort them using sort.Strings, then iterate in sorted order before joining:

keys := make([]string, 0, len(v))
for k := range v {
    keys = append(keys, k)
}
sort.Strings(keys)
 
for _, k := range keys {
    if s, ok := v[k].(string); ok {
        parts = append(parts, s)
    }
}

{"name": "shell", "args": "rm -rf /"} now always produces:

"rm -rf / shell"

(alphabetical key order: args < name)


Tests Added

Test Description
TestNormalise_MapPayloadDeterministic Verifies that the same payload run multiple times produces identical CanonicalText
TestNormalise_MapPayloadSortedKeyOrder Asserts output follows sorted key order

Test Results

=== RUN   TestNormalise_MapPayloadDeterministic
--- PASS: TestNormalise_MapPayloadDeterministic (0.00s)
=== RUN   TestNormalise_MapPayloadSortedKeyOrder
--- PASS: TestNormalise_MapPayloadSortedKeyOrder (0.00s)
 
ok  github.com/acf-sdk/sidecar/internal/pipeline

All tests pass. No regressions observed.

closes #48

…calText

payloadText iterated map[string]any in random Go map order, producing
non-deterministic CanonicalText for on_tool_call and on_context payloads.
The same payload could produce different field orderings across runs,
causing flaky Aho-Corasick matches when a jailbreak pattern spans a
field boundary.

docs/pipeline.md shows a defined concatenation order:
  {"name": "shell", "args": "rm -rf /"} → "shell rm -rf /"

Fix: collect map keys, sort with sort.Strings, then iterate in sorted
order before joining values. CanonicalText is now stable and reproducible
for the same input on every run.

Adds:
- TestNormalise_MapPayloadDeterministic: 10 runs produce identical output
- TestNormalise_MapPayloadSortedKeyOrder: asserts exact sorted key order
@Pranjal0410
Copy link
Copy Markdown
Contributor

Nice catch on normalising the map ordering - makes sense for consistency.
One small thing I noticed was - with alphabetical sort, args comes before name, so {"name": "shell", "args": "rm -rf /"} becomes "rm -rf / shell". The command ends up before the tool name which feels a bit off semantically.

Could this affect how patterns match in the scan stage? For tool call payloads specifically, a fixed field priority might be more predictable - something like:

fieldOrder := []string{"name", "tool_name", "query", "args", "params"}

Then fall back to alphabetical for any keys not in the priority list. That way tool name always leads and the canonical text reads more naturally.

@RachanaB5
Copy link
Copy Markdown
Contributor Author

Good point on the semantic ordering, that’s a valid concern.

From the current pipeline design, normalise is intended to be a field-agnostic transform, and Aho-Corasick matching operates on substrings, so ordering doesn’t affect detection as long as all tokens are present.

The main goal of this change was to ensure determinism rather than enforce semantic structure.

That said, a priority-based ordering for tool payloads could improve readability and might be useful if future rules depend on ordering. Maybe that’s something we could consider separately or discuss for scan/policy stages?

@Pranjal0410
Copy link
Copy Markdown
Contributor

Yeah, makes sense to keep this PR focused. We can explore priority-based ordering separately if it starts affecting scan/policy behavior.

@RachanaB5
Copy link
Copy Markdown
Contributor Author

Sounds good, thanks! Agreed, keeping this focused on determinism makes sense. We can revisit priority-based ordering separately if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug] normalise: map iteration causes non-deterministic canonical text for structured payloads

2 participants