Skip to content

Commit d544cb0

Browse files
authored
ops(e2e): flip IS-060 datamarking to ACTIVE mode for ASR-delta collection (#220)
Stage 2 of the IS-060 PR-2 validation chain. Shadow mode validated 2026-05-15 via workflow_dispatch run 25937040210: 85/91 scenarios passed, zero errors, zero 4xx-rate shift vs prior nightly, transform ran cleanly through the full e2e harness with zone_detection + boundary_defense + datamarking shadow active. This flip enables marker substitution against the real upstream (Gemini 2.0 Flash via OpenRouter): request → zone-detect → boundary-wrap → DATAMARK (active) → forward upstream WITH marker-substituted Data zones The system-reminder addendum tells the upstream model the marked text is data, not instructions. Acceptance criteria for the 3-5 nightly cycles that follow this merge: - Pass rate stays in the 84-88 / 91 range (matches the recent per-day Gemini variance band; not a regression metric on its own) - llmtrace_spotlighting_byte_delta_total > 0 (transform fires) - llmtrace_spotlighting_failures_total == 0 - llmtrace_spotlighting_marker_collision_total stays bounded - upstream_fell_for_it rate on indirect_injection family scenarios shifts DOWNWARD relative to the pre-datamarking baseline (the ASR delta — the headline metric) Evidence will be committed to docs/research/results/upstream_judge_datamarking_evidence_<date>.md after 3-5 cycles of post-active-mode data. Refs: #214 (PR-2 implementation), #213 (PR-3 corpus), #216 (shadow-mode enable), #219 (zone_detection enable), workflow_dispatch run 25937040210 (shadow-mode validation evidence).
1 parent 685e10b commit d544cb0

1 file changed

Lines changed: 10 additions & 8 deletions

File tree

tests/e2e/fixtures/config-e2e-judge.yaml

Lines changed: 10 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -66,20 +66,22 @@ security_analysis:
6666

6767

6868
# IS-060 — Boundary defense + datamarking transform.
69-
# Enabled in shadow mode for the first validation cycle: data zones are
70-
# wrapped with <llmtrace-boundary>...</llmtrace-boundary> tags AND the
71-
# datamarking pipeline computes the U+E000 substitution + emits metrics,
72-
# but shadow_mode=true means the original (unmarked) bytes still go
73-
# upstream. Validate one nightly cycle for zero 4xx delta and non-zero
74-
# byte_delta_total before flipping shadow_mode to false in a follow-up
75-
# ops PR. See PR #214 + docs/architecture/SPOTLIGHTING_INDIRECT_INJECTION.md.
69+
# Active mode: data zones are wrapped with <llmtrace-boundary>...
70+
# </llmtrace-boundary> tags AND the datamarking pipeline substitutes a
71+
# Unicode Private Use Area marker for whitespace in detected Data zones
72+
# BEFORE forwarding upstream. The system-reminder addendum tells the
73+
# upstream model that marked text is data, not instructions. Shadow-mode validated 2026-05-15 via workflow_dispatch run 25937040210:
74+
# 85/91 passed with zero errors and zero 4xx-rate shift vs prior nightly.
75+
# This flip activates marker substitution against the production
76+
# upstream (Gemini 2.0 Flash via OpenRouter); the next 3-5 scheduled
77+
# nightlies measure ASR delta on BIPIA scenarios vs the shadow baseline. See PR #214 + docs/architecture/SPOTLIGHTING_INDIRECT_INJECTION.md.
7678
boundary_defense:
7779
enabled: true
7880
randomize_nonce: true
7981
inject_system_reminder: true
8082
datamarking:
8183
enabled: true
82-
shadow_mode: true
84+
shadow_mode: false
8385
marker_strategy:
8486
kind: randomized
8587

0 commit comments

Comments
 (0)