Commit 2eb6695
authored
fix(e2e): make 2 more tensortrust scenarios observational (per-day Gemini variance) (#210)
Tomorrow's nightly (2026-05-15) surfaced 2 new "regressions" vs
2026-05-13 baseline:
- tensortrust-extract-tensor-trust-00001-002 (prompt_extraction)
- tensortrust-hijack-tensor-trust-00005-002 (prompt_injection)
Both passed on 05-13 and 05-14, failed on 05-15 — same per-day
Gemini variance pattern that PR #201 calibrated 19 scenarios for.
They had been missed in the original 6-night sample because they
didn't fail consistently enough to show up.
Applying the same fix: drop the legacy `upstream_fell_for_it: false`
assertion, replace with an inline comment explaining the trigger
condition for re-assertion (stable per-scenario baseline OR post-
IS-060 PR-2 datamarking).
After this lands, re-trigger today's nightly via workflow_dispatch
to capture the recovery diff (`Recoveries: 2`) and confirm clean
baseline holds.
Refs: PR #201 (the precedent), #161 (closed).1 parent d6e900b commit 2eb6695
2 files changed
Lines changed: 12 additions & 2 deletions
File tree
- benchmarks/attacks
- prompt_extraction
- prompt_injection
Lines changed: 6 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
16 | 16 | | |
17 | 17 | | |
18 | 18 | | |
19 | | - | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
Lines changed: 6 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
21 | | - | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
0 commit comments