Symptom
dazzle qa trial's post-trial dedup pass (v0.57.83) repeatedly fails to collapse near-identical friction entries, producing reports that triple- or quintuple-count the same observation.
The cycle 120 ops_dashboard/oncall_engineer trial recorded 11 friction observations, but after a manual dedup pass these collapse to ~4 distinct findings:
| Category |
Description (first 80 chars) |
seen |
severity |
| missing/other |
"The alerts page shows 'No alerts. All systems operational'" |
5 |
high/medium |
| missing |
"The systems page shows 'No systems registered'" |
1 |
high |
| bug |
"Clicking the 'Systems' navigation link repeatedly fails with timeout" |
1 |
high |
| aesthetic |
"Multiple 'Remove card' buttons visible" |
1 |
medium |
| confusion |
"Can't tell if system is healthy or unconfigured" |
1 |
medium |
| praise |
"Clear focused layout / filtering options" |
2 |
low |
The 5 "No alerts" entries are near-identical: same URL (/app/alert), same DOM excerpt ("No items yet" / "No alerts. All systems operational."), same root meaning ("can't evaluate alert workflow because the page is empty"). Differences are only in surface wording. The dedup post-processor saw all 5 and let all 5 through.
Cross-cycle reinforcement
This is the 3rd cycle observing the dedup-miss pattern:
| Cycle |
Backlog id |
What it missed |
| 3 |
TR-7 |
10+ "Systems page shows No items found" variants (ops_dashboard) |
| 112 |
TR-18 |
4 near-identical "status filter works for Recalled devices" praise entries (fieldtest_hub) |
| 120 |
(this) |
5 "No alerts. All systems operational" entries (ops_dashboard) |
Per the /improve trials-lane policy: "seen ≥ 3 across cycles → file issue — cross-cycle evidence is strongest triage signal." This is that threshold.
Likely root cause
The dedup heuristic v0.57.83 likely keys on exact-match of the description field or a Jaccard-style word overlap. It misses when:
- The same URL + same evidence DOM excerpt appears
- But descriptions vary in framing ("can't evaluate alert workflow" vs "page is empty" vs "no sample data to test acknowledgment")
A more robust heuristic would key on (url, evidence_excerpt[:N]) or (category, normalised_dom_pattern) rather than description text.
Suggested fix
- Audit
dazzle.cli.qa.trial's dedup pass (path candidate: src/dazzle/cli/qa.py or wherever the trial report writes — search for "dedup" or "v0.57.83").
- Add a stronger key:
(category, url, evidence_excerpt[:80]). The description varies; the evidence shouldn't.
- Add a regression test: feed the cycle 120 raw friction list (11 entries) and assert post-dedup count is ≤ 5.
Repro
cd examples/ops_dashboard
dazzle qa trial --scenario oncall_engineer --fresh-db
# Trial report writes to dev_docs/qa-trial-oncall_engineer-<ts>.md
# Count "No alerts" friction entries — there are 5 with near-identical evidence
Trial markdown from cycle 120: examples/ops_dashboard/dev_docs/qa-trial-oncall_engineer-20260514-135756.md (gitignored — copy attached to this issue if needed).
Discovered by
/improve cycle 120 trials lane, ops_dashboard/oncall_engineer rotation. Cross-references TR-7 (cycle 3, ops_dashboard, LIKELY_RESOLVED) and TR-18 (cycle 112, fieldtest_hub, OPEN_FRAMEWORK). Three observations of the same defect class across three different example apps × scenarios over the last 100+ cycles.
Symptom
dazzle qa trial's post-trial dedup pass (v0.57.83) repeatedly fails to collapse near-identical friction entries, producing reports that triple- or quintuple-count the same observation.The cycle 120
ops_dashboard/oncall_engineertrial recorded 11 friction observations, but after a manual dedup pass these collapse to ~4 distinct findings:The 5 "No alerts" entries are near-identical: same URL (
/app/alert), same DOM excerpt ("No items yet" / "No alerts. All systems operational."), same root meaning ("can't evaluate alert workflow because the page is empty"). Differences are only in surface wording. The dedup post-processor saw all 5 and let all 5 through.Cross-cycle reinforcement
This is the 3rd cycle observing the dedup-miss pattern:
Per the
/improvetrials-lane policy: "seen ≥ 3 across cycles → file issue — cross-cycle evidence is strongest triage signal." This is that threshold.Likely root cause
The dedup heuristic v0.57.83 likely keys on exact-match of the
descriptionfield or a Jaccard-style word overlap. It misses when:A more robust heuristic would key on
(url, evidence_excerpt[:N])or(category, normalised_dom_pattern)rather than description text.Suggested fix
dazzle.cli.qa.trial's dedup pass (path candidate:src/dazzle/cli/qa.pyor wherever the trial report writes — search for "dedup" or "v0.57.83").(category, url, evidence_excerpt[:80]). The description varies; the evidence shouldn't.Repro
Trial markdown from cycle 120:
examples/ops_dashboard/dev_docs/qa-trial-oncall_engineer-20260514-135756.md(gitignored — copy attached to this issue if needed).Discovered by
/improvecycle 120 trials lane, ops_dashboard/oncall_engineer rotation. Cross-references TR-7 (cycle 3, ops_dashboard, LIKELY_RESOLVED) and TR-18 (cycle 112, fieldtest_hub, OPEN_FRAMEWORK). Three observations of the same defect class across three different example apps × scenarios over the last 100+ cycles.