qa trial: dedup post-processor (v0.57.83) misses near-identical entries — 5 'No alerts' duplicates passed through on ops_dashboard run

## Symptom

`dazzle qa trial`'s post-trial dedup pass (v0.57.83) repeatedly fails to collapse near-identical friction entries, producing reports that triple- or quintuple-count the same observation.

The cycle 120 `ops_dashboard/oncall_engineer` trial recorded **11 friction observations**, but after a manual dedup pass these collapse to **~4 distinct findings**:

| Category | Description (first 80 chars) | seen | severity |
|----------|------------------------------|------|----------|
| missing/other | "The alerts page shows 'No alerts. All systems operational'" | **5** | high/medium |
| missing | "The systems page shows 'No systems registered'" | 1 | high |
| bug | "Clicking the 'Systems' navigation link repeatedly fails with timeout" | 1 | high |
| aesthetic | "Multiple 'Remove card' buttons visible" | 1 | medium |
| confusion | "Can't tell if system is healthy or unconfigured" | 1 | medium |
| praise | "Clear focused layout / filtering options" | 2 | low |

The 5 "No alerts" entries are near-identical: same URL (`/app/alert`), same DOM excerpt ("No items yet" / "No alerts. All systems operational."), same root meaning ("can't evaluate alert workflow because the page is empty"). Differences are only in surface wording. The dedup post-processor saw all 5 and let all 5 through.

## Cross-cycle reinforcement

This is the **3rd cycle observing the dedup-miss pattern**:

| Cycle | Backlog id | What it missed |
|-------|-----------|----------------|
| 3 | TR-7 | 10+ "Systems page shows No items found" variants (ops_dashboard) |
| 112 | TR-18 | 4 near-identical "status filter works for Recalled devices" praise entries (fieldtest_hub) |
| **120** | **(this)** | 5 "No alerts. All systems operational" entries (ops_dashboard) |

Per the `/improve` trials-lane policy: "seen ≥ 3 across cycles → file issue — cross-cycle evidence is strongest triage signal." This is that threshold.

## Likely root cause

The dedup heuristic v0.57.83 likely keys on exact-match of the `description` field or a Jaccard-style word overlap. It misses when:

- The same URL + same evidence DOM excerpt appears
- But descriptions vary in framing ("can't evaluate alert workflow" vs "page is empty" vs "no sample data to test acknowledgment")

A more robust heuristic would key on `(url, evidence_excerpt[:N])` or `(category, normalised_dom_pattern)` rather than description text.

## Suggested fix

1. Audit `dazzle.cli.qa.trial`'s dedup pass (path candidate: `src/dazzle/cli/qa.py` or wherever the trial report writes — search for "dedup" or "v0.57.83").
2. Add a stronger key: `(category, url, evidence_excerpt[:80])`. The description varies; the evidence shouldn't.
3. Add a regression test: feed the cycle 120 raw friction list (11 entries) and assert post-dedup count is ≤ 5.

## Repro

```bash
cd examples/ops_dashboard
dazzle qa trial --scenario oncall_engineer --fresh-db
# Trial report writes to dev_docs/qa-trial-oncall_engineer-<ts>.md
# Count "No alerts" friction entries — there are 5 with near-identical evidence
```

Trial markdown from cycle 120: `examples/ops_dashboard/dev_docs/qa-trial-oncall_engineer-20260514-135756.md` (gitignored — copy attached to this issue if needed).

## Discovered by

`/improve` cycle 120 trials lane, ops_dashboard/oncall_engineer rotation. Cross-references TR-7 (cycle 3, ops_dashboard, LIKELY_RESOLVED) and TR-18 (cycle 112, fieldtest_hub, OPEN_FRAMEWORK). Three observations of the same defect class across three different example apps × scenarios over the last 100+ cycles.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

qa trial: dedup post-processor (v0.57.83) misses near-identical entries — 5 'No alerts' duplicates passed through on ops_dashboard run #1073

Symptom

Cross-cycle reinforcement

Likely root cause

Suggested fix

Repro

Discovered by

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Category	Description (first 80 chars)	seen	severity
missing/other	"The alerts page shows 'No alerts. All systems operational'"	5	high/medium
missing	"The systems page shows 'No systems registered'"	1	high
bug	"Clicking the 'Systems' navigation link repeatedly fails with timeout"	1	high
aesthetic	"Multiple 'Remove card' buttons visible"	1	medium
confusion	"Can't tell if system is healthy or unconfigured"	1	medium
praise	"Clear focused layout / filtering options"	2	low

Cycle	Backlog id	What it missed
3	TR-7	10+ "Systems page shows No items found" variants (ops_dashboard)
112	TR-18	4 near-identical "status filter works for Recalled devices" praise entries (fieldtest_hub)
120	(this)	5 "No alerts. All systems operational" entries (ops_dashboard)

qa trial: dedup post-processor (v0.57.83) misses near-identical entries — 5 'No alerts' duplicates passed through on ops_dashboard run #1073

Description

Symptom

Cross-cycle reinforcement

Likely root cause

Suggested fix

Repro

Discovered by

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions