
qa trial: dedup post-processor (v0.57.83) misses near-identical entries — 5 'No alerts' duplicates passed through on ops_dashboard run #1073

@manwithacat


Symptom

dazzle qa trial's post-trial dedup pass (v0.57.83) repeatedly fails to collapse near-identical friction entries, producing reports that triple- or quintuple-count the same observation.

The cycle 120 ops_dashboard/oncall_engineer trial recorded 11 friction observations, but after a manual dedup pass these collapse to ~4 distinct findings:

| Category | Description (first 80 chars) | Seen | Severity |
|---|---|---|---|
| missing/other | "The alerts page shows 'No alerts. All systems operational'" | 5 | high/medium |
| missing | "The systems page shows 'No systems registered'" | 1 | high |
| bug | "Clicking the 'Systems' navigation link repeatedly fails with timeout" | 1 | high |
| aesthetic | "Multiple 'Remove card' buttons visible" | 1 | medium |
| confusion | "Can't tell if system is healthy or unconfigured" | 1 | medium |
| praise | "Clear focused layout / filtering options" | 2 | low |

The 5 "No alerts" entries are near-identical: same URL (/app/alert), same DOM excerpt ("No items yet" / "No alerts. All systems operational."), same root meaning ("can't evaluate alert workflow because the page is empty"). Differences are only in surface wording. The dedup post-processor saw all 5 and let all 5 through.

Cross-cycle reinforcement

This is the 3rd cycle observing the dedup-miss pattern:

| Cycle | Backlog id | What it missed |
|---|---|---|
| 3 | TR-7 | 10+ "Systems page shows No items found" variants (ops_dashboard) |
| 112 | TR-18 | 4 near-identical "status filter works for Recalled devices" praise entries (fieldtest_hub) |
| 120 (this issue) | | 5 "No alerts. All systems operational" entries (ops_dashboard) |

Per the /improve trials-lane policy: "seen ≥ 3 across cycles → file issue — cross-cycle evidence is strongest triage signal." This is that threshold.

Likely root cause

The v0.57.83 dedup heuristic likely keys on an exact match of the description field, or on Jaccard-style word overlap. It misses when:

  • the URL and the evidence DOM excerpt are identical,
  • but the descriptions vary in framing ("can't evaluate alert workflow" vs "page is empty" vs "no sample data to test acknowledgment").

A more robust heuristic would key on (url, evidence_excerpt[:N]) or (category, normalised_dom_pattern) rather than description text.
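To illustrate why description-based matching misses these entries, here is a minimal Python sketch. It assumes the heuristic is word-level Jaccard overlap with a merge threshold of 0.5; both are guesses, since the actual v0.57.83 logic has not been audited yet. The inputs are only the short framings quoted above, not the full descriptions from the trial report, so real scores may be somewhat higher. The point is that the wording drifts while the (url, evidence) key stays identical.

```python
import re
from itertools import combinations


def tokens(text: str) -> set[str]:
    """Lower-case word set; apostrophes are kept so "can't" stays one token."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity: |A & B| / |A | B|."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


# Framings quoted in this issue for the "No alerts" entries.
descriptions = [
    "can't evaluate alert workflow",
    "page is empty",
    "no sample data to test acknowledgment",
]
# Each entry shares the same URL and DOM evidence; only the description differs.
entries = [(d, "/app/alert", "No alerts. All systems operational.") for d in descriptions]

THRESHOLD = 0.5  # assumed merge threshold; the real v0.57.83 value (if any) is unknown

for a, b in combinations(descriptions, 2):
    score = jaccard(a, b)
    print(f"{score:.2f} merged={score >= THRESHOLD} {a!r} vs {b!r}")

# Evidence-based key: identical for every entry, so all of them collapse to one group.
evidence_keys = {(url, evidence[:80]) for _, url, evidence in entries}
print(len(evidence_keys))
```

With these inputs no pair shares a single word, so every Jaccard score is 0.00 and nothing merges, while the evidence-based key yields one group.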

Suggested fix

  1. Audit dazzle.cli.qa.trial's dedup pass (candidate path: src/dazzle/cli/qa.py, or wherever the trial report is written; search for "dedup" or "v0.57.83").
  2. Add a stronger key: (category, url, evidence_excerpt[:80]). The description varies; the evidence shouldn't.
  3. Add a regression test: feed the cycle 120 raw friction list (11 entries) and assert post-dedup count is ≤ 5.
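Steps 2 and 3 could look roughly like the sketch below. Everything in it is hypothetical: `FrictionEntry`, its field names, `dedup`, and the whitespace/case and trailing-slash normalisation are stand-ins, because the real record shape in the trial code hasn't been checked. The test uses a few synthetic entries built from this issue's wording (the `/app/system` URL and "timeout" evidence are invented placeholders), not the real cycle 120 list. It also encodes one caveat: the summary table files the "No alerts" rows under missing/other, so if the raw entries genuinely carry different category values, a key that includes category will keep them in separate groups. The `include_category` flag lets the key drop it.

```python
import re
from dataclasses import dataclass, replace


@dataclass
class FrictionEntry:
    # Hypothetical record shape; the real trial entry's field names may differ.
    category: str
    url: str
    evidence: str
    description: str
    seen: int = 1


def _normalise(text: str) -> str:
    """Collapse whitespace and case so trivial DOM-capture noise doesn't split keys."""
    return re.sub(r"\s+", " ", text).strip().lower()


def dedup_key(e: FrictionEntry, prefix: int = 80, include_category: bool = True) -> tuple:
    """Key on (category, url, evidence[:prefix]); the description is ignored."""
    url = e.url.rstrip("/")  # assumed: "/app/alert" and "/app/alert/" are the same page
    evidence = _normalise(e.evidence)[:prefix]
    return (e.category, url, evidence) if include_category else (url, evidence)


def dedup(entries: list[FrictionEntry], prefix: int = 80,
          include_category: bool = True) -> list[FrictionEntry]:
    """Keep the first entry per key and add up how many raw entries it absorbed."""
    merged: dict[tuple, FrictionEntry] = {}
    for e in entries:
        k = dedup_key(e, prefix, include_category)
        if k in merged:
            merged[k].seen += e.seen
        else:
            merged[k] = replace(e)  # copy so the caller's entries are not mutated
    return list(merged.values())


def test_near_identical_no_alerts_entries_collapse():
    alert_evidence = "No alerts. All systems operational."
    raw = [
        FrictionEntry("missing", "/app/alert", alert_evidence, "can't evaluate alert workflow"),
        FrictionEntry("missing", "/app/alert", alert_evidence + "  ", "page is empty"),
        FrictionEntry("missing", "/app/alert/", alert_evidence, "no sample data to test acknowledgment"),
        FrictionEntry("bug", "/app/system", "timeout",
                      "Clicking the 'Systems' navigation link repeatedly fails with timeout"),
    ]
    out = dedup(raw)
    assert len(out) == 2
    assert out[0].seen == 3

    # Mixed categories split the group unless category is dropped from the key.
    mixed = [replace(raw[0], category="other")] + raw[1:]
    assert len(dedup(mixed)) == 3
    assert len(dedup(mixed, include_category=False)) == 2
```

Keeping the first description and bumping `seen` matches the shape of the manual dedup table above; a real implementation might prefer to keep the longest description or list every variant.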

Repro

```shell
cd examples/ops_dashboard
dazzle qa trial --scenario oncall_engineer --fresh-db
# Trial report writes to dev_docs/qa-trial-oncall_engineer-<ts>.md
# Count "No alerts" friction entries: there are 5 with near-identical evidence
```

Trial markdown from cycle 120: examples/ops_dashboard/dev_docs/qa-trial-oncall_engineer-20260514-135756.md (gitignored; a copy can be attached to this issue if needed).

Discovered by

/improve cycle 120 trials lane, ops_dashboard/oncall_engineer rotation. Cross-references TR-7 (cycle 3, ops_dashboard, LIKELY_RESOLVED) and TR-18 (cycle 112, fieldtest_hub, OPEN_FRAMEWORK). That makes three observations of the same defect class across two example apps (ops_dashboard twice, fieldtest_hub once), spanning 100+ cycles.
