Stabilize severity ratings for recurring findings across re-review runs

## What happened

On PR [#531](https://github.com/konflux-ci/build-service/pull/531), the review agent flagged k8s.io version skew in `e2e-tests/go.mod` approximately 10 times across different review runs. The finding was substantively identical each time (same modules, same version mismatch), but the assigned severity varied from `low` to `critical` across runs. For example, the same k8s.io staging module skew pattern was rated `low` (api-contract) in one run, `medium` in another, and `critical` in a third, despite the diff being unchanged or near-identical after rebases.

## What could go better

Severity inconsistency erodes trust in the review agent's judgment. If a human sees the same issue rated `critical` one day and `low` the next on the same code, they learn to discount all severity labels. Issue #2746 addresses non-deterministic *verdicts* (approve vs request changes) on unchanged commits, but it does not address severity instability for individual findings. The likely root cause is that the LLM assigns severity without anchoring to its prior assessments of the same pattern. Confidence: medium. Some of the variation could reflect genuine differences in the surrounding diff context after rebases, but the core finding (k8s.io version skew) was identical in every run.

## Proposed change

In the review agent's finding-generation prompt or post-processing logic, add severity anchoring for recurring findings. When the agent detects a finding that matches a prior finding on the same PR (same file, same or overlapping diff region, same category tag), it should either (a) reuse the prior severity unless it can articulate a specific reason for changing it, or (b) receive the prior severity as context in its prompt, so the LLM makes a deliberate decision rather than an independent re-assessment. This could be implemented in the review agent definition or in a review post-processing skill. The simplest approach is to include a summary of the prior findings from the sticky comment, with their severities, as context when generating the review, and to instruct the agent to keep severities consistent unless the code has materially changed.
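
A minimal Python sketch of both options follows. It assumes prior findings can be parsed from the sticky comment into records with a file path, a line range, a category tag, and a severity. `Finding`, `matches_prior`, `anchor_severity`, and `prior_findings_context` are hypothetical names, not part of the existing review agent, and the line numbers in the usage example are made up for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Finding:
    """One review finding. Hypothetical shape, not the agent's real schema."""

    file: str
    start_line: int  # first line of the flagged diff region (inclusive)
    end_line: int  # last line of the flagged diff region (inclusive)
    category: str  # category tag, e.g. "api-contract"
    severity: str  # e.g. "low", "medium", "high", "critical"
    justification: Optional[str] = None  # reason given for a severity change, if any


def matches_prior(new: Finding, prior: Finding) -> bool:
    """Same file, same category tag, and same or overlapping diff region."""
    overlaps = new.start_line <= prior.end_line and prior.start_line <= new.end_line
    return new.file == prior.file and new.category == prior.category and overlaps


def anchor_severity(new: Finding, prior_findings: list[Finding]) -> Finding:
    """Option (a): keep the prior severity unless the new finding justifies a change."""
    for prior in prior_findings:  # the first matching prior finding wins
        if not matches_prior(new, prior):
            continue
        if new.severity != prior.severity and not new.justification:
            # Unjustified drift: fall back to the previously assigned severity.
            return replace(new, severity=prior.severity)
        return new  # consistent, or the change came with an explicit reason
    return new  # no matching prior finding; keep the fresh assessment


def prior_findings_context(prior_findings: list[Finding]) -> str:
    """Option (b): render prior findings as prompt context for the next review run."""
    lines = [
        "Prior findings on this PR. Keep each severity unless the code has "
        "materially changed; if you change one, state what changed."
    ]
    for f in prior_findings:
        lines.append(
            f"- {f.file}:{f.start_line}-{f.end_line} [{f.category}] severity={f.severity}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    prior = [Finding("e2e-tests/go.mod", 5, 20, "api-contract", "low")]
    rerun = Finding("e2e-tests/go.mod", 8, 22, "api-contract", "critical")
    print(anchor_severity(rerun, prior).severity)  # low
    print(prior_findings_context(prior))
```

Option (a) enforces consistency after generation, so it works even if the LLM ignores the prompt context; option (b) leaves the decision to the LLM but gives it the information it currently lacks. The two can be combined.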

## Validation criteria

On the next PR where the review agent produces the same finding across 3+ review iterations, the severity rating should stay at the same level or, if it changes, be accompanied by an explicit justification referencing what changed. Verify by auditing severity labels on 5 PRs with multiple review iterations and confirming that fewer than 10% of recurring findings show unjustified severity drift.
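
The audit can be sketched as a drift-rate calculation. This sketch assumes each review iteration has already been reduced to a mapping from a finding key to its severity and any stated justification. Keying on file plus category tag is a simplification of the overlap matching in the proposed change. Counting a recurring finding as drifted if its severity changes at least once without a justification is one possible reading of "unjustified severity drift". `unjustified_drift_rate` is a hypothetical helper, and the example data only mirrors the low/medium/critical pattern described for PR #531.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical shapes: a finding key is (file, category tag); each review
# iteration maps keys to (severity, justification or None).
FindingKey = tuple[str, str]
Iteration = dict[FindingKey, tuple[str, Optional[str]]]


def unjustified_drift_rate(iterations: list[Iteration], min_runs: int = 3) -> float:
    """Fraction of recurring findings whose severity changed without a justification.

    A finding is "recurring" if it appears in at least `min_runs` iterations.
    """
    history: dict[FindingKey, list[tuple[str, Optional[str]]]] = defaultdict(list)
    for run in iterations:  # iterations are in chronological order
        for key, rating in run.items():
            history[key].append(rating)

    recurring = [ratings for ratings in history.values() if len(ratings) >= min_runs]
    if not recurring:
        return 0.0

    drifted = 0
    for ratings in recurring:
        # Compare each appearance of the finding with its previous appearance.
        for (prev_severity, _), (severity, justification) in zip(ratings, ratings[1:]):
            if severity != prev_severity and not justification:
                drifted += 1
                break  # count each finding at most once
    return drifted / len(recurring)


if __name__ == "__main__":
    key = ("e2e-tests/go.mod", "api-contract")
    pr_531_like = [{key: ("low", None)}, {key: ("medium", None)}, {key: ("critical", None)}]
    rate = unjustified_drift_rate(pr_531_like)
    print(rate, "passes" if rate < 0.10 else "fails")  # 1.0 fails
```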

---
_Generated by retro agent from https://github.com/konflux-ci/build-service/pull/531_
