Review agent should consider multi-replica deployment behavior

## What happened

On [PR #6962](https://github.com/konflux-ci/konflux-ci/pull/6962), the fullsend review agent reviewed a shell script (`run-dex.sh`) that wraps Dex with token rotation logic. The script polls for token changes on a fixed interval and restarts Dex when rotation is detected. The review agent ran 4 times ([run 1](https://github.com/konflux-ci/.fullsend/actions/runs/26223178990), [run 2](https://github.com/konflux-ci/.fullsend/actions/runs/26441889125), [run 3](https://github.com/konflux-ci/.fullsend/actions/runs/27629874612), [run 4](https://github.com/konflux-ci/.fullsend/actions/runs/27692859310)) and tracked findings across 10 commit iterations. It found several valid issues: zombie process reaping, unquoted variables, unnecessary restarts on non-OpenShift, and crash detection gaps.

However, the human reviewer ([gbenhaim's comment on May 28](https://github.com/konflux-ci/konflux-ci/pull/6962#discussion_r2179488832)) identified a critical operational concern the bot never raised: when multiple replicas (3 pods) all use the same fixed polling interval, they will detect token rotation and restart Dex simultaneously, causing a brief total outage. The reviewer suggested adding random jitter to stagger restarts. This is a well-known distributed systems pattern (thundering herd / synchronized polling) that the review agent missed across all 10 iterations.

## What could go better

The review agent focuses on single-process correctness (error handling, quoting, logic flow) but does not consider how code behaves when running as multiple replicas in a Kubernetes deployment. This is a meaningful gap for infrastructure-focused repos like konflux-ci where most code runs in pods with replica counts > 1.

Confidence: **Medium-high.** The evidence is clear — the bot reviewed the same script 10 times without flagging the synchronized restart issue, while the human caught it on first review. The gap is specific and reproducible: any code with fixed-interval polling or time-based behavior in a multi-replica deployment context.

Uncertainty: It's unclear how often this class of issue arises. It may be infrequent enough that adding it to the review prompt yields low signal-to-noise. However, when it does arise, it's high-impact (brief service outages).

## Proposed change

Add a review consideration to the review agent's prompt or skill definition that instructs it to evaluate multi-replica deployment behavior when reviewing code that:
- Runs in containers/pods (Dockerfiles, entrypoint scripts, shell wrappers)
- Contains polling loops, sleep intervals, or time-based triggers
- Manages process lifecycle (start/stop/restart patterns)

The prompt addition should instruct the agent to ask: "If N replicas run this code simultaneously, will synchronized behavior cause availability issues? Should jitter, leader election, or staggered scheduling be considered?"

This belongs in the review agent's skill or agent definition in `fullsend-ai/fullsend`, likely in the review analysis checklist or the section that guides what classes of issues to look for.

## Validation criteria

On the next 5 PRs to konflux-ci/konflux-ci (or similar infrastructure repos) that modify shell scripts or entrypoint wrappers with polling/restart logic, the review agent should evaluate multi-replica behavior. Success means: (1) the agent mentions replica/concurrency considerations when relevant, and (2) does not raise false positives on code that is inherently single-instance. Measure over a 60-day window after implementation.

---
_Generated by retro agent from https://github.com/konflux-ci/konflux-ci/pull/6962_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Review agent should consider multi-replica deployment behavior #2408

What happened

What could go better

Proposed change

Validation criteria

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Review agent should consider multi-replica deployment behavior #2408

Description

What happened

What could go better

Proposed change

Validation criteria

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions