Skip to content

Review agent should consider multi-replica deployment behavior #2408

Description

@fullsend-ai-retro

What happened

On PR #6962, the fullsend review agent reviewed a shell script (run-dex.sh) that wraps Dex with token rotation logic. The script polls for token changes on a fixed interval and restarts Dex when rotation is detected. The review agent ran 4 times (run 1, run 2, run 3, run 4) and tracked findings across 10 commit iterations. It found several valid issues: zombie process reaping, unquoted variables, unnecessary restarts on non-OpenShift, and crash detection gaps.

However, the human reviewer (gbenhaim's comment on May 28) identified a critical operational concern the bot never raised: when multiple replicas (3 pods) all use the same fixed polling interval, they will detect token rotation and restart Dex simultaneously, causing a brief total outage. The reviewer suggested adding random jitter to stagger restarts. This is a well-known distributed systems pattern (thundering herd / synchronized polling) that the review agent missed across all 10 iterations.

What could go better

The review agent focuses on single-process correctness (error handling, quoting, logic flow) but does not consider how code behaves when running as multiple replicas in a Kubernetes deployment. This is a meaningful gap for infrastructure-focused repos like konflux-ci where most code runs in pods with replica counts > 1.

Confidence: Medium-high. The evidence is clear — the bot reviewed the same script 10 times without flagging the synchronized restart issue, while the human caught it on first review. The gap is specific and reproducible: any code with fixed-interval polling or time-based behavior in a multi-replica deployment context.

Uncertainty: It's unclear how often this class of issue arises. It may be infrequent enough that adding it to the review prompt yields low signal-to-noise. However, when it does arise, it's high-impact (brief service outages).

Proposed change

Add a review consideration to the review agent's prompt or skill definition that instructs it to evaluate multi-replica deployment behavior when reviewing code that:

  • Runs in containers/pods (Dockerfiles, entrypoint scripts, shell wrappers)
  • Contains polling loops, sleep intervals, or time-based triggers
  • Manages process lifecycle (start/stop/restart patterns)

The prompt addition should instruct the agent to ask: "If N replicas run this code simultaneously, will synchronized behavior cause availability issues? Should jitter, leader election, or staggered scheduling be considered?"

This belongs in the review agent's skill or agent definition in fullsend-ai/fullsend, likely in the review analysis checklist or the section that guides what classes of issues to look for.

Validation criteria

On the next 5 PRs to konflux-ci/konflux-ci (or similar infrastructure repos) that modify shell scripts or entrypoint wrappers with polling/restart logic, the review agent should evaluate multi-replica behavior. Success means: (1) the agent mentions replica/concurrency considerations when relevant, and (2) does not raise false positives on code that is inherently single-instance. Measure over a 60-day window after implementation.


Generated by retro agent from konflux-ci/konflux-ci#6962

Metadata

Metadata

Assignees

No one assigned

    Labels

    agent/reviewReview agentcomponent/harnessAgent harness, config, and skills loadingfeatureFeature-category issue awaiting human prioritizationpriority/mediumNormal priority, plan for next cycletriagedTriaged but awaiting human prioritization

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    Status
    Todo

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions