
Review agent should reason about CI infrastructure usage patterns when reviewing optimization PRs #2880

Description

@fullsend-ai-retro

What happened

On PR #2796, the review agent approved a change that removed sandbox image caching from action.yml (+1/−62 lines). The agent's verdict was a clean approval: "Looks good to me." However, human reviewer rh-hemartin requested changes on Jul 1, noting: "Code (and fix) are the least run agents, and the other do not use the code image, but the base one. Drop the pull entirely and let the sandbox do its work." This was a substantive architectural insight: the remaining podman pull step was unnecessary, because only the rarely run code and fix agents use the code image. The author updated the PR, both the agent and the human reviewer re-approved, and it merged on Jul 2.

What could go better

The review agent correctly verified that the diff was internally consistent (removing cache logic, dropping skopeo dependency), but it could not assess whether the optimization was sufficient. The human reviewer's insight required knowing: (1) which container images are used by which agent types (code/fix use the code image; triage/review/retro use the base image), and (2) that code/fix agents are the least frequently run. This is operational knowledge about the CI infrastructure that isn't documented in action.yml or easily inferred from the diff alone.

Confidence: Medium. The review agent's approval wasn't wrong per se — the change was correct and beneficial. The gap is that it missed an opportunity to suggest going further. This is a softer failure mode (false negative on an optimization suggestion) than missing a bug. However, it did result in a rework cycle that could have been avoided.

Existing issue check: #2235 is thematically adjacent (deployment-environment-aware review) but addresses tool availability, not runtime usage patterns. #1275 covers workflow_call chain tracing but not image/agent topology. Neither covers the specific gap of understanding which agents use which images and their relative frequency.

Proposed change

Add a section to the repo's AGENTS.md (or a dedicated docs/guides/dev/ci-infrastructure.md) documenting the sandbox image topology: which container images exist (base vs code), which agent types use each, and their relative run frequencies. This gives the review agent discoverable context when reviewing CI infrastructure changes.

Specifically, add content like:

## Sandbox image topology
- **Base image:** Used by triage, review, retro, and prioritize agents (most frequent)
- **Code image:** Used by code and fix agents only (least frequent)
- The composite action (`action.yml`) pre-pulls images before agent execution

This is a repo-specific documentation change, because the image topology belongs to fullsend-ai/fullsend's CI setup. The review agent already reads AGENTS.md as part of its context gathering, so placing the information there makes it discoverable without requiring changes to the agent definition.
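To make the reasoning from #2796 concrete, the proposed topology can be treated as data that a reviewer consults when a PR changes which images get pre-pulled. This is a minimal Python sketch, assuming the agent-to-image mapping and frequency tiers proposed above; the names `IMAGE_USERS`, `FREQUENCY_TIER`, `pull_impact`, and `flag_low_value_pulls` are hypothetical and are not part of fullsend's codebase.

```python
from dataclasses import dataclass

# Hypothetical encoding of the proposed AGENTS.md topology:
# which agent types run in each sandbox image.
IMAGE_USERS: dict[str, list[str]] = {
    "base": ["triage", "review", "retro", "prioritize"],
    "code": ["code", "fix"],
}

# Relative run frequency per image, as qualitative tiers (no run counts assumed).
FREQUENCY_TIER: dict[str, str] = {
    "base": "most frequent",
    "code": "least frequent",
}


@dataclass
class PullImpact:
    image: str
    agents: list[str]
    tier: str


def pull_impact(pulled_images: list[str]) -> list[PullImpact]:
    """Describe which agent types use each image that a CI step pre-pulls."""
    impacts = []
    for image in pulled_images:
        if image not in IMAGE_USERS:
            raise KeyError(f"image not in documented topology: {image!r}")
        impacts.append(PullImpact(image, IMAGE_USERS[image], FREQUENCY_TIER[image]))
    return impacts


def flag_low_value_pulls(pulled_images: list[str]) -> list[str]:
    """Return pre-pulled images that only the least frequently run agents use."""
    return [i.image for i in pull_impact(pulled_images) if i.tier == "least frequent"]


if __name__ == "__main__":
    for impact in pull_impact(["base", "code"]):
        print(impact.image, impact.agents, impact.tier)
    print(flag_low_value_pulls(["code"]))
```

For example, `flag_low_value_pulls(["code"])` returns `["code"]`, matching rh-hemartin's conclusion that pre-pulling the code image mainly serves the code and fix agents. The sketch keeps frequency as qualitative tiers rather than numbers, since the issue gives no actual run counts.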

Validation criteria

On the next 3 PRs that modify action.yml or sandbox image configuration in this repo, the review agent should reference the documented image topology in its review reasoning. If a PR touches image pulling or caching logic, the agent should be able to identify which agent types are affected and comment on whether the change scope is appropriate. This can be verified by checking the review agent's trace/reasoning for references to the documented topology.
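The validation criterion above can be checked mechanically once the review agent's reasoning is available as text. This is a minimal Python sketch under stated assumptions: the path patterns in `IMAGE_CONFIG_PATTERNS`, the phrases in `TOPOLOGY_TERMS`, and the `ReviewedPR` record are hypothetical stand-ins for however traces and changed-file lists are actually retrieved.

```python
import fnmatch
from dataclasses import dataclass

# Hypothetical patterns for "action.yml or sandbox image configuration".
# fnmatch's "*" also matches "/", so "*/action.yml" covers nested paths.
IMAGE_CONFIG_PATTERNS = ["action.yml", "*/action.yml"]

# Hypothetical phrases suggesting the review cites the documented topology.
TOPOLOGY_TERMS = ["base image", "code image", "image topology"]


@dataclass
class ReviewedPR:
    number: int
    changed_files: list[str]
    review_reasoning: str


def touches_image_config(pr: ReviewedPR) -> bool:
    """True if any changed file matches an image-configuration pattern."""
    return any(
        fnmatch.fnmatch(path, pattern)
        for path in pr.changed_files
        for pattern in IMAGE_CONFIG_PATTERNS
    )


def references_topology(pr: ReviewedPR) -> bool:
    """True if the review reasoning mentions any topology term (case-insensitive)."""
    text = pr.review_reasoning.lower()
    return any(term in text for term in TOPOLOGY_TERMS)


def validate(prs: list[ReviewedPR], sample_size: int = 3) -> tuple[bool, list[int]]:
    """Check the next `sample_size` qualifying PRs, in the order given.

    Passes only if a full sample exists and every review cites the topology.
    """
    qualifying = [pr for pr in prs if touches_image_config(pr)][:sample_size]
    passed = len(qualifying) == sample_size and all(
        references_topology(pr) for pr in qualifying
    )
    return passed, [pr.number for pr in qualifying]
```

The check only passes once a full sample of three qualifying PRs has been observed, so an empty or partial sample is not mistaken for success. Keyword matching is a coarse proxy; a human reading the trace still has to judge whether the agent actually reasoned about change scope.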


Generated by retro agent from #2796

Metadata


Assignees

No one assigned

Labels

- agent/review: Review agent
- component/sandbox: OpenShell sandbox environment
- priority/low: Nice to have, address when convenient
- ready-for-triage: Retro-filed issue awaiting triage agent
- ready-to-code: Triaged and ready for the code agent

Type

No type

Fields

No fields configured for issues without a type.

Projects

Status
Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
