Skip to content

ci(codex): post review as inline diff comments#78

Merged
leehanchung merged 2 commits into
mainfrom
ci/codex-inline-comments
May 6, 2026
Merged

ci(codex): post review as inline diff comments#78
leehanchung merged 2 commits into
mainfrom
ci/codex-inline-comments

Conversation

@leehanchung
Copy link
Copy Markdown
Owner

Summary

Phase 2 of the codex-reviewer parity work. Stacked on #76.

Until now Codex was posting a single review with all findings concatenated into the body. Claude (via the subagent architecture) anchors each finding to the specific diff line in the Files Changed tab. This change brings Codex to format parity:

  • Prompt now asks Codex for a single JSON object with event, body, and an array of comments (path / line / side / body per finding). Schema is documented inline so the model has no ambiguity.
  • Submit step parses the JSON with jq, builds the Reviews API payload in jq (so review text never round-trips through shell quoting), and posts via gh api --input <file>. We deliberately do NOT use -f event=... flags because gh api --input routes those into the query string, not the body — codex itself flagged that exact mistake in PR ci: add Codex reviewer and migrate Claude review to subagent architecture #61's review of the slash-command migration.
  • Defensive cleanup: strip leading/trailing ```json fences before parsing (codex sometimes wraps JSON in a fence despite the explicit instruction); validate the event value and fail loudly if it's outside APPROVE / COMMENT / REQUEST_CHANGES.

Once this lands, Codex reviews show up inline in the Files Changed tab next to the relevant lines, exactly like Claude's.

Test plan

  • CI on this PR posts a review with at least one inline comment anchored to a specific line in .github/workflows/codex-code-review.yml (or summary-only if Codex thinks the change is clean)
  • No 422 from the Reviews API (would surface as a workflow failure)

Stacking note

Base branch is ci/codex-app-token (PR #76). Once #76 merges, this PR's base auto-rebases to main and CI re-runs.

🤖 Generated with Claude Code

Copy link
Copy Markdown

@codex-code-reviewer codex-code-reviewer Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline review flow is generally well structured, but the submit step still trusts model-produced comment coordinates without validating them before calling the Reviews API, so a single bad line/path can fail the workflow with a 422 despite the prompt warning the model to be precise.

# text never round-trips through shell quoting. `--input`
# below sends the file as the request body; we do NOT pass
# `-f event=...` flags because `gh api --input` puts those
# in the query string instead of the body (codex itself
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This forwards .comments directly into the Reviews API without validating that each entry has a diff path and a valid LEFT/RIGHT line. Since model output is not deterministic, one hallucinated coordinate will still produce the 422 this change is trying to avoid; consider validating comment shape/coordinates or dropping invalid comments before posting.

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review summary

Minor observation: one new code path logs raw model output to the runner log on JSON-parse failure, which creates an exfiltration surface for sensitive on-runner files if Codex is prompt-injected; everything else looks correct.

Deployment

No changes to apps/delulu_discord or apps/delulu_sandbox_modal — no rebuild or redeploy needed.


if ! echo "$CLEAN" | jq empty 2>/dev/null; then
echo "::error::Codex output is not valid JSON. Raw output:"
printf '%s\n' "$RAW"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security: printf '%s\n' "$RAW" logs the full raw Codex output to the runner log on any JSON-parse failure. This is a new code path with no equivalent in the old code. The runner has $CODEX_HOME/auth.json and ~/.ssh/codex_droplet on disk during the Codex step; a crafted PR body could instruct Codex to embed the contents of those files in its response and then append non-JSON text to force this error path, publishing the secrets to the runner log (readable by any repo contributor). Consider replacing this with a byte-count message pointing readers at the run_codex step output directly:

echo "::error::Codex output is not valid JSON ($(wc -c < "$CODEX_OUTPUT_FILE") bytes); inspect the run_codex step logs."

Base automatically changed from ci/codex-app-token to main May 6, 2026 05:11
Two-line conceptual change to the codex review pipeline:

1. Codex prompt now emits a structured JSON object
   (`event` + `body` + `comments[]`) instead of free-form prose
   ending in a `VERDICT:` line. Schema spelled out in the prompt
   so the model can self-check before emitting.

2. Submit step parses that JSON and posts via the GitHub Reviews
   API with `--input <file>` so the body and comment text never
   round-trip through shell quoting. Each `comments[]` entry
   becomes an inline diff comment anchored to a specific
   path:line, matching how anthropics/claude-code-action
   surfaces. Empty `comments` array still posts a clean top-level
   review.

Defensive cleanup: codex sometimes wraps the JSON in a ```json
fence despite the prompt saying not to. Strip leading/trailing
fences before `jq empty` validation so a known model quirk
doesn't fail the workflow.

Event-value sanity check before posting (`APPROVE` /
`REQUEST_CHANGES` / `COMMENT`) so a freelanced value surfaces as
a clear error instead of an opaque GitHub 422.

This was previously PR #78 with conflicts against #80's
auto-resolve work; the rebase combines both — the new submit
step keeps `id: submit` so the Resolve step's gate
(`steps.submit.conclusion == 'success'`) still fires after
inline-comment reviews.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@leehanchung leehanchung force-pushed the ci/codex-inline-comments branch from 4b870e3 to 00acb18 Compare May 6, 2026 05:34
Copy link
Copy Markdown

@codex-code-reviewer codex-code-reviewer Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline review flow is close, but the submit step still has two reliability/security edges: it trusts model-supplied inline comment coordinates all the way into the Reviews API, and it logs raw model output on parse failure.

if ! echo "$CLEAN" | jq empty 2>/dev/null; then
echo "::error::Codex output is not valid JSON. Raw output:"
printf '%s\n' "$RAW"
exit 1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging the full raw Codex output on parse failure can expose anything the model was induced to read or include in its response. Prefer logging a size/error summary and avoid echoing model-controlled content into workflow logs.

echo "$CLEAN" \
| jq --arg sha "${{ github.event.pull_request.head.sha }}" \
'{commit_id: $sha, event: .event, body: .body, comments: (.comments // [])}' \
> "$PAYLOAD"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This forwards .comments without validating path, side, line, or body shape. A single hallucinated coordinate will still make the Reviews API return 422, so this needs schema validation or invalid-comment filtering before posting.

@leehanchung leehanchung merged commit 7749712 into main May 6, 2026
2 of 3 checks passed
@leehanchung leehanchung deleted the ci/codex-inline-comments branch May 6, 2026 05:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant