ci(codex): post review as inline diff comments#78
Conversation
There was a problem hiding this comment.
The inline review flow is generally well structured, but the submit step still trusts model-produced comment coordinates without validating them before calling the Reviews API, so a single bad line/path can fail the workflow with a 422 despite the prompt warning the model to be precise.
| # text never round-trips through shell quoting. `--input` | ||
| # below sends the file as the request body; we do NOT pass | ||
| # `-f event=...` flags because `gh api --input` puts those | ||
| # in the query string instead of the body (codex itself |
There was a problem hiding this comment.
This forwards .comments directly into the Reviews API without validating that each entry has a diff path and a valid LEFT/RIGHT line. Since model output is not deterministic, one hallucinated coordinate will still produce the 422 this change is trying to avoid; consider validating comment shape/coordinates or dropping invalid comments before posting.
There was a problem hiding this comment.
Review summary
Minor observation: one new code path logs raw model output to the runner log on JSON-parse failure, which creates an exfiltration surface for sensitive on-runner files if Codex is prompt-injected; everything else looks correct.
Deployment
No changes to apps/delulu_discord or apps/delulu_sandbox_modal — no rebuild or redeploy needed.
|
|
||
| if ! echo "$CLEAN" | jq empty 2>/dev/null; then | ||
| echo "::error::Codex output is not valid JSON. Raw output:" | ||
| printf '%s\n' "$RAW" |
There was a problem hiding this comment.
Security: printf '%s\n' "$RAW" logs the full raw Codex output to the runner log on any JSON-parse failure. This is a new code path with no equivalent in the old code. The runner has $CODEX_HOME/auth.json and ~/.ssh/codex_droplet on disk during the Codex step; a crafted PR body could instruct Codex to embed the contents of those files in its response and then append non-JSON text to force this error path, publishing the secrets to the runner log (readable by any repo contributor). Consider replacing this with a byte-count message pointing readers at the run_codex step output directly:
echo "::error::Codex output is not valid JSON ($(wc -c < "$CODEX_OUTPUT_FILE") bytes); inspect the run_codex step logs."Two-line conceptual change to the codex review pipeline: 1. Codex prompt now emits a structured JSON object (`event` + `body` + `comments[]`) instead of free-form prose ending in a `VERDICT:` line. Schema spelled out in the prompt so the model can self-check before emitting. 2. Submit step parses that JSON and posts via the GitHub Reviews API with `--input <file>` so the body and comment text never round-trip through shell quoting. Each `comments[]` entry becomes an inline diff comment anchored to a specific path:line, matching how anthropics/claude-code-action surfaces. Empty `comments` array still posts a clean top-level review. Defensive cleanup: codex sometimes wraps the JSON in a ```json fence despite the prompt saying not to. Strip leading/trailing fences before `jq empty` validation so a known model quirk doesn't fail the workflow. Event-value sanity check before posting (`APPROVE` / `REQUEST_CHANGES` / `COMMENT`) so a freelanced value surfaces as a clear error instead of an opaque GitHub 422. This was previously PR #78 with conflicts against #80's auto-resolve work; the rebase combines both — the new submit step keeps `id: submit` so the Resolve step's gate (`steps.submit.conclusion == 'success'`) still fires after inline-comment reviews. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4b870e3 to
00acb18
Compare
| if ! echo "$CLEAN" | jq empty 2>/dev/null; then | ||
| echo "::error::Codex output is not valid JSON. Raw output:" | ||
| printf '%s\n' "$RAW" | ||
| exit 1 |
There was a problem hiding this comment.
Logging the full raw Codex output on parse failure can expose anything the model was induced to read or include in its response. Prefer logging a size/error summary and avoid echoing model-controlled content into workflow logs.
| echo "$CLEAN" \ | ||
| | jq --arg sha "${{ github.event.pull_request.head.sha }}" \ | ||
| '{commit_id: $sha, event: .event, body: .body, comments: (.comments // [])}' \ | ||
| > "$PAYLOAD" |
There was a problem hiding this comment.
This forwards .comments without validating path, side, line, or body shape. A single hallucinated coordinate will still make the Reviews API return 422, so this needs schema validation or invalid-comment filtering before posting.
Summary
Phase 2 of the codex-reviewer parity work. Stacked on #76.
Until now Codex was posting a single review with all findings concatenated into the body. Claude (via the subagent architecture) anchors each finding to the specific diff line in the Files Changed tab. This change brings Codex to format parity:
event,body, and an array ofcomments(path/line/side/bodyper finding). Schema is documented inline so the model has no ambiguity.jq, builds the Reviews API payload injq(so review text never round-trips through shell quoting), and posts viagh api --input <file>. We deliberately do NOT use-f event=...flags becausegh api --inputroutes those into the query string, not the body — codex itself flagged that exact mistake in PR ci: add Codex reviewer and migrate Claude review to subagent architecture #61's review of the slash-command migration.```jsonfences before parsing (codex sometimes wraps JSON in a fence despite the explicit instruction); validate the event value and fail loudly if it's outsideAPPROVE/COMMENT/REQUEST_CHANGES.Once this lands, Codex reviews show up inline in the Files Changed tab next to the relevant lines, exactly like Claude's.
Test plan
.github/workflows/codex-code-review.yml(or summary-only if Codex thinks the change is clean)Stacking note
Base branch is
ci/codex-app-token(PR #76). Once #76 merges, this PR's base auto-rebases tomainand CI re-runs.🤖 Generated with Claude Code