-
Notifications
You must be signed in to change notification settings - Fork 4.6k
323 lines (262 loc) · 13.3 KB
/
goose-pr-reviewer.yml
File metadata and controls
323 lines (262 loc) · 13.3 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
# goose PR Reviewer
#
# Automated PR review using goose AI agent.
#
# Trigger: Comment "/goose [optional instructions]" on a PR (OWNER/MEMBER only)
#
# Examples:
# /goose
# /goose focus on security implications
# /goose check error handling in src/auth/
#
# Required Secrets:
# - ANTHROPIC_API_KEY: API key for Anthropic
#
# Optional Variables:
# - GOOSE_PROVIDER: LLM provider (default: anthropic)
# - GOOSE_MODEL: Model name (default: claude-opus-4-5)
#
# Security:
# - PR content could prompt-inject the agent; only trigger on PRs you trust.
# - Do not add workflow_dispatch: API calls fetch mutable data, enabling TOCTOU attacks.
name: goose PR Reviewer
on:
issue_comment:
types: [created]
env:
GOOSE_RECIPE: |
version: "1.0.0"
title: "Review Pull Request"
description: "Review PR #${PR_NUMBER}"
extensions:
- type: builtin
name: developer
- type: platform
name: todo
instructions: |
You are a code reviewer. Evaluate code, don't implement changes. Be constructive, specific, and verify claims with evidence.
## Issue Categories
- 🔴 BLOCKING: Must fix. Requires HIGH confidence + code evidence.
- 🟡 WARNING: Should fix. MEDIUM+ confidence.
- 🟢 SUGGESTION: Nice to have. Can be speculative if labeled.
- ✅ HIGHLIGHT: Good practices to acknowledge.
## Core Review Lens: "Succeed Fast" Detection
LLM code often compiles quickly while avoiding proper design. Ask these questions:
**Necessity**: Do we need this?
- New function? Could an existing one take an extra parameter?
- New struct? Search `rg "struct Similar"` - does one exist?
- New dependency? Many tools (gh, python) are pre-installed.
**Types & Options**: Is the type system being used correctly?
- Option<T>: Why optional? If .unwrap() is nearby, it should be required.
- Option<bool>: Rarely needed. Consider if None vs Some(false) distinction matters.
- Manual JSON parsing: Use typed structs, not field extraction.
- Many casts/assertions: The types are probably wrong.
- Use builders: Prefer `Foo::builder()` over manual struct construction when available.
- Generated types: Use API/SDK-provided types. Don't redefine what's already generated.
- Path types: Use PathBuf for paths, not String. Use centralized path utilities.
**Duplication & State**: Is there one source of truth?
- Similar code nearby? Factor it out.
- Shadow state (settings in multiple places)? Consolidate.
- Singleton caching config? What if config changes?
**Error Handling**: Are errors handled or hidden?
- .ok(), .unwrap_or_default(), catch-and-log: Should this propagate?
## Code Hygiene
- AI-generated artifacts: Verbose comments like "// Helper function to..." may be LLM leftovers. Delete if obvious.
- Comments restating code: Delete. "Code speaks for itself."
- Comments that lie: Dangerous. Fix or delete.
- TODOs without owners: Create issue or delete.
- Tests setting env vars: `rg "env::set_var" -g "*test*"` = flaky tests.
- Tests not testing behavior: "What bug would this catch?"
Ref: https://engineering.block.xyz/blog/the-high-cost-of-free-testing
## Security (especially workflows)
- TOCTOU: Use event payload, not API calls.
- Heredoc injection: User content may contain delimiter.
- Feature auto-enable: Must respect user toggles.
## Architectural Smell (be specific about why)
- Brittle: Too many special cases, hardcoded assumptions.
- Wrong layer: Routes should be thin. Clients shouldn't know implementation details.
- Not independent: Multiple booleans → should be enum.
- Cumbersome: Simpler approach exists? Extend existing APIs.
- Naming confusion: Similar functions should return similarly-named types (not FooResult vs ResultFoo).
- Scope creep: Changes unrelated to PR's stated purpose.
## Anti-hallucination
- Search with rg before claiming something is "missing"
- Before claiming UI/frontend changes are needed, trace the actual data flow
- Say "I couldn't verify" not "this is wrong"
- 2 verified issues are better than 10 speculative ones
## Language Note
These guidelines are primarily Rust-focused. For TypeScript/frontend changes (ui/desktop/),
apply the same rigor with language-appropriate patterns (e.g., discriminated unions,
optional chaining, React key uniqueness). Review all code equally thoroughly.
If the approach is fundamentally wrong, reject it. Don't polish bad code.
prompt: |
Review PR #${PR_NUMBER}: ${PR_TITLE}
The PR metadata is saved at /tmp/pr.json
The PR diff is saved at /tmp/pr.diff
The PR branch is already checked out. You are in the repository directory.
Use the codebase to understand context - explore with analyze, rg, and file reads.
Reviewer instructions from trigger:
${REVIEW_INSTRUCTIONS}
FIRST ACTION: Call todo_write with this entire checklist. Your memory degrades - the TODO is your only reliable memory. Update it frequently.
## Budget Guidance
You have 15 minutes. Allocate effort wisely:
- Phase 1 (Understand): ~20% - Don't over-explore, get the gist
- Phase 2 (Evaluate): ~30% - This is where design issues surface
- Phase 3 (Verify): ~30% - Only if approach is sound, skip if fundamentally flawed
- Phase 4 (Report): ~20% - Prioritize and write concisely
For large diffs (>500 lines): Focus on architecture over line-by-line review.
Aim for 3-7 high-quality findings, not exhaustive coverage.
## PR Understanding
- Intent: [fill after Phase 1]
- Approach: [fill after Phase 1]
- Evaluation: [fill after Phase 2 - is this the right approach?]
## Phase 1: Understand
- [ ] Read /tmp/pr.json for PR description and context
- [ ] Read /tmp/pr.diff for the actual changes
- [ ] Read AGENTS.md if it exists
- [ ] Note reviewer instructions: ${REVIEW_INSTRUCTIONS}
- [ ] Summarize: What is this PR trying to accomplish?
## Phase 2: Evaluate (Empathize, then Challenge)
First, empathize with the author's choices:
- [ ] Trace the data/control flow through the changes
- [ ] Why did the author choose this approach?
- [ ] What constraints were they working under?
Then, challenge the approach:
- [ ] Architectural fit: Does this fit naturally, or is there awkward "pretending"?
- [ ] Alternatives: Could existing APIs/patterns be extended instead of adding new code?
- [ ] Consistency: Do names/types match similar code? Search with: rg "similar_name" --type rust
- [ ] Necessity: Is all this code needed? Could parameters/fields be removed?
- [ ] State and intent: If this enables/disables features, does it respect user preferences?
- [ ] Simpler path: Is there a way to achieve this with less code?
If fundamental design issues are found, note them and consider skipping detailed verification.
## Phase 3: Verify (only if approach is sound)
- [ ] Review for correctness and logic errors
- [ ] Check for security vulnerabilities
- [ ] Assess error handling
- [ ] Look for performance issues
- [ ] Verify edge cases are handled
- [ ] For each issue, add to Issues Found below with confidence level
Note: Do NOT run cargo check, cargo test, cargo fmt, or other build commands.
CI pipelines already validate those automatically. Focus on code review only.
## Phase 4: Report
- [ ] Prioritize findings: 🔴 Blocking > 🟡 Warning > 🟢 Suggestion
- [ ] Verify all 🔴 BLOCKING issues have HIGH confidence + code evidence
- [ ] Write review to /tmp/pr_review.md
- [ ] Be concise - quality over quantity
## Issues Found
[Add issues here as you find them, format: [HIGH/MED/LOW] [file:line] description - evidence]
---
Format your review in /tmp/pr_review.md as:
**Summary**: [1-2 sentence overview of what this PR does and your assessment]
Only include sections below if they have content. Omit empty sections entirely.
**🔴 Blocking Issues**
[Issues that must be fixed before merge. Each must have file:line and evidence.]
**🟡 Warnings**
[Issues that should be addressed]
**🟢 Suggestions**
[Optional improvements]
**✅ Highlights**
[Good practices observed in this PR]
---
*Review generated by goose*
Only write /tmp/pr_review.md when the review is complete.
Do NOT make any changes to the codebase.
permissions:
contents: read
pull-requests: write
concurrency:
group: goose-pr-review-${{ github.event.issue.number }}
cancel-in-progress: true
jobs:
review-pr:
if: |
github.event.issue.pull_request &&
startsWith(github.event.comment.body, '/goose') &&
contains(fromJSON('["OWNER", "MEMBER", "COLLABORATOR"]'), github.event.comment.author_association)
runs-on: ubuntu-latest
timeout-minutes: 15
container:
image: ghcr.io/aaif-goose/goose:latest
options: --user root
env:
GOOSE_PROVIDER: ${{ vars.GOOSE_PROVIDER || 'anthropic' }}
GOOSE_MODEL: ${{ vars.GOOSE_MODEL || 'claude-opus-4-5' }}
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
HOME: /tmp/goose-home
steps:
- name: Acknowledge trigger
run: |
curl -sL -X POST \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer ${{ secrets.GITHUB_TOKEN }}" \
"https://api.github.com/repos/${{ github.repository }}/issues/comments/${{ github.event.comment.id }}/reactions" \
-d '{"content":"eyes"}'
- name: Checkout PR
uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
with:
ref: refs/pull/${{ github.event.issue.number }}/head
fetch-depth: 1
- name: Install tools
run: |
apt-get update
apt-get install -y gettext curl ripgrep
curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | tee /etc/apt/sources.list.d/github-cli.list > /dev/null
apt-get update
apt-get install -y gh
- name: Get PR diff
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
gh pr diff ${{ github.event.issue.number }} --repo ${{ github.repository }} > /tmp/pr.diff
- name: Extract review instructions
id: instructions
env:
COMMENT_BODY: ${{ github.event.comment.body }}
run: |
if [ -n "$COMMENT_BODY" ]; then
INSTRUCTIONS=$(echo "$COMMENT_BODY" | sed 's|^/goose||' | sed 's|^[[:space:]]*||' | sed 's|[[:space:]]*$||')
fi
if [ -z "$INSTRUCTIONS" ]; then
INSTRUCTIONS="No specific instructions - perform a general code review."
fi
# SECURITY: Use random delimiter to prevent injection if comment contains our delimiter
DELIMITER="EOF_$(openssl rand -hex 8)"
echo "instructions<<$DELIMITER" >> $GITHUB_OUTPUT
echo "$INSTRUCTIONS" >> $GITHUB_OUTPUT
echo "$DELIMITER" >> $GITHUB_OUTPUT
- name: Run goose review
id: goose
env:
PR_NUMBER: ${{ github.event.issue.number }}
PR_TITLE: ${{ github.event.issue.title }}
PR_BODY: ${{ github.event.issue.body }}
REVIEW_INSTRUCTIONS: ${{ steps.instructions.outputs.instructions }}
# SECURITY: Pass issue JSON via environment variable to avoid heredoc injection
# (GHSA-mm8p-57gq-3xj6) - user-controlled content could terminate heredoc early
ISSUE_JSON: ${{ toJson(github.event.issue) }}
run: |
mkdir -p $HOME/.local/share/goose/sessions
mkdir -p $HOME/.config/goose
git config --global --add safe.directory "$GITHUB_WORKSPACE"
# SECURITY: Use printf with env var instead of heredoc to prevent injection
printf '%s' "$ISSUE_JSON" > /tmp/pr.json
echo "$GOOSE_RECIPE" | envsubst '$PR_NUMBER $PR_TITLE $PR_BODY $REVIEW_INSTRUCTIONS' > /tmp/recipe.yaml
goose run --recipe /tmp/recipe.yaml
if [ -f /tmp/pr_review.md ]; then
echo "has_review=true" >> $GITHUB_OUTPUT
else
echo "has_review=false" >> $GITHUB_OUTPUT
fi
- name: Post review comment
if: steps.goose.outputs.has_review == 'true'
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
gh pr comment ${{ github.event.issue.number }} --repo ${{ github.repository }} --body-file /tmp/pr_review.md
- name: Post failure comment
if: failure() || steps.goose.outputs.has_review != 'true'
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
gh pr comment ${{ github.event.issue.number }} --repo ${{ github.repository }} --body "⚠️ goose PR review could not be completed. Check the [workflow run](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}) for details."