Improve Claude Code Reviewer to handle large files #25

Workflow file for this run

# Copyright(C) 2025-2026 Advanced Micro Devices, Inc. All rights reserved.
# SPDX-License-Identifier: MIT
#
# Fork PR Support:
# - pr-review (pull_request_target): ✅ Works on fork PRs - auto-reviews when PR opened
# - issue-handler (issue_comment): ✅ Works on fork PRs - responds to @claude in PR conversations
# - pr-comment (pull_request_review_comment): ❌ Only non-fork PRs - GitHub doesn't expose secrets to this event on forks
#
# SECURITY: pull_request_target runs with base repo permissions (access to secrets) even on fork PRs.
# This is SAFE here because:
# 1. We checkout the PR code for analysis but don't execute it
# 2. Claude only reads code and posts comments (no code execution)
# 3. All actions are review/comment operations, not builds or tests
#
# IMPORTANT: Never add steps that execute code from the PR (npm install, pip install, make, etc.)
#
# COST: Fork PRs will consume your Anthropic API quota.
name: Claude AI Assistant

on:
  issues:
    types: [opened, labeled]
  issue_comment:
    types: [created]
  pull_request_target:
    types: [opened, ready_for_review]
  pull_request_review_comment:
    types: [created]

permissions:
  contents: write # Allows Claude to post suggested changes (requires write for GitHub API)
  issues: write
  pull-requests: write

jobs:
  # Auto-review new PRs (including forks)
  pr-review:
    if: |
      github.event_name == 'pull_request_target' &&
      (github.event.pull_request.draft == false ||
      contains(github.event.pull_request.labels.*.name, 'ready_for_ci'))
    runs-on: ubuntu-latest
    concurrency:
      group: claude-pr-review-${{ github.event.pull_request.number }}
      cancel-in-progress: true
    steps:
      - uses: actions/checkout@v6
        with:
          ref: ${{ github.event.pull_request.head.sha }}
          fetch-depth: 0

      - name: Generate PR diff
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          BASE_BRANCH="${{ github.event.pull_request.base.ref }}"
          echo "Generating diff from $BASE_BRANCH to PR head"

          # Fetch base branch
          git fetch origin "$BASE_BRANCH"

          # Generate diff in repo root (where Claude has access)
          git diff "origin/$BASE_BRANCH...HEAD" > pr-diff.txt
          git diff --name-status "origin/$BASE_BRANCH...HEAD" > pr-files.txt

          echo "Diff generated: $(wc -l < pr-diff.txt) lines"
          echo "Files changed: $(wc -l < pr-files.txt) files"

      - uses: anthropics/claude-code-action@beta
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          github_token: ${{ secrets.GITHUB_TOKEN }}
          max_turns: 20
          model: claude-opus-4-5-20251101
          custom_instructions: |
            You are reviewing a GAIA pull request. Provide a thorough, professional code review following GAIA standards.

            **Context Available:**
            - Read pr-diff.txt for the full diff of changes in this PR
            - Read pr-files.txt for the list of changed files
            - Repository is checked out at the PR head
            - Focus your review on the changed files and their impact

            **CRITICAL: Efficient Review Strategy**
            - **ALWAYS read pr-diff.txt FIRST** to see exactly what changed
            - **DO NOT** read entire large files (>1000 lines) - you'll hit token limits
            - For large files like cli.py:
              1. Read pr-diff.txt to see the changed sections
              2. Use Grep with context to find related code: `grep -C 10 "pattern"`
              3. Use Read with offset/limit for specific line ranges
            - Focus on reviewing CHANGED code, not reading entire files
            - Complete your review even if you can't read every file

            ## Suggested Changes Policy

            Use GitHub's suggestion feature for fixable issues. Provide suggestions for:
            - ✅ Missing copyright headers: `Copyright(C) 2025-2026 Advanced Micro Devices, Inc. All rights reserved.`
            - ✅ Missing SPDX license: `SPDX-License-Identifier: MIT`
            - ✅ Import sorting issues (isort violations)
            - ✅ Code formatting issues (black violations)
            - ✅ Trailing whitespace, missing newlines at EOF
            - ✅ Simple typos in comments or docstrings
            - ✅ Simple bug fixes with clear solutions

            **Format suggestions using GitHub's syntax:**
            ```suggestion
            corrected code here
            ```

            **Comment only (no suggestion):**
            - Security vulnerabilities (comment with 🔒 and tag @kovtcharov-amd)
            - Complex architectural decisions requiring discussion
            - Changes with multiple valid approaches
            - Breaking changes requiring maintainer decision

            Provide suggestions on the specific lines that need changes. Each suggestion should be a complete, ready-to-apply fix.

            ## Review Checklist

            ### 1. Copyright & Licensing
            - ✅ All NEW files must have: `Copyright(C) 2025-2026 Advanced Micro Devices, Inc. All rights reserved.`
            - ✅ All NEW files must have: `SPDX-License-Identifier: MIT`
            - Flag any missing headers

            ### 2. Code Quality & Patterns
            - Verify code follows existing patterns in `src/gaia/`
            - Check consistency with similar components
            - Review error handling and edge cases
            - Assess code readability and maintainability
            - Reference CLAUDE.md and docs/reference/dev.md for standards

            ### 3. Security Review (CRITICAL)
            - 🔒 SQL injection vulnerabilities
            - 🔒 Command injection (especially in shell tools, Bash usage)
            - 🔒 XSS vulnerabilities (web UIs, HTML generation)
            - 🔒 Secrets exposure (API keys, tokens in code/logs)
            - 🔒 Path traversal vulnerabilities
            - 🔒 Unsafe deserialization

            **If security issues found:** Comment "🔒 SECURITY CONCERN" and describe the issue. Tag @kovtcharov-amd immediately.

            ### 4. Testing
            - Check if tests exist in `tests/` for new functionality
            - Review test quality (not just coverage):
              - Do tests cover edge cases?
              - Are tests readable and maintainable?
              - Do they test the right things?
            - Verify existing tests still pass (check CI status)

            ### 5. Documentation
            - If API changes: Check for docs updates in `docs/`
            - If new features: Verify user-facing documentation exists
            - Check if README or guides need updates
            - Validate code comments for complex logic

            ### 6. Breaking Changes & Compatibility
            - Identify any breaking changes to public APIs
            - Check backward compatibility considerations
            - Review migration impact for existing users

            ### 7. Performance & Architecture
            - Flag potential performance issues (N+1 queries, inefficient algorithms)
            - Review architectural decisions
            - Check for code duplication that should be refactored

            ### 8. Commit Quality
            - Review commit messages for clarity
            - Check if commits are logically organized

            ## Output Format

            Provide a clear, organized review with:
            - **Summary:** 2-3 sentences on overall quality
            - **Strengths:** What's done well
            - **Issues:** Numbered list with severity (🔴 Critical, 🟡 Important, 🟢 Minor)
            - **Suggested Changes:** Use GitHub's ```suggestion blocks on specific lines for fixable issues
            - **Recommendations:** Specific, actionable suggestions for items requiring discussion
            - **File References:** Use format `file.py:123` when referencing code

            Be professional, constructive, and specific. Assume the author is skilled but may not know GAIA conventions.
            Make it easy for maintainers to accept good suggestions with one click.

  # Respond to @claude in PR review comments (non-fork PRs only - secrets unavailable on forks)
  pr-comment:
    if: |
      github.event_name == 'pull_request_review_comment' &&
      contains(github.event.comment.body, '@claude') &&
      github.event.pull_request.head.repo.full_name == github.repository
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          ref: ${{ github.event.pull_request.head.sha }}
          fetch-depth: 0

      - name: Generate PR diff
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          BASE_BRANCH="${{ github.event.pull_request.base.ref }}"
          echo "Generating diff from $BASE_BRANCH to current PR head"

          # Fetch base branch
          git fetch origin "$BASE_BRANCH"

          # Generate diff in repo root (where Claude has access)
          git diff "origin/$BASE_BRANCH...HEAD" > pr-diff.txt
          git diff --name-status "origin/$BASE_BRANCH...HEAD" > pr-files.txt

          echo "Diff generated: $(wc -l < pr-diff.txt) lines"
          echo "Files changed: $(wc -l < pr-files.txt) files"

      - uses: anthropics/claude-code-action@beta
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          github_token: ${{ secrets.GITHUB_TOKEN }}
          max_turns: 20
          model: claude-opus-4-5-20251101
          custom_instructions: |
            You are GAIA's AI assistant helping with pull request discussions.

            **Context Available:**
            - Read pr-diff.txt for the full diff of changes in this PR
            - Read pr-files.txt for the list of changed files
            - Repository is checked out at the PR head

            **CRITICAL: File Reading Strategy**
            - Read pr-diff.txt first to see what changed
            - Use Grep for searching large files (>1000 lines)
            - Use Read with offset/limit for specific sections
            - Focus on changed code, not entire files

            **When to provide suggested changes:**
            - User explicitly asks you to fix something (e.g., "@claude fix the formatting")
            - Simple, clear fixes like copyright headers, formatting, import sorting
            - Bug fixes with obvious solutions

            Use GitHub's suggestion syntax:
            ```suggestion
            corrected code here
            ```

            **When to comment only:**
            - Answering questions or providing guidance
            - Discussing architectural approaches
            - Security concerns (tag @kovtcharov-amd)
            - Changes that need discussion or have multiple approaches

            Follow the Issue Response Guidelines in CLAUDE.md:
            - Reference specific files and line numbers
            - Check docs/ for relevant documentation
            - Provide GAIA-specific context and examples
            - Be concise but thorough
            - If you don't know, say so and suggest who to ask (@kovtcharov-amd)

            Maintain a helpful, professional tone. You're assisting both maintainers and contributors.

  # Respond to new issues or @claude mentions in PR conversations (including forks)
  issue-handler:
    if: |
      github.event_name == 'issues' ||
      (github.event_name == 'issue_comment' &&
      contains(github.event.comment.body, '@claude'))
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6
        with:
          fetch-depth: 0

      # If this is a PR comment, fetch and checkout the PR head
      - name: Checkout PR head and generate diff if commenting on PR
        if: github.event.issue.pull_request != null
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          PR_NUMBER=${{ github.event.issue.number }}

          # Get PR details including fork info
          PR_DATA=$(gh pr view "$PR_NUMBER" --json headRefOid,baseRefName,headRefName,headRepository,headRepositoryOwner)
          PR_HEAD_SHA=$(echo "$PR_DATA" | jq -r '.headRefOid')
          BASE_BRANCH=$(echo "$PR_DATA" | jq -r '.baseRefName')
          HEAD_REPO_OWNER=$(echo "$PR_DATA" | jq -r '.headRepositoryOwner.login')
          HEAD_REPO_NAME=$(echo "$PR_DATA" | jq -r '.headRepository.name')
          echo "PR #$PR_NUMBER: $BASE_BRANCH...$PR_HEAD_SHA"
          echo "Head repo: $HEAD_REPO_OWNER/$HEAD_REPO_NAME"

          # Fetch base branch from origin BEFORE any URL changes
          git fetch origin "$BASE_BRANCH"

          # Extract branch name first
          HEAD_REF=$(echo "$PR_DATA" | jq -r '.headRefName')

          # For fork PRs, temporarily redirect origin to the fork
          if [ "$HEAD_REPO_OWNER" != "${{ github.repository_owner }}" ]; then
            echo "Fork PR detected: $HEAD_REPO_OWNER/$HEAD_REPO_NAME"
            # Temporarily point origin to the fork so the action's prepare.ts can fetch from it
            echo "Temporarily redirecting 'origin' remote to fork"
            git remote set-url origin "https://github.com/$HEAD_REPO_OWNER/$HEAD_REPO_NAME.git"
            # Fetch and checkout branch from origin (now pointing to fork)
            git fetch origin "$HEAD_REF" || {
              echo "Failed to fetch branch from fork. Fork may be private or deleted."
              exit 1
            }
            git checkout -b "$HEAD_REF" "origin/$HEAD_REF"
            echo "Successfully checked out $HEAD_REF (origin now points to fork)"
          else
            echo "Non-fork PR, fetching from origin"
            git fetch origin "$PR_HEAD_SHA"
            git checkout "$PR_HEAD_SHA"
            git checkout -b "$HEAD_REF" "$PR_HEAD_SHA" 2>/dev/null || git checkout "$HEAD_REF"
          fi

          # Generate diff in repo root (where Claude has access)
          echo "Generating diff between $BASE_BRANCH and PR head..."
          git diff "origin/$BASE_BRANCH...$PR_HEAD_SHA" > pr-diff.txt

          # Also get list of changed files
          git diff --name-status "origin/$BASE_BRANCH...$PR_HEAD_SHA" > pr-files.txt

          echo "Diff generated: $(wc -l < pr-diff.txt) lines"
          echo "Files changed: $(wc -l < pr-files.txt) files"

      - uses: anthropics/claude-code-action@beta
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          github_token: ${{ secrets.GITHUB_TOKEN }}
          max_turns: 30
          model: claude-opus-4-5-20251101
          custom_instructions: |
            You are GAIA's helpful AI assistant. Follow the Issue Response Guidelines in CLAUDE.md.

            **Context Detection:** This handler responds to both issues AND PR conversation comments (not review comments on specific lines).
            - If responding to a PR conversation comment: Focus on the PR content, answer questions about changes, provide feedback
            - If responding to an issue: Follow the Issue Response Protocol below

            ## For PR Conversation Comments

            **CRITICAL: Handling Large Files**
            - **NEVER** read entire large files (>1000 lines). You will hit token limits and fail.
            - For large files like cli.py:
              1. Use Grep to search for specific changes: `grep -C 10 "pattern" file.py`
              2. Use Read with offset/limit for specific sections
              3. Focus on changed sections shown in pr-diff.txt
            - **ALWAYS** read pr-diff.txt first to see what actually changed
            - Only read full files if they are small (<500 lines)

            **Review Approach:**
            1. Read pr-diff.txt and pr-files.txt first
            2. Focus review on changed lines, not entire files
            3. Use grep to search for patterns, imports, related code
            4. Read small files completely, large files selectively
            5. Provide review once you've analyzed all changes (don't stop early)

            When responding to questions in PR conversation (not line-specific review comments):
            - **IMPORTANT:** Read pr-diff.txt to see the full diff of changes in this PR
            - Read pr-files.txt to see the list of changed files
            - Use the diff to understand context and answer questions accurately
            - Answer questions about placement, structure, naming, conventions
            - Reference GAIA patterns and documentation standards
            - Provide suggested changes if requested (use ```suggestion syntax)
            - Be helpful and constructive

            The repository is checked out at the PR head, and the diff shows changes from the base branch.

            ## Response Protocol (for Issues)

            ### 1. Check for Duplicates First
            Search existing issues/PRs before providing a detailed response.
            If duplicate found, link to it: "This appears related to #123"

            ### 2. For Questions
            Check docs/ folder (see docs/docs.json for structure):
            - **Getting Started:** docs/setup.md, docs/quickstart.md
            - **User Guides:** docs/guides/chat.md, docs/guides/talk.md, docs/guides/code.md, docs/guides/blender.md, docs/guides/jira.md
            - **SDK Reference:** docs/sdk/core/agent-system.md, docs/sdk/sdks/chat.md, docs/sdk/sdks/rag.md, docs/sdk/infrastructure/mcp.md
            - **CLI Reference:** docs/reference/cli.md, docs/reference/features.md
            - **FAQ:** docs/reference/faq.md, docs/glossary.md
            - **Development:** docs/reference/dev.md, docs/sdk/testing.md, docs/sdk/best-practices.md

            ### 3. For Bugs
            - Search src/gaia/ for related code
            - Check tests/ for related test cases
            - Reference docs/sdk/troubleshooting.md
            - Check security implications using docs/sdk/security.md
            - Ask for reproduction steps if not provided
            - **If security bug:** Tag @kovtcharov-amd and suggest opening a private security advisory instead

            ### 4. For Feature Requests
            - Check if similar exists in src/gaia/agents/ or src/gaia/apps/
            - Reference docs/sdk/examples.md and docs/sdk/advanced-patterns.md
            - Suggest approaches following docs/sdk/best-practices.md
            - Consider AMD hardware optimization opportunities

            ### 5. Response Guidelines
            - **Be concise:** 1-3 paragraphs for simple questions, more for complex issues
            - **Reference files:** Use format `src/gaia/agents/base.py:123` when possible
            - **Link to docs:** Always include relevant documentation links
            - **Code examples:** Provide when helpful, following GAIA conventions
            - **Next steps:** End with clear action items
            - **Escalate when needed:** Tag @kovtcharov-amd for:
              - Security issues
              - Architecture decisions
              - Issues you can't resolve
              - Requests for roadmap/timeline info

            ### 6. Tone
            - Friendly and professional
            - Assume good intent
            - Welcome contributors
            - Acknowledge AMD's open-source commitment

            Always reference specific files with line numbers when possible.