Skip to content

Conversation

@MikaelMayer
Copy link
Member

@MikaelMayer MikaelMayer commented Jul 31, 2025

Currently, reviewing long PRs, like PRs that changes multiple expect files, is discouraging.
This PR automates the reviewing process by delegating some portion of reviewing, i.e. low-level reviewing, to an agent such as Amazon Q or Claude Code

What was changed?

I added a script ./review.sh that requires a PR number and automatically performs a review of every change to see if they conform to the PR description.
It clones the repository, lists all the files, and input maximum 500 lines of diff into a prompt that asks if the diff is consistent with what we could have expected from the PR description.

Sample output while the script is running live.

=========================================
Current review concerns:
  (No issues found so far)

Progress:
  21/346 files reviewed so far
  Currently reviewing: docs/DafnyRef/UserGuide.5.expect
=========================================

How has this been tested?

Exiting PR

I ran this script on #3324 to ensure the PR description is consistent with the changes. It already detected the following issues:

  • The PR description omitted to mention that the error messages could break integration with tools depending on the error messages
  • Two files were added that shouldn't have been committed
  • One document line was not up to date.
  • Capitalization of Dafny was inconsistent in the doc.

Self-test: Positive case

I also tested this script on this PR itself. Here was the output:

========================================
Current review concerns:
  (No issues found so far)

Progress:
  0/1 files reviewed so far
  Currently reviewing: Scripts/review.sh
=========================================
=========================================
📊 FINAL CODE REVIEW SUMMARY
=========================================

Files processed: 1
Files analyzed: 1
Files skipped: 0
Files passed: 1
Files failed: 0

✅ All analyzed files passed code review!
✅ PR #6318 looks good to merge
🎉 Code review complete!

Self-test: Negative case:

When deleting a line in the Makefile, we immediately get meaningful feedback

=========================================
Current review concerns:
  Makefile: Makefile: DIR variable removed but still referenced in test-dafny and tests-verbose targets, causing undefined variable errors

Progress:
  1/2 files reviewed so far
  Currently reviewing: Scripts/review.sh
=========================================

and then when the analysis is finished, we see

=========================================
📊 FINAL CODE REVIEW SUMMARY
=========================================

Files processed: 2
Files analyzed: 2
Files skipped: 0
Files passed: 1
Files failed: 1

❌ Issues found in 1 file(s):

  Makefile: Makefile: DIR variable removed but still referenced in test-dafny and tests-verbose targets, causing undefined variable errors

🧹 Cleaning up successful review files...
💡 Review output files for failed files are available in the current directory
❌ Consider addressing these concerns before merging
🎉 Code review complete!

⚠️  IMPORTANT: This AI review is a tool to assist human reviewers.
   You are ultimately responsible for the actual quality of the review.
   Just because the AI doesn't flag anything doesn't mean the PR is
   actually free of issues. Always apply your own judgment and expertise.

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

MikaelMayer and others added 13 commits July 31, 2025 11:27
Lines 9-18: Added Dafny Scripts folder detection

Lines 20-25: Auto-detect PR number from current branch pattern

Line 85: Added MAX_RETRIES=5 configuration

Lines 91-110: Modified repository handling for Dafny detection

Lines 111-114: Added PR title fetching and display

Lines 340-342: Added PR title to the AI prompt

Lines 344-375: Implemented retry logic with proper error handling

Lines 376-380: Enhanced error messages with retry count

Lines 540-570: Conditional cleanup - only delete temp folder on success, preserve on failure
Copy link
Member

@keyboardDrummer keyboardDrummer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you use bash instead of C# to define this script? I think it would be easier to maintain if it was C#, because the repository already requires reading C# and debugging C# is easier than bash.

@MikaelMayer
Copy link
Member Author

MikaelMayer commented Aug 18, 2025

Why did you use bash instead of C# to define this script? I think it would be easier to maintain if it was C#, because the repository already requires reading C# and debugging C# is easier than bash.

I asked Q to develop me a script to do that, so this script itself is an iteration of an interaction with Q. I haven't asked specifically for bash.
On the other hand, I think it is better to have something that does not depend on C# because this script is not Dafny specific and could be reused (and possibly evolving differently) for other projects. without having to install dotnet.

@atomb
Copy link
Member

atomb commented Aug 27, 2025

FWIW, I found it helpful for reviewing #3324 (and finding things to fix in it!), which would be a huge effort otherwise.

@MikaelMayer MikaelMayer enabled auto-merge (squash) August 29, 2025 12:54
@MikaelMayer MikaelMayer merged commit 2c01b78 into master Sep 2, 2025
45 of 47 checks passed
@MikaelMayer MikaelMayer deleted the chore-automated-pr-review branch September 2, 2025 13:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants