Chore: Review script to review long pull requests #6318

MikaelMayer · 2025-07-31T16:32:06Z

Currently, reviewing long PRs, like PRs that changes multiple expect files, is discouraging.
This PR automates the reviewing process by delegating some portion of reviewing, i.e. low-level reviewing, to an agent such as Amazon Q or Claude Code

What was changed?

I added a script ./review.sh that requires a PR number and automatically performs a review of every change to see if they conform to the PR description.
It clones the repository, lists all the files, and input maximum 500 lines of diff into a prompt that asks if the diff is consistent with what we could have expected from the PR description.

Sample output while the script is running live.

=========================================
Current review concerns:
  (No issues found so far)

Progress:
  21/346 files reviewed so far
  Currently reviewing: docs/DafnyRef/UserGuide.5.expect
=========================================

How has this been tested?

Exiting PR

I ran this script on #3324 to ensure the PR description is consistent with the changes. It already detected the following issues:

The PR description omitted to mention that the error messages could break integration with tools depending on the error messages
Two files were added that shouldn't have been committed
One document line was not up to date.
Capitalization of Dafny was inconsistent in the doc.

Self-test: Positive case

I also tested this script on this PR itself. Here was the output:

========================================
Current review concerns:
  (No issues found so far)

Progress:
  0/1 files reviewed so far
  Currently reviewing: Scripts/review.sh
=========================================

=========================================
📊 FINAL CODE REVIEW SUMMARY
=========================================

Files processed: 1
Files analyzed: 1
Files skipped: 0
Files passed: 1
Files failed: 0

✅ All analyzed files passed code review!
✅ PR #6318 looks good to merge
🎉 Code review complete!

Self-test: Negative case:

When deleting a line in the Makefile, we immediately get meaningful feedback

=========================================
Current review concerns:
  Makefile: Makefile: DIR variable removed but still referenced in test-dafny and tests-verbose targets, causing undefined variable errors

Progress:
  1/2 files reviewed so far
  Currently reviewing: Scripts/review.sh
=========================================

and then when the analysis is finished, we see

=========================================
📊 FINAL CODE REVIEW SUMMARY
=========================================

Files processed: 2
Files analyzed: 2
Files skipped: 0
Files passed: 1
Files failed: 1

❌ Issues found in 1 file(s):

  Makefile: Makefile: DIR variable removed but still referenced in test-dafny and tests-verbose targets, causing undefined variable errors

🧹 Cleaning up successful review files...
💡 Review output files for failed files are available in the current directory
❌ Consider addressing these concerns before merging
🎉 Code review complete!

⚠️  IMPORTANT: This AI review is a tool to assist human reviewers.
   You are ultimately responsible for the actual quality of the review.
   Just because the AI doesn't flag anything doesn't mean the PR is
   actually free of issues. Always apply your own judgment and expertise.

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

Lines 9-18: Added Dafny Scripts folder detection Lines 20-25: Auto-detect PR number from current branch pattern Line 85: Added MAX_RETRIES=5 configuration Lines 91-110: Modified repository handling for Dafny detection Lines 111-114: Added PR title fetching and display Lines 340-342: Added PR title to the AI prompt Lines 344-375: Implemented retry logic with proper error handling Lines 376-380: Enhanced error messages with retry count Lines 540-570: Conditional cleanup - only delete temp folder on success, preserve on failure

keyboardDrummer

Why did you use bash instead of C# to define this script? I think it would be easier to maintain if it was C#, because the repository already requires reading C# and debugging C# is easier than bash.

MikaelMayer · 2025-08-18T23:22:55Z

Why did you use bash instead of C# to define this script? I think it would be easier to maintain if it was C#, because the repository already requires reading C# and debugging C# is easier than bash.

I asked Q to develop me a script to do that, so this script itself is an iteration of an interaction with Q. I haven't asked specifically for bash.
On the other hand, I think it is better to have something that does not depend on C# because this script is not Dafny specific and could be reused (and possibly evolving differently) for other projects. without having to install dotnet.

atomb · 2025-08-27T22:23:15Z

FWIW, I found it helpful for reviewing #3324 (and finding things to fix in it!), which would be a huge effort otherwise.

MikaelMayer and others added 13 commits July 31, 2025 11:27

Review script to review long pull requests

e6646e4

Update Makefile

2366cca

Update Makefile to be reverted

e9fc096

Update review.sh

5d17892

Restore original Makefile from master

f6a3e2a

Update review.sh

3b7312a

Update review.sh

6cec73f

Update review.sh

902db3c

Update review.sh

b3962e9

Update review.sh

6c8fb78

Merge branch 'master' into chore-automated-pr-review

535bbf5

Merge branch 'master' into chore-automated-pr-review

4456c99

keyboardDrummer reviewed Aug 18, 2025

View reviewed changes

atomb mentioned this pull request Aug 21, 2025

Fix: Wording of assertion failure closer to semantics #3324

Merged

Merge branch 'master' into chore-automated-pr-review

2e84c0c

RustanLeino approved these changes Aug 28, 2025

View reviewed changes

Merge branch 'master' into chore-automated-pr-review

07af12e

MikaelMayer enabled auto-merge (squash) August 29, 2025 12:54

MikaelMayer merged commit 2c01b78 into master Sep 2, 2025
45 of 47 checks passed

MikaelMayer deleted the chore-automated-pr-review branch September 2, 2025 13:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Chore: Review script to review long pull requests #6318

Chore: Review script to review long pull requests #6318

Uh oh!

MikaelMayer commented Jul 31, 2025 •

edited

Loading

Uh oh!

keyboardDrummer left a comment

Uh oh!

MikaelMayer commented Aug 18, 2025 •

edited

Loading

Uh oh!

atomb commented Aug 27, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Chore: Review script to review long pull requests #6318

Chore: Review script to review long pull requests #6318

Uh oh!

Conversation

MikaelMayer commented Jul 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What was changed?

How has this been tested?

Exiting PR

Self-test: Positive case

Self-test: Negative case:

Uh oh!

keyboardDrummer left a comment

Choose a reason for hiding this comment

Uh oh!

MikaelMayer commented Aug 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

atomb commented Aug 27, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

MikaelMayer commented Jul 31, 2025 •

edited

Loading

MikaelMayer commented Aug 18, 2025 •

edited

Loading