Skip to content

Commit 1db8161

Browse files
PureWeenkubaflo
andauthored
Enhance PR agent: multi-model workflow, blocker handling, shared rules extraction (dotnet#33813)
<!-- Please let the below note in for people that find this PR --> > [!NOTE] > Are you waiting for the changes in this PR to be merged? > It would be very helpful if you could [test the resulting artifacts](https://github.com/dotnet/maui/wiki/Testing-PR-Builds) from this PR and let us know in a comment if this change resolves your issue. Thank you! ## Summary This PR significantly enhances the PR agent workflow with improvements across **two development phases**: ### Phase 1: Original Enhancements (Commits 1-8) 1. **Multi-model try-fix exploration** - Uses 5 different AI models to explore alternative solutions 2. **Environment blocker handling** - Strict rules to stop and ask user when environment issues occur 3. **Review automation script** - PowerShell script that invokes Copilot CLI directly for PR reviews 4. **Branch safety rules** - Prevents agent from switching branches during reviews 5. **Template formatting rules** - Exact template adherence for downstream script compatibility ### Phase 2: Consolidation & Simplification (Commit 9) After multi-model review (5 models: Claude Sonnet 4.5, Claude Opus 4.5, GPT-5.2, GPT-5.2-Codex, Gemini 3 Pro), the following improvements were made: 6. **Shared rules extraction** - Created `SHARED-RULES.md` to eliminate duplication across files 7. **Simplified git policy** - Agent never runs git commands; always assumes correct branch 8. **State file handling** - Changed "commit" to "save" (state files are gitignored) 9. **Reduced verbosity** - Compressed cross-pollination section, converted PLAN-TEMPLATE to checklist --- ## Commits ### 1. `80d7e412c2` - Update pr agent with multi-model try-fix workflow **Why:** The original PR agent only used a single model for exploring fixes. Different AI models have different strengths and may find solutions others miss. **Changes:** - Added Phase 4 multi-model workflow using 5 models: - `claude-sonnet-4.5`, `claude-opus-4.5`, `gpt-5.2`, `gpt-5.2-codex`, `gemini-3-pro-preview` - Cross-pollination loop: Share results between models to spark new ideas - Continue until all models confirm "no more approaches to explore" --- ### 2. `69cc6af403` - Address Copilot review suggestions **Why:** Initial PR review feedback suggested improvements. **Changes:** - Minor formatting and clarity improvements to pr.md --- ### 3. `fe55c3fd21` - Add rules for template formatting and skill script usage **Why:** Downstream scripts depend on exact regex patterns in state files. Agents were "improving" templates by adding attributes like `open` which broke parsing. **Changes:** - Added "Follow Templates EXACTLY" rule - no adding attributes, no improving formats - Added "Use Skills' Scripts" rule - run provided PowerShell scripts, don't bypass with manual commands --- ### 4. `debbee608e` - Add 'Stop on Environment Blockers' rule to PR agent **Why:** Agent was continuing through phases when environment issues (missing Appium, WinAppDriver errors) prevented completion, leading to incomplete reviews. **Changes:** - Added explicit blocker handling section to pr.md - Common blockers: Appium drivers, WinAppDriver, Xcode, emulators, port conflicts - Must STOP, report the blocker, and ask user how to proceed - Never mark phase as blocked and continue to next phase --- ### 5. `ad29f6a796` - Add PR review plan template and Review-PR.ps1 script **Why:** Need a reusable template for consistent PR reviews and a script to automate invocation. **Changes:** - Created `.github/agents/pr/PLAN-TEMPLATE.md` - Reusable 5-phase review plan - Created `.github/scripts/Review-PR.ps1` - Script to prepare environment and invoke Copilot CLI --- ### 6. `886ea2aa8e` - Improve blocker handling and fix Review-PR.ps1 for Copilot CLI **Why:** During PR dotnet#27300 review, agent spent 10+ tool calls troubleshooting WinAppDriver instead of stopping after first failure. **Changes:** - Added strict retry limits table: | Blocker Type | Max Retries | Action | |--------------|-------------|--------| | Server errors (500, timeout) | 0 | STOP immediately | | Missing tools | 1 install | STOP and ask | | Port conflicts | 1 kill | STOP and ask | | WinAppDriver errors | 0 | STOP immediately | - Added "What I tried" section to blocker report template - New prohibitions: Never spend more than 2-3 tool calls on same blocker --- ### 7. `d67da75e85` - Update Review-PR.ps1 to invoke Copilot CLI directly **Why:** Initially thought Copilot CLI was interactive-only. Discovered it supports `-i <prompt>` and `-p <prompt>` for programmatic invocation. **Changes:** - Script now invokes `copilot --agent pr -i "<prompt>"` directly - Validates both `gh` CLI and `copilot` CLI are installed - New `-NoInteractive` switch for `-p` mode (exits after completion) - Dry run mode shows exactly what would be invoked --- ### 8. `ed74c574a5` - Add 'Do NOT Switch Branches' rule to pr agent **Why:** During PR review testing, the pr agent ran `git checkout`, `git stash`, and other branch-switching commands, causing loss of local changes and confusion about which code was being reviewed. **Changes:** - Added explicit "Do NOT Switch Branches" rule to both pr.md and PLAN-TEMPLATE.md - Forbidden commands: `git checkout`, `git switch`, `gh pr checkout`, `git stash` - Agent must work on current branch as-is, using `git diff` or `gh pr diff` to see PR changes - Fixed variable expansion in Review-PR.ps1 prompt (double backticks for here-strings) --- ### 9. `632bfb7155` - Extract shared rules, simplify git policy, reduce duplication **Why:** Multi-model review (5 AI models) identified significant duplication (~200 lines) across files, conflicting "commit" terminology, and overly verbose sections. The git checkout prohibition also conflicted with workflow steps that mentioned git checkout. **Changes:** - **Created `SHARED-RULES.md`** (167 lines) - Single source of truth for: - Phase Completion Protocol - Follow Templates EXACTLY - No Direct Git Commands (absolute - agent never runs git) - Use Skills' Scripts - Stop on Environment Blockers (with retry limits) - Multi-Model Configuration (5 models list) - Platform Selection guidance - **Simplified git policy** - Agent is ALWAYS on correct branch, never runs git commands: - Removed git fetch/checkout from Phase 1 Pre-Flight - Phase 5: User handles commit/push/PR creation - Changed all "State file committed" → "State file saved" (gitignored files) - **Compressed content**: - Cross-pollination ASCII box: 51 → 20 lines - PLAN-TEMPLATE.md: Full docs → Pure checklist (226 → 112 lines) - pr.md: 662 → 535 lines (-19%) - post-gate.md: 403 → 302 lines (-25%) - Total reduction: 1291 → 1116 lines (-14%) - **Eliminated duplication**: - Blocker handling was in 3 files → now in SHARED-RULES.md only - Phase Completion Protocol was in 2 files → now in SHARED-RULES.md only - Model list was in 3 files → now in SHARED-RULES.md only --- ## Files Changed | File | Purpose | |------|---------| | `.github/agents/pr.md` | Main PR agent instructions (Phases 1-3) | | `.github/agents/pr/post-gate.md` | Phase 4-5 instructions (multi-model try-fix) | | `.github/agents/pr/PLAN-TEMPLATE.md` | **NEW** - Reusable 5-phase review checklist | | `.github/agents/pr/SHARED-RULES.md` | **NEW** - Extracted shared rules (single source of truth) | | `.github/scripts/Review-PR.ps1` | **NEW** - Script to invoke Copilot CLI for PR review | --- ## Usage ```powershell # Interactive mode (default) - stays open for follow-up pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687 # Non-interactive mode - exits when done pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687 -NoInteractive # Specific platform pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687 -Platform ios # Skip merge if already on branch pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687 -SkipMerge # Dry run to preview pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687 -DryRun ``` --- ## Multi-Model Validation The final changes (commit 9) were validated by 5 AI models: | Model | Verdict | Key Feedback | |-------|---------|--------------| | Claude Sonnet 4.5 | ✅ READY TO MERGE | "All git command instructions successfully removed" | | Claude Opus 4.5 | ✅ READY TO MERGE | "Excellent refactoring, no conflicting guidance" | | GPT-5.2 | ✅ READY TO MERGE | "Progressive disclosure maintained" | | GPT-5.2-Codex | ✅ READY TO MERGE | "No instructions to run git commands remain" | | Gemini 3 Pro | ✅ READY TO MERGE | "Agent is instructed to STOP and ask user for commits" | --- ## Testing Tested by reviewing PR dotnet#27300 (ScrollView ScrollToAsync fix): - Pre-Flight phase completed successfully - Tests phase verified test files exist - Gate phase encountered WinAppDriver blocker → agent correctly stopped and asked - Blocker handling rules validated through real-world usage - **Branch safety verified**: Agent stayed on branch instead of switching --------- Co-authored-by: Jakub Florkowski <kubaflo123@gmail.com>
1 parent 52f9fc2 commit 1db8161

7 files changed

Lines changed: 762 additions & 277 deletions

File tree

.github/agents/pr.md

Lines changed: 49 additions & 33 deletions
Original file line numberDiff line numberDiff line change
@@ -41,31 +41,23 @@ After Gate passes, read `.github/agents/pr/post-gate.md` for **Phases 4-5**.
4141

4242
---
4343

44-
## Phase Completion Protocol (CRITICAL)
44+
## 🚨 Critical Rules
4545

46-
**Before changing ANY phase status to ✅ COMPLETE:**
46+
**Read `.github/agents/pr/SHARED-RULES.md` for complete details on:**
47+
- Phase Completion Protocol (fill ALL pending fields before marking complete)
48+
- Follow Templates EXACTLY (no `open` attributes, no "improvements")
49+
- No Direct Git Commands (use `gh pr diff/view`, let scripts handle files)
50+
- Use Skills' Scripts (don't bypass with manual commands)
51+
- Stop on Environment Blockers (strict retry limits, report and ask user)
52+
- Multi-Model Configuration (5 models for Phase 4)
53+
- Platform Selection (must be affected AND available on host)
4754

48-
1. **Read the state file section** for the phase you're completing
49-
2. **Find ALL ⏳ PENDING and [PENDING] fields** in that section
50-
3. **Fill in every field** with actual content
51-
4. **Verify no pending markers remain** in your section
52-
5. **Commit the state file** with complete content
53-
6. **Then change status** to ✅ COMPLETE
55+
**Key points:**
56+
- ❌ Never run `git checkout`, `git switch`, `git stash`, `git reset` - agent is always on correct branch
57+
- ❌ Never continue after environment blocker - STOP and ask user
58+
- ❌ Never mark phase ✅ with [PENDING] fields remaining
5459

55-
**Rule:** Status ✅ means "documentation complete", not "I finished thinking about it"
56-
57-
---
58-
59-
### 🚨 CRITICAL: Phase 4 Always Uses `try-fix` Skill
60-
61-
**Even when a PR already has a fix**, Phase 4 requires running the `try-fix` skill to:
62-
1. **Independently explore alternative solutions** - Generate fix ideas WITHOUT looking at the PR's solution
63-
2. **Test alternatives empirically** - Actually implement and run tests, don't just theorize
64-
3. **Compare with PR's fix** - PR's fix is already validated by Gate; try-fix explores if there's something better
65-
66-
The PR's fix is NOT tested by try-fix (Gate already did that). try-fix generates and tests YOUR independent ideas.
67-
68-
This ensures independent analysis rather than rubber-stamping the PR.
60+
Phase 4 uses a 5-model exploration workflow. See `post-gate.md` for detailed instructions after Gate passes.
6961

7062
---
7163

@@ -233,7 +225,7 @@ This file:
233225
- Serves as your TODO list for all phases
234226
- Tracks progress if interrupted
235227
- Must exist before you start gathering context
236-
- **Always include when committing changes** (to `CustomAgentLogsTmp/PRState/`)
228+
- **Always include when saving changes** (to `CustomAgentLogsTmp/PRState/`)
237229
- **Phases 4-5 sections are added AFTER Gate passes** (see `pr/post-gate.md`)
238230

239231
**Then gather context and update the file as you go.**
@@ -242,11 +234,7 @@ This file:
242234

243235
**If starting from a PR:**
244236
```bash
245-
# Checkout the PR
246-
git fetch origin pull/XXXXX/head:pr-XXXXX
247-
git checkout pr-XXXXX
248-
249-
# Fetch PR metadata
237+
# Fetch PR metadata (agent is already on correct branch)
250238
gh pr view XXXXX --json title,body,url,author,labels,files
251239

252240
# Find and read linked issue
@@ -256,7 +244,6 @@ gh issue view ISSUE_NUMBER --json title,body,comments
256244

257245
**If starting from an Issue (no PR exists):**
258246
```bash
259-
# Stay on current branch - do NOT checkout anything
260247
# Fetch issue details directly
261248
gh issue view XXXXX --json title,body,comments,labels
262249
```
@@ -351,7 +338,7 @@ The test result will be updated to `✅ PASS (Gate)` after Gate passes.
351338
- [ ] Files Changed table populated (if PR exists)
352339
- [ ] PR Discussion Summary documented (if PR exists)
353340
- [ ] All [PENDING] placeholders replaced
354-
- [ ] State file committed
341+
- [ ] State file saved
355342

356343
---
357344

@@ -418,7 +405,7 @@ The script auto-detects mode based on git diff. If only test files changed, it v
418405
- [ ] Test file paths documented
419406
- [ ] "Tests verified to FAIL" note added
420407
- [ ] Test category identified
421-
- [ ] State file committed
408+
- [ ] State file saved
422409

423410
---
424411

@@ -441,13 +428,42 @@ Tests were already verified to FAIL in Phase 2. Gate is a confirmation step:
441428
**If starting from a PR (fix exists):**
442429
Use full verification mode - tests should FAIL without fix, PASS with fix.
443430

431+
### Platform Selection for Gate
432+
433+
**🚨 CRITICAL: Choose a platform that is BOTH affected by the bug AND available on the current host.**
434+
435+
**Step 1: Identify affected platforms** from Pre-Flight:
436+
- Check the "Platforms Affected" checkboxes in the state file
437+
- Check issue labels (e.g., `platform/iOS`, `platform/Android`)
438+
- Check which platform-specific files the PR modifies
439+
440+
**Step 2: Match to available platforms on current host:**
441+
442+
| Host OS | Available Platforms |
443+
|---------|---------------------|
444+
| Windows | Android, Windows |
445+
| macOS | Android, iOS, MacCatalyst |
446+
447+
**Step 3: Select the best match:**
448+
1. Pick a platform that IS affected by the bug
449+
2. That IS available on the current host
450+
3. Prefer the platform most directly impacted by the PR's code changes
451+
452+
**Example decisions:**
453+
- Bug affects iOS/Windows/MacCatalyst, host is Windows → Test on **Windows**
454+
- Bug affects iOS only, host is Windows → **STOP** - cannot test (ask user)
455+
- Bug affects Android only → Test on **Android** (works on any host)
456+
- Bug affects all platforms → Pick based on host (Windows on Windows, iOS on macOS)
457+
458+
**⚠️ Do NOT test on a platform that isn't affected by the bug** - the test will pass regardless of whether the fix works.
459+
444460
**🚨 MUST invoke as a task agent** to prevent command substitution:
445461

446462
```markdown
447463
Invoke the `task` agent with agent_type: "task" and this prompt:
448464

449465
"Invoke the verify-tests-fail-without-fix skill for this PR:
450-
- Platform: android (or ios)
466+
- Platform: [selected platform from Platform Selection above]
451467
- TestFilter: 'IssueXXXXX'
452468
- RequireFullVerification: true
453469

@@ -486,7 +502,7 @@ See `.github/skills/verify-tests-fail-without-fix/SKILL.md` for full skill docum
486502
- [ ] Result shows PASSED ✅ or FAILED ❌
487503
- [ ] Test behavior documented
488504
- [ ] Platform tested noted
489-
- [ ] State file committed
505+
- [ ] State file saved
490506

491507
---
492508

.github/agents/pr/PLAN-TEMPLATE.md

Lines changed: 112 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,112 @@
1+
# PR Review Plan Template
2+
3+
**Reusable checklist** for the 5-phase PR Agent workflow.
4+
5+
**Source documents:**
6+
- `.github/agents/pr.md` - Phases 1-3 (Pre-Flight, Tests, Gate)
7+
- `.github/agents/pr/post-gate.md` - Phases 4-5 (Fix, Report)
8+
- `.github/agents/pr/SHARED-RULES.md` - Critical rules (blockers, git, templates)
9+
10+
---
11+
12+
## 🚨 Critical Rules (Summary)
13+
14+
See `SHARED-RULES.md` for complete details. Key points:
15+
- **Environment Blockers**: STOP immediately, report, ask user (strict retry limits)
16+
- **No Git Commands**: Never checkout/switch branches - agent is always on correct branch
17+
- **Gate via Task Agent**: Never run inline (prevents fabrication)
18+
- **Multi-Model try-fix**: 5 models, SEQUENTIAL only
19+
- **Follow Templates**: No `open` attributes, no "improvements"
20+
21+
---
22+
23+
## Work Plan
24+
25+
### Phase 1: Pre-Flight
26+
- [ ] Create state file: `CustomAgentLogsTmp/PRState/pr-XXXXX.md`
27+
- [ ] Gather PR metadata (title, body, labels, author)
28+
- [ ] Fetch and read linked issue
29+
- [ ] Fetch PR comments and review feedback
30+
- [ ] Check for prior agent reviews (import and resume if found)
31+
- [ ] Document platforms affected
32+
- [ ] Classify changed files (fix vs test)
33+
- [ ] Document PR's fix approach in Fix Candidates table
34+
- [ ] Update state file: Pre-Flight → ✅ COMPLETE
35+
- [ ] Save state file
36+
37+
**Boundaries:** No code analysis, no fix opinions, no test running
38+
39+
### Phase 2: Tests
40+
- [ ] Check if PR includes UI tests
41+
- [ ] Verify tests follow `IssueXXXXX` naming convention
42+
- [ ] If tests exist: Verify they compile
43+
- [ ] If tests missing: Invoke `write-ui-tests` skill
44+
- [ ] Document test files in state file
45+
- [ ] Update state file: Tests → ✅ COMPLETE
46+
- [ ] Save state file
47+
48+
### Phase 3: Gate ⛔
49+
**🚨 Cannot continue if Gate fails**
50+
51+
- [ ] Select platform (must be affected AND available on host)
52+
- [ ] Invoke via **task agent** (NOT inline):
53+
```
54+
"Run verify-tests-fail-without-fix skill
55+
Platform: [X], TestFilter: 'IssueXXXXX', RequireFullVerification: true"
56+
```
57+
- [ ] ⛔ If environment blocker: STOP, report, ask user
58+
- [ ] Verify: Tests FAIL without fix, PASS with fix
59+
- [ ] If Gate fails: STOP, request test fixes
60+
- [ ] Update state file: Gate → ✅ PASSED
61+
- [ ] Save state file
62+
63+
### Phase 4: Fix 🔧
64+
*(Only if Gate ✅ PASSED)*
65+
66+
**Round 1: Run try-fix with each model (SEQUENTIAL)**
67+
- [ ] claude-sonnet-4.5
68+
- [ ] claude-opus-4.5
69+
- [ ] gpt-5.2
70+
- [ ] gpt-5.2-codex
71+
- [ ] gemini-3-pro-preview
72+
- [ ] ⛔ If blocker: STOP, report, ask user
73+
- [ ] Record: approach, result, files, failure analysis
74+
75+
**Round 2+: Cross-Pollination (MANDATORY)**
76+
- [ ] Invoke EACH model: "Any NEW fix ideas?"
77+
- [ ] Record responses in Cross-Pollination table
78+
- [ ] Run try-fix for new ideas (SEQUENTIAL)
79+
- [ ] Repeat until ALL 5 say "NO NEW IDEAS" (max 3 rounds)
80+
81+
**Completion:**
82+
- [ ] Cross-Pollination table has all 5 responses
83+
- [ ] Mark Exhausted: Yes
84+
- [ ] Compare passing candidates with PR's fix
85+
- [ ] Select best fix (results → simplicity → robustness)
86+
- [ ] Update state file: Fix → ✅ COMPLETE
87+
- [ ] Save state file
88+
89+
### Phase 5: Report 📋
90+
*(Only if Phases 1-4 complete)*
91+
92+
- [ ] Run `pr-finalize` skill
93+
- [ ] Generate review: root cause, candidates, recommendation
94+
- [ ] Post via `ai-summary-comment` skill
95+
- [ ] Update state file: Report → ✅ COMPLETE
96+
- [ ] Save final state file
97+
98+
---
99+
100+
## Quick Reference
101+
102+
| Phase | Key Action | Blocker Response |
103+
|-------|------------|------------------|
104+
| Pre-Flight | Create state file | N/A |
105+
| Tests | Verify/create tests | N/A |
106+
| Gate | Task agent → verify script | ⛔ STOP, report, ask |
107+
| Fix | Multi-model try-fix | ⛔ STOP, report, ask |
108+
| Report | Post via skill | ⛔ STOP, report, ask |
109+
110+
**State file:** `CustomAgentLogsTmp/PRState/pr-XXXXX.md`
111+
112+
**Never:** Mark BLOCKED and continue, claim success without tests, bypass scripts

0 commit comments

Comments
 (0)