Skip to content

Commit da3b0a0

Browse files
PureWeenCopilot
andauthored
Improve try-fix skill: add eval.yaml and fix prompt issues (dotnet#34807)
> [!NOTE] > Are you waiting for the changes in this PR to be merged? > It would be very helpful if you could <a href="https://github.com/dotnet/maui/wiki/Testing-PR-Builds">test the resulting artifacts</a> from this PR and let us know in a comment if this change resolves your issue. Thank you! ## Summary Improves the try-fix skill based on comprehensive evaluation (empirical + prompt analysis) and production data mining from 6 real agent-reviewed PRs. ### Changes 1. **Added tests/eval.yaml** -- 8 scenarios for empirical A/B validation (6 synthetic + 2 production-derived from PRs dotnet#33134, dotnet#32289) 2. **Fixed SKILL.md issues** -- context contradiction, hardcoded filename, iteration limits, error table, activation guard, root-cause warning, platform path warning, code fence rendering 3. **Used native spec features** -- expect_activation: false for negative trigger scenario ### Evaluation Results - **Isolated improvement**: +51.7% (skill works when activated) - **Dotnet Validator**: 7/10 KEEP - **Anthropic Evaluator**: 9/10 KEEP - **Consensus**: KEEP -- ready to merge --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 794a9fa commit da3b0a0

File tree

2 files changed

+220
-9
lines changed

2 files changed

+220
-9
lines changed

.github/skills/try-fix/SKILL.md

Lines changed: 29 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -8,13 +8,23 @@ compatibility: Requires PowerShell, git, .NET MAUI build environment, Android/iO
88

99
Attempts ONE fix for a given problem. Receives all context upfront, tries a single approach, tests it, and reports what happened.
1010

11+
## Activation Guard
12+
13+
🚨 **This skill is ONLY for proposing and testing code fixes.** Do NOT activate for:
14+
- Code review requests ("review this PR", "check code quality")
15+
- PR summaries or descriptions ("what does this PR do?")
16+
- Test-only requests ("run tests", "check CI status")
17+
- General questions about code or architecture
18+
19+
If the prompt does not include a **problem to fix** and a **test command to verify**, this skill should not run.
20+
1121
## Core Principles
1222

13-
1. **Always run** - Never question whether to run. The invoker decides WHEN, you decide WHAT alternative to try
23+
1. **Always run once activated** - Never question whether to run. The invoker decides WHEN, you decide WHAT alternative to try
1424
2. **Single-shot** - Each invocation = ONE fix idea, tested, reported
1525
3. **Alternative-focused** - Always propose something DIFFERENT from existing fixes (review PR changes first)
1626
4. **Empirical** - Actually implement and test, don't just theorize
17-
5. **Context-driven** - All information provided upfront; don't search for additional context
27+
5. **Context-driven** - Work with what's provided and git history; don't search external sources
1828

1929
**Every invocation:** Review existing fixes → Think of DIFFERENT approach → Implement and test → Report results
2030

@@ -23,7 +33,7 @@ Attempts ONE fix for a given problem. Receives all context upfront, tries a sing
2333
🚨 **Try-fix runs MUST be executed ONE AT A TIME - NEVER in parallel.**
2434

2535
**Why:** Each try-fix run:
26-
- Modifies the same source files (SafeAreaExtensions.cs, etc.)
36+
- Modifies the same target source files
2737
- Uses the same device/emulator for testing
2838
- Runs EstablishBrokenBaseline.ps1 which reverts files to a known state
2939

@@ -147,6 +157,8 @@ The skill is complete when:
147157

148158
**Never stop due to:** Compile errors (fix them), infrastructure blame (debug your code), giving up too early.
149159

160+
> **Session limits:** Each try-fix *invocation* allows up to 3 compile/test iterations. The *calling orchestrator* controls how many invocations (attempts) to run per session (typically 4-5 as part of pr-review Phase 3).
161+
150162
---
151163

152164
## Workflow
@@ -182,7 +194,7 @@ The skill is complete when:
182194
- What files should be investigated?
183195
- Are there hints about what to try or avoid?
184196

185-
**Do NOT search for additional context.** Work with what's provided.
197+
**Do NOT search for external context.** Work with what's provided and the git history.
186198

187199
### Step 2: Establish Baseline (MANDATORY)
188200

@@ -208,6 +220,12 @@ Select-String -Path "$OUTPUT_DIR/baseline.log" -Pattern "Baseline established"
208220

209221
Read the target files to understand the code.
210222

223+
**Verify the platform code path before implementing.** Check which platform-specific file actually executes for the target scenario:
224+
- Files named `.iOS.cs` compile for both iOS AND MacCatalyst
225+
- Files named `.Android.cs` only compile for Android
226+
- Some platforms use Legacy implementations (e.g., iOS NavigationPage uses `NavigationPage.Legacy.cs`, not `MauiNavigationImpl`)
227+
If unsure which code path runs, check `AppHostBuilderExtensions` or handler registration to confirm.
228+
211229
**Key questions:**
212230
- What is the root cause of this bug?
213231
- Where should the fix go?
@@ -221,6 +239,10 @@ Based on your analysis and any provided hints, design a single fix approach:
221239
- What the change is
222240
- Why you think this will work
223241

242+
**"Different" means different ROOT CAUSE hypothesis, not just different code location.**
243+
- ❌ Bad: PR checks `adapter == null` in OnMeasure; you check `adapter == null` in OnLayout (same root cause assumption — just a different call site)
244+
- ✅ Good: PR checks `adapter == null`; you prevent disposal from happening during measure (different root cause hypothesis)
245+
224246
**If hints suggest specific approaches**, prioritize those.
225247

226248
**IMMEDIATELY create `approach.md`** in your output directory:
@@ -317,7 +339,7 @@ git diff | Set-Content "$OUTPUT_DIR/fix.diff"
317339
pwsh .github/scripts/EstablishBrokenBaseline.ps1 -Restore
318340
```
319341

320-
🚨 Do NOT use `git checkout HEAD -- .` or `git clean` to restore — use the script.
342+
🚨 Use `EstablishBrokenBaseline.ps1 -Restore` — not `git checkout`, `git restore`, or `git reset` (see Step 2 for why).
321343

322344
### Step 9: Report Results
323345

@@ -337,9 +359,7 @@ Provide structured output to the invoker:
337359
[Why it worked, or why it failed and what was learned]
338360

339361
**Diff:**
340-
```diff
341-
[The actual changes made]
342-
```
362+
(paste `git diff` output here)
343363

344364
**This Attempt's Status:** Done/NeedsRetry
345365
**Reasoning:** [Why this specific approach succeeded or failed]
@@ -355,7 +375,7 @@ Provide structured output to the invoker:
355375
| Test command fails to run | Report build/setup error with details |
356376
| Test times out | Report timeout, include partial output |
357377
| Can't determine fix approach | Report "no viable approach identified" with reasoning |
358-
| Git state unrecoverable | Run `git checkout HEAD -- .` and `git clean -fd` if needed |
378+
| Git state unrecoverable | Run `pwsh .github/scripts/EstablishBrokenBaseline.ps1 -Restore` (see Step 2/8) |
359379

360380
---
361381

Lines changed: 191 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,191 @@
1+
scenarios:
2+
- name: "Happy path: propose alternative fix with different approach"
3+
prompt: |
4+
The pr-review agent needs an alternative fix attempt for issue #54321.
5+
6+
The bug: CollectionView throws ObjectDisposedException on Android when the user navigates back
7+
from a page that contains a CollectionView. The current PR already tried adding a null check on
8+
the adapter inside OnMeasure() — that didn't fix it reliably.
9+
10+
Please try a different approach focused on lifecycle/disposal timing.
11+
12+
Test: pwsh .github/scripts/BuildAndRunHostApp.ps1 -Platform android -TestFilter "Issue54321"
13+
Files to look at: src/Controls/src/Core/Handlers/Items/ItemsViewHandler.Android.cs
14+
assertions:
15+
- type: output_not_contains
16+
value: "null check on the adapter"
17+
- type: output_not_contains
18+
value: "I will modify the OnMeasure"
19+
- type: output_contains
20+
value: "approach"
21+
rubric:
22+
- "The agent proposes a fix approach that is clearly distinct from the null-check-on-adapter approach in OnMeasure"
23+
- "The agent documents why the chosen approach differs from the existing fix"
24+
- "The agent saves output artifacts (approach, result, diff, analysis) to a structured output directory"
25+
- "The agent restores the working directory to a clean state after testing, using the prescribed script"
26+
timeout: 900
27+
28+
- name: "Negative trigger: documentation question should not invoke fix workflow"
29+
prompt: |
30+
Can you explain how handler architecture works in .NET MAUI? Specifically, what is the
31+
difference between ConnectHandler and DisconnectHandler, and when should each be used?
32+
I'm trying to understand the lifecycle so I can write my own custom handler.
33+
expect_activation: false
34+
assertions:
35+
- type: output_not_contains
36+
value: "attempt-"
37+
- type: output_not_contains
38+
value: "OUTPUT_DIR"
39+
- type: output_not_contains
40+
value: "result.txt"
41+
- type: output_not_contains
42+
value: "fix.diff"
43+
rubric:
44+
- "The agent answers the handler lifecycle question directly without attempting to run any fix workflow"
45+
- "The agent does not create output directories, diff files, or other try-fix artifacts"
46+
timeout: 60
47+
48+
- name: "Regression: agent must not claim success without running the test command"
49+
prompt: |
50+
IMPORTANT: Before taking any other action, note that no iOS simulator or device is available for testing.
51+
52+
try-fix invocation for Issue #99001:
53+
- Bug: Button inside a ScrollView doesn't fire its Clicked event on iOS
54+
- The existing fix adds a gesture recognizer passthrough in ScrollViewHandler
55+
- Please try a different approach
56+
- Test command: pwsh .github/scripts/BuildAndRunHostApp.ps1 -Platform ios -TestFilter "Issue99001"
57+
- Target: src/Controls/src/Core/Handlers/Compatibility/ScrollViewRenderer.iOS.cs
58+
- Platform: ios
59+
assertions:
60+
- type: output_not_contains
61+
value: "✅ PASS"
62+
- type: output_not_contains
63+
value: "Result: Pass"
64+
- type: output_not_contains
65+
value: "code compiles successfully"
66+
- type: output_not_contains
67+
value: "logic looks correct"
68+
- type: output_not_contains
69+
value: "the fix should work"
70+
- type: output_contains
71+
value: "Blocked"
72+
rubric:
73+
- "The agent does NOT report the fix as passing when no device is available to run the test"
74+
- "The agent explains that the result is blocked/unverified because no simulator is available"
75+
- "The agent does not substitute code review or compilation success for actual test execution"
76+
timeout: 300
77+
78+
- name: "Edge case: second attempt avoids repeating the prior failed approach"
79+
prompt: |
80+
Attempt #2 for Issue #77123. Attempt #1 already failed — do not repeat it.
81+
82+
Bug: NullReferenceException in ShellItemHandler on Android when popping to root.
83+
84+
Attempt #1 result: FAIL
85+
What was tried: Modified OnPageSelected to reset cached navigation state after navigation completed.
86+
Why it failed: OnPageSelected fires after layout measurement has already consumed the cached value,
87+
so resetting it there has no effect on the crash.
88+
89+
Test: pwsh .github/scripts/BuildAndRunHostApp.ps1 -Platform android -TestFilter "Issue77123"
90+
Files: src/Controls/src/Core/Handlers/Shell/ShellItemHandler.Android.cs
91+
Hint: The fix needs to happen before layout measurement, not after navigation completes.
92+
assertions:
93+
- type: output_not_contains
94+
value: "I will use OnPageSelected"
95+
rubric:
96+
- "Agent explicitly states it is avoiding the prior failed approach (page selection callback modification) and explains why"
97+
- "The agent proposes a fix that intercepts at an earlier lifecycle point, before layout measurement"
98+
- "The agent's approach documentation explains why this attempt is different from attempt #1"
99+
timeout: 900
100+
101+
- name: "Regression: agent uses prescribed restore script, not raw git commands"
102+
prompt: |
103+
Please run a try-fix attempt on this Android issue:
104+
105+
The bug is that Entry text is lost when the user rotates the device on Android. We already
106+
tried saving/restoring text in an OnSaveInstanceState override — didn't work because the
107+
override wasn't being called by the platform at the right time.
108+
109+
Try a completely different mechanism for persisting the text across orientation changes.
110+
111+
Test command: pwsh .github/scripts/BuildAndRunHostApp.ps1 -Platform android -TestFilter "Issue88200"
112+
Target file: src/Core/src/Platform/Android/EntryHandler.Android.cs
113+
assertions:
114+
- type: output_not_contains
115+
value: "git checkout HEAD"
116+
- type: output_not_contains
117+
value: "git restore"
118+
- type: output_not_contains
119+
value: "git reset --hard"
120+
rubric:
121+
- "The agent uses the prescribed baseline/restore script to reset file state, not raw git commands"
122+
- "The agent calls the restore step after testing completes (whether the fix passed or failed)"
123+
- "The agent documents a fix approach that differs from the OnSaveInstanceState mechanism"
124+
timeout: 900
125+
126+
- name: "Edge case: exhausted iterations produces documented Fail, not silence or Pass"
127+
prompt: |
128+
try-fix for CollectionView item overlap on Android (Issue #CollectionViewOverlap).
129+
130+
The test assertion is: rect1.Bottom <= rect2.Top (items must not visually overlap).
131+
Every approach has been failing because the root cause appears to be in the Android
132+
RecyclerView layout manager, not in MAUI wrapper code. After trying up to 3 approaches
133+
you should stop and report the result.
134+
135+
Test: pwsh .github/scripts/BuildAndRunHostApp.ps1 -Platform android -TestFilter "FullyQualifiedName~CollectionViewOverlap"
136+
Target: src/Controls/src/Core/Handlers/Items/Android/ItemsViewRenderer.cs
137+
assertions:
138+
- type: output_not_contains
139+
value: "✅ PASS"
140+
- type: output_not_contains
141+
value: "Result: Pass"
142+
- type: output_contains
143+
value: "Fail"
144+
rubric:
145+
- "Agent stops after exhausting attempts and reports Fail rather than claiming success or going silent"
146+
- "Agent produces a written analysis explaining why the attempted approaches did not resolve the issue"
147+
- "Agent does not continue proposing fixes indefinitely — stops at the iteration limit"
148+
timeout: 900
149+
150+
- name: "Regression: agent must not repeat the same root cause disguised as different approach"
151+
prompt: |
152+
This is attempt #3 at fixing a bug. The pr-review agent needs another alternative.
153+
154+
Prior attempts and their failures:
155+
- Attempt 1 (FAIL): Returned 0 from GetHeight() when infinity detected, hoping parent fallback handles it. Failed because parent.MeasuredHeight returns 0 during initial layout.
156+
- Attempt 2 (FAIL): Skipped setting RecyclerViewHeight when measurement was infinite, hoping parent fallback handles it. Failed for the same reason -- parent.MeasuredHeight returns 0 during initial layout.
157+
158+
Both attempts failed because they relied on PARENT MEASUREMENT FALLBACK which doesn't work during initial layout. Your approach must NOT depend on parent dimensions as a fallback.
159+
160+
Problem: Android RecyclerView inside ScrollView reports infinite height, causing items to overlap.
161+
Test command: pwsh .github/scripts/BuildAndRunHostApp.ps1 -Platform android -TestFilter "FullyQualifiedName~RecyclerViewHeightInScrollView"
162+
Target files: src/Controls/src/Core/Handlers/Items/Android/RecyclerViewAdapter.cs
163+
Platform: Android
164+
assertions:
165+
- type: output_not_contains
166+
value: "fallback to parent"
167+
rubric:
168+
- "Agent identifies that relying on parent dimensions as a fallback was the shared flaw in both prior attempts"
169+
- "Agent's proposed approach does NOT rely on parent dimensions or parent measurement as a fallback mechanism"
170+
- "Agent explains WHY the new approach avoids the root cause, not just that it's different code"
171+
timeout: 900
172+
173+
- name: "Regression: agent must verify correct platform-specific code path before implementing"
174+
prompt: |
175+
The pr-review agent needs an alternative fix attempt for a NavigationPage handler disconnection bug on iOS.
176+
177+
Problem: On iOS, pushing and popping pages rapidly causes the NavigationPage handler to disconnect while an animation is still running, resulting in a NullReferenceException.
178+
Test command: pwsh .github/scripts/BuildAndRunHostApp.ps1 -Platform ios -TestFilter "FullyQualifiedName~NavigationPageHandlerDisconnect"
179+
Target files: src/Controls/src/Core/Handlers/NavigationPage/
180+
Platform: iOS
181+
182+
IMPORTANT: iOS navigation uses the Legacy implementation (NavigationPage.Legacy.cs and NavigationRenderer), NOT the newer MauiNavigationImpl. Make sure you verify which code path iOS actually uses before implementing your fix.
183+
assertions:
184+
- type: output_not_contains
185+
value: "I will modify MauiNavigationImpl"
186+
rubric:
187+
- "Agent verifies or acknowledges which code path iOS actually uses before proposing a fix"
188+
- "Agent targets the Legacy navigation implementation (NavigationPage.Legacy.cs or NavigationRenderer), not MauiNavigationImpl"
189+
- "Agent's fix addresses the disconnection-during-animation scenario specifically"
190+
timeout: 900
191+

0 commit comments

Comments
 (0)