Update issue-repro and issue-fix skills for performance investigations#3549
Open
mattleibow wants to merge 20 commits intomainfrom
Open
Update issue-repro and issue-fix skills for performance investigations#3549mattleibow wants to merge 20 commits intomainfrom
mattleibow wants to merge 20 commits intomainfrom
Conversation
da0317b to
4410adb
Compare
…3548) * Make add-comment risk dynamic based on content and confidence Previously, all add-comment actions were hardcoded as 'high' risk regardless of content. This change makes the risk level dynamic: - low: factual content (repro findings, version matrix) with confidence >= 0.85 - medium: workaround/suggestion content with confidence >= 0.85 - high: close/reject content, or confidence < 0.70 Updated in both triage and repro schema cheatsheets and schema definitions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Make skill action schemas automation-ready with required data fields - Add stateReason, category, milestone fields to both schemas - Remove update-project (not automatable) - Add Action Data Requirements section to repro cheatsheet - Document required fields for every action type in both cheatsheets - Prevents hollow actions (correct type but missing data) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add macOS platform support and performance bug handling to issue-repro skill: - New platform-macos.md reference (GUI templates, Metal/GL, macOS gotchas) - New Category 10: Performance in bug-categories.md - Route macOS and perf signals in SKILL.md Bring issue-fix skill to parity with triage/repro: - New schema-cheatsheet.md, anti-patterns.md, response-guidelines.md - New perf-investigation.md (phase profiling, isolation experiments, AI consultation) - New macos-diagnostics.md (GL stencil trap, VSync control, Skia source locations) - Expanded fix-examples.md with perf bug and needs-info examples - Updated SKILL.md: mandatory reads, expanded Phase 5/7/8 Motivated by #3525 investigation that found SKGLView stencil bug. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Refactor issue-triage: replace MANDATORY/NEVER/MUST blocks with reasoning-based guidance, consolidate triple pre-flight loading, add error recovery table, rewrite description - Polish issue-fix: soften remaining MUST/CRITICAL spots, rewrite description from keyword list to intent-focused - Remove macos-diagnostics.md reference file, inline key details into SKILL.md and perf-investigation.md Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
4410adb to
1cd1a6c
Compare
Genericize examples that referenced specific issue numbers: - issue-fix SKILL.md: generalize #3369 workaround-vs-root-cause example, replace #3369/#3272 in PR template with #NNNN/#MMMM placeholders - issue-fix anti-patterns.md: remove #3525 from VSync rule - issue-triage skia-patterns.md: remove #3386 Issue Lessons table - issue-repro SKILL.md: genericize #3430 simulation example - issue-repro conclusion-guide.md: genericize #3279 breaking-change example Full worked JSON examples in *-examples.md files retained — those use issue numbers as structural anchors for complete schema examples. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace macOS-specific §5.2a/§5.2b with a single platform-generic performance debugging section. Key techniques (correct view type, baselines, per-phase profiling, VSync, multi-backend comparison) now apply to any platform. Full methodology still in perf-investigation.md. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove 'Application activation is critical' — sample app works without it; was a workaround for missing NSApplication.Main() - Remove verbose AppDelegate boilerplate (15 lines) — replaced with brief 'add view to window' snippet - Remove Pitfalls section — GL 'driver quirks' and 'abandoned GL' framing was misleading; GL is deprecated but still works - Remove glGetIntegerv stencil claim — too specific to one issue - Remove 'using SkiaSharp.Views.Mac' from templates (implicit) - Keep: run binary directly (true), dotnet build not run (true), backend templates (useful for rendering bugs), code signing tip - Rename 'Gotchas' to 'Tips', reframe backend templates as optional 195 → 160 lines Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace real issue-specific examples (#2997, #3422, #3525, #3540) with fictional but realistic scenarios that teach JSON structure and patterns without being tied to specific past bugs. Changes across all 3 skills: - fix-examples.md: 4 examples (fixed/cannot-fix/perf-fix/needs-info) - Platforms: Windows x64, cross-platform, Android arm64, Linux x64 - Scenarios: text measurement, animated WebP, GL flush, encode crash - triage-examples.md: 3 examples (bug/question/duplicate) - Platforms: iOS, Android - Scenarios: decode crash, font loading, duplicate detection - repro-examples.md: 3 examples (reproduced/cross-platform/confirmed) - Platforms: Windows x64, Linux Docker, Android arm64 - Scenarios: path ops, CJK rendering, touch pressure All 10 JSON examples validated against their respective schemas. Fixed schema violations in fix examples (area/method enum values, pr properties) that existed in originals but were never caught. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- SKILL.md Rules 7-9: Add explicit 'measure yourself', 'stop and go back to their code', and 'do NOT create simpler test with different mode' - bug-categories.md: Add ⛔ CPU raster prohibition, 🛑 STOP gate after Steps 1-2, strengthen native baseline instructions, require GPU view for GPU perf bugs Eval results: baseline 44% → iter1 69% → iter2 88% overall Perf eval: 14% → 43% → 71% Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- category-bugs.md: 7 bug types (API, native, rendering, platform, build, memory, parity gap) with Goal sections and standardized headings - category-features.md: Enhancement and documentation categories with confirmed/not-confirmed conclusion guidance - category-performance.md: Performance-specific workflow with mode-matching rule (GPU→GPU, CPU→CPU), timeout/retry guidance, and emphasis on measuring both sides of the comparison yourself Key changes: - Every category now has a Goal section explaining what the minimal repro is - Standardized heading order: Signals → Goal → Strategy → Example → Pitfalls → Only bail - Performance: CPU raster is valid for CPU bugs, invalid for GPU bugs (was previously 'never use CPU raster' which is wrong for CPU perf issues) - Performance: Timeouts are not blockers — retry network ops, wait for builds - SKILL.md classification now routes to the matching category file - Console fallback has GPU performance exception Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…structions Root cause of standalone GPU repro being skipped: category-performance.md had its own Step 1-6 numbering that competed with SKILL.md Phase 3A/3B/3C. Agent followed SKILL.md's version-based ordering and treated category steps as advisory. Fix: - Renamed steps to Phase 3A/3B/3C/3D matching SKILL.md naming - Phase 3A = measure both sides of reporter's comparison - Phase 3B = test latest version (standard) - Phase 3C = YOUR standalone GPU repro (MANDATORY, with concrete example) - Phase 3D = instrument your repro with per-phase timing - SKILL.md now says perf bugs REPLACE standard Phase 3 flow - Added concrete `dotnet new macos` + SKGLView example for GPU repro - Explicitly references platform-macos.md templates Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase 3 was version-centric (3A=reporter version, 3B=latest, 3C=main) which buried the 'create YOUR minimal repro' step. The agent would run reporter's code across versions but never create its own independent reproduction. New Phase 3 (reproduction-centric): - 3A: Run baseline (if reporter provides native C++, working version, etc.) - 3B: Run reporter's broken code (verify their claim) - 3C: Create YOUR minimal repro (strip framework, use platform file) - 3D: Test your repro across versions (reporter's → latest → main) - 3E: Cross-platform verification (conditional) Key insight: baseline isn't perf-specific — any bug can have a baseline (native C++ that renders correctly, older version that works, etc.) And 'create your own repro' is universal — the SVG agent did it naturally but it was never an explicit step. Now it is. category-performance.md slimmed from 152→96 lines: removed duplicate Phase 3 flow, kept only perf-specific additions (instrumentation, VSync, mode-matching). Now says 'follow SKILL.md Phase 3 with these additions' instead of 'replace the standard flow'. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
SKMetalView when the reporter's issue was about SKGLView (OpenGL). OpenGL and Metal are different Skia code paths — switching backends invalidates the comparison. Changes: - Rule 9: 'same rendering mode' → 'same rendering backend (GL→GL, Metal→Metal)' - category-performance.md: explicit backend matching, not just GPU vs CPU - SKILL.md Phase 3C: 'match reporter's backend' not 'SKGLView/SKMetalView' - platform-macos.md: reorder templates with match-reporter warning, GL first Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. Created project manually instead of 'dotnet new macos' — missing Info.plist/bundle structure caused blank screen 2. Measured 120fps on empty canvas without validating anything rendered 3. Skipped native C++ baseline (Rule 7 violation, not cleverness) Fixes: - SKILL.md Phase 3C: mandatory rendering validation before measurements - platform-macos.md: 'dotnet new macos' is REQUIRED, not optional - platform-macos.md: Run & Verify now mandates visual validation - category-performance.md: validate rendering before measuring FPS - anti-patterns.md: new #10 — accepting measurements without validation Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
dotnet new macos ensures proper app structure (no hacks to launch). Visual validation is always required regardless of how the project was created — they are independent concerns. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The agent keeps choosing Metal even when the reporter uses SKGLView (OpenGL). Root cause: the templates were listed by backend name and the agent picks one. Restructured platform-macos.md to: - Lead with a lookup table: reporter uses X → you use X - Templates are reference implementations, not suggestions - Removed 'test both backends' as primary guidance; now it's 'reproduce on reporter's backend, optionally test the other' Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Agents keep using the reporter's claimed '120fps' instead of building
and running the native C++ baseline. The existing language ('build and
run it yourself') was too soft. Now uses 🛑 stop gate with explicit
statement that claimed numbers are a Rule 7 violation.
Added concrete anti-pattern example: reporter says 120fps, you skip
building and use that number — that's not measuring.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Update AI skills for performance investigations
Motivated by the #3525 investigation (macOS GL stencil bug, PR #3546) which revealed that both skills had no support for performance bug reproduction or investigation.
Knowledge separation: General → Performance → macOS
Repro skill changes (3 files)
platform-macos.md— macOS GUI app reproduction templates (Metal/GL backends, NSApplication gotchas, run binary directly vsopen -n, Phase 3C guidance)bug-categories.md— Performance bugs: native baselines, per-phase Stopwatch instrumentation, multi-complexity testing, VSync control, statistical stabilityFix skill changes — General parity (4 new + 2 updated files)
schema-cheatsheet.md— quick-reference for fix JSON fields (triage/repro had this, fix didn't)anti-patterns.md— 12 numbered rules including [BUG] SkiaSharp rendering performance is much lower than native Skia C++ #3525 learnings (triage has 9, fix had 2 inline)response-guidelines.md— PR description template, issue comment structure, perf results tablesfix-examples.md— added Example 3 (perf fix [BUG] SkiaSharp rendering performance is much lower than native Skia C++ #3525) and Example 4 (needs-info)SKILL.md— mandatory reads section, expanded Phase 5 (perf + macOS routing), Phase 7 (perf verification), Phase 8 (response guidelines), anti-patterns referenceFix skill changes — Performance + macOS (2 new files)
perf-investigation.md— full methodology: phase profiling, isolation experiments (one variable at a time), native baselines, AI model consultation protocol (3 models, 2/3 consensus), debugging tables, statistical requirementsmacos-diagnostics.md— GL stencil trap (the [BUG] SkiaSharp rendering performance is much lower than native Skia C++ #3525 root cause pattern), VSync P/Invoke, NSOpenGLContext setup, Metal vs GL switching, Skia source reading locationsWhat this enables
Next time someone reports "SkiaSharp is slow" or "macOS rendering has low FPS":