Skip to content

Update issue-repro and issue-fix skills for performance investigations#3549

Open
mattleibow wants to merge 20 commits intomainfrom
dev/skill-perf-investigation
Open

Update issue-repro and issue-fix skills for performance investigations#3549
mattleibow wants to merge 20 commits intomainfrom
dev/skill-perf-investigation

Conversation

@mattleibow
Copy link
Contributor

Update AI skills for performance investigations

Motivated by the #3525 investigation (macOS GL stencil bug, PR #3546) which revealed that both skills had no support for performance bug reproduction or investigation.

Knowledge separation: General → Performance → macOS

Layer Repro skill Fix skill
General schema-cheatsheet.md, anti-patterns.md, response-guidelines.md, expanded examples
Performance Category 10 in bug-categories.md perf-investigation.md
macOS platform-macos.md macos-diagnostics.md

Repro skill changes (3 files)

  • New platform-macos.md — macOS GUI app reproduction templates (Metal/GL backends, NSApplication gotchas, run binary directly vs open -n, Phase 3C guidance)
  • Category 10 in bug-categories.md — Performance bugs: native baselines, per-phase Stopwatch instrumentation, multi-complexity testing, VSync control, statistical stability
  • SKILL.md routing — macOS at priority 3, performance note in Phase 3

Fix skill changes — General parity (4 new + 2 updated files)

Fix skill changes — Performance + macOS (2 new files)

  • New perf-investigation.md — full methodology: phase profiling, isolation experiments (one variable at a time), native baselines, AI model consultation protocol (3 models, 2/3 consensus), debugging tables, statistical requirements
  • New macos-diagnostics.md — GL stencil trap (the [BUG] SkiaSharp rendering performance is much lower than native Skia C++ #3525 root cause pattern), VSync P/Invoke, NSOpenGLContext setup, Metal vs GL switching, Skia source reading locations

What this enables

Next time someone reports "SkiaSharp is slow" or "macOS rendering has low FPS":

  • Repro skill will create actual GUI app benchmarks (not console apps), measure per-phase timing, and produce quantitative evidence
  • Fix skill will follow a systematic profiling → isolation → AI consultation pipeline instead of guessing

mattleibow and others added 4 commits March 5, 2026 21:44
…3548)

* Make add-comment risk dynamic based on content and confidence

Previously, all add-comment actions were hardcoded as 'high' risk regardless
of content. This change makes the risk level dynamic:

- low: factual content (repro findings, version matrix) with confidence >= 0.85
- medium: workaround/suggestion content with confidence >= 0.85
- high: close/reject content, or confidence < 0.70

Updated in both triage and repro schema cheatsheets and schema definitions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Make skill action schemas automation-ready with required data fields

- Add stateReason, category, milestone fields to both schemas
- Remove update-project (not automatable)
- Add Action Data Requirements section to repro cheatsheet
- Document required fields for every action type in both cheatsheets
- Prevents hollow actions (correct type but missing data)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add macOS platform support and performance bug handling to issue-repro skill:
- New platform-macos.md reference (GUI templates, Metal/GL, macOS gotchas)
- New Category 10: Performance in bug-categories.md
- Route macOS and perf signals in SKILL.md

Bring issue-fix skill to parity with triage/repro:
- New schema-cheatsheet.md, anti-patterns.md, response-guidelines.md
- New perf-investigation.md (phase profiling, isolation experiments, AI consultation)
- New macos-diagnostics.md (GL stencil trap, VSync control, Skia source locations)
- Expanded fix-examples.md with perf bug and needs-info examples
- Updated SKILL.md: mandatory reads, expanded Phase 5/7/8

Motivated by #3525 investigation that found SKGLView stencil bug.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Refactor issue-triage: replace MANDATORY/NEVER/MUST blocks with
  reasoning-based guidance, consolidate triple pre-flight loading,
  add error recovery table, rewrite description
- Polish issue-fix: soften remaining MUST/CRITICAL spots, rewrite
  description from keyword list to intent-focused
- Remove macos-diagnostics.md reference file, inline key details
  into SKILL.md and perf-investigation.md

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mattleibow mattleibow force-pushed the dev/skill-perf-investigation branch from 4410adb to 1cd1a6c Compare March 5, 2026 19:44
mattleibow and others added 16 commits March 5, 2026 21:52
Genericize examples that referenced specific issue numbers:
- issue-fix SKILL.md: generalize #3369 workaround-vs-root-cause example,
  replace #3369/#3272 in PR template with #NNNN/#MMMM placeholders
- issue-fix anti-patterns.md: remove #3525 from VSync rule
- issue-triage skia-patterns.md: remove #3386 Issue Lessons table
- issue-repro SKILL.md: genericize #3430 simulation example
- issue-repro conclusion-guide.md: genericize #3279 breaking-change example

Full worked JSON examples in *-examples.md files retained — those
use issue numbers as structural anchors for complete schema examples.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace macOS-specific §5.2a/§5.2b with a single platform-generic
performance debugging section. Key techniques (correct view type,
baselines, per-phase profiling, VSync, multi-backend comparison)
now apply to any platform. Full methodology still in perf-investigation.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove 'Application activation is critical' — sample app works
  without it; was a workaround for missing NSApplication.Main()
- Remove verbose AppDelegate boilerplate (15 lines) — replaced with
  brief 'add view to window' snippet
- Remove Pitfalls section — GL 'driver quirks' and 'abandoned GL'
  framing was misleading; GL is deprecated but still works
- Remove glGetIntegerv stencil claim — too specific to one issue
- Remove 'using SkiaSharp.Views.Mac' from templates (implicit)
- Keep: run binary directly (true), dotnet build not run (true),
  backend templates (useful for rendering bugs), code signing tip
- Rename 'Gotchas' to 'Tips', reframe backend templates as optional

195 → 160 lines

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace real issue-specific examples (#2997, #3422, #3525, #3540) with
fictional but realistic scenarios that teach JSON structure and patterns
without being tied to specific past bugs.

Changes across all 3 skills:
- fix-examples.md: 4 examples (fixed/cannot-fix/perf-fix/needs-info)
  - Platforms: Windows x64, cross-platform, Android arm64, Linux x64
  - Scenarios: text measurement, animated WebP, GL flush, encode crash
- triage-examples.md: 3 examples (bug/question/duplicate)
  - Platforms: iOS, Android
  - Scenarios: decode crash, font loading, duplicate detection
- repro-examples.md: 3 examples (reproduced/cross-platform/confirmed)
  - Platforms: Windows x64, Linux Docker, Android arm64
  - Scenarios: path ops, CJK rendering, touch pressure

All 10 JSON examples validated against their respective schemas.
Fixed schema violations in fix examples (area/method enum values,
pr properties) that existed in originals but were never caught.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- SKILL.md Rules 7-9: Add explicit 'measure yourself', 'stop and go back
  to their code', and 'do NOT create simpler test with different mode'
- bug-categories.md: Add ⛔ CPU raster prohibition, 🛑 STOP gate after
  Steps 1-2, strengthen native baseline instructions, require GPU view
  for GPU perf bugs

Eval results: baseline 44% → iter1 69% → iter2 88% overall
Perf eval: 14% → 43% → 71%

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- category-bugs.md: 7 bug types (API, native, rendering, platform, build,
  memory, parity gap) with Goal sections and standardized headings
- category-features.md: Enhancement and documentation categories with
  confirmed/not-confirmed conclusion guidance
- category-performance.md: Performance-specific workflow with mode-matching
  rule (GPU→GPU, CPU→CPU), timeout/retry guidance, and emphasis on
  measuring both sides of the comparison yourself

Key changes:
- Every category now has a Goal section explaining what the minimal repro is
- Standardized heading order: Signals → Goal → Strategy → Example → Pitfalls → Only bail
- Performance: CPU raster is valid for CPU bugs, invalid for GPU bugs
  (was previously 'never use CPU raster' which is wrong for CPU perf issues)
- Performance: Timeouts are not blockers — retry network ops, wait for builds
- SKILL.md classification now routes to the matching category file
- Console fallback has GPU performance exception

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…structions

Root cause of standalone GPU repro being skipped: category-performance.md
had its own Step 1-6 numbering that competed with SKILL.md Phase 3A/3B/3C.
Agent followed SKILL.md's version-based ordering and treated category steps
as advisory.

Fix:
- Renamed steps to Phase 3A/3B/3C/3D matching SKILL.md naming
- Phase 3A = measure both sides of reporter's comparison
- Phase 3B = test latest version (standard)
- Phase 3C = YOUR standalone GPU repro (MANDATORY, with concrete example)
- Phase 3D = instrument your repro with per-phase timing
- SKILL.md now says perf bugs REPLACE standard Phase 3 flow
- Added concrete `dotnet new macos` + SKGLView example for GPU repro
- Explicitly references platform-macos.md templates

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase 3 was version-centric (3A=reporter version, 3B=latest, 3C=main) which
buried the 'create YOUR minimal repro' step. The agent would run reporter's
code across versions but never create its own independent reproduction.

New Phase 3 (reproduction-centric):
- 3A: Run baseline (if reporter provides native C++, working version, etc.)
- 3B: Run reporter's broken code (verify their claim)
- 3C: Create YOUR minimal repro (strip framework, use platform file)
- 3D: Test your repro across versions (reporter's → latest → main)
- 3E: Cross-platform verification (conditional)

Key insight: baseline isn't perf-specific — any bug can have a baseline
(native C++ that renders correctly, older version that works, etc.)
And 'create your own repro' is universal — the SVG agent did it naturally
but it was never an explicit step. Now it is.

category-performance.md slimmed from 152→96 lines: removed duplicate
Phase 3 flow, kept only perf-specific additions (instrumentation, VSync,
mode-matching). Now says 'follow SKILL.md Phase 3 with these additions'
instead of 'replace the standard flow'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
SKMetalView when the reporter's issue was about SKGLView (OpenGL).
OpenGL and Metal are different Skia code paths — switching backends
invalidates the comparison.

Changes:
- Rule 9: 'same rendering mode' → 'same rendering backend (GL→GL, Metal→Metal)'
- category-performance.md: explicit backend matching, not just GPU vs CPU
- SKILL.md Phase 3C: 'match reporter's backend' not 'SKGLView/SKMetalView'
- platform-macos.md: reorder templates with match-reporter warning, GL first

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. Created project manually instead of 'dotnet new macos' — missing
   Info.plist/bundle structure caused blank screen
2. Measured 120fps on empty canvas without validating anything rendered
3. Skipped native C++ baseline (Rule 7 violation, not cleverness)

Fixes:
- SKILL.md Phase 3C: mandatory rendering validation before measurements
- platform-macos.md: 'dotnet new macos' is REQUIRED, not optional
- platform-macos.md: Run & Verify now mandates visual validation
- category-performance.md: validate rendering before measuring FPS
- anti-patterns.md: new #10 — accepting measurements without validation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
dotnet new macos ensures proper app structure (no hacks to launch).
Visual validation is always required regardless of how the project
was created — they are independent concerns.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The agent keeps choosing Metal even when the reporter uses SKGLView
(OpenGL). Root cause: the templates were listed by backend name and
the agent picks one. Restructured platform-macos.md to:

- Lead with a lookup table: reporter uses X → you use X
- Templates are reference implementations, not suggestions
- Removed 'test both backends' as primary guidance; now it's
  'reproduce on reporter's backend, optionally test the other'

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Agents keep using the reporter's claimed '120fps' instead of building
and running the native C++ baseline. The existing language ('build and
run it yourself') was too soft. Now uses 🛑 stop gate with explicit
statement that claimed numbers are a Rule 7 violation.

Added concrete anti-pattern example: reporter says 120fps, you skip
building and use that number — that's not measuring.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

[BUG] SkiaSharp rendering performance is much lower than native Skia C++

1 participant