ZviBaratz
diff --git a/.gitignore b/.gitignore
Lines changed: 2 additions & 0 deletions
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 55 additions & 2 deletions
diff --git a/field-tests/.gitignore b/field-tests/.gitignore
Lines changed: 2 additions & 0 deletions
diff --git a/field-tests/annotations/appindicator.yaml b/field-tests/annotations/appindicator.yaml
Lines changed: 98 additions & 0 deletions
diff --git a/field-tests/annotations/blur-my-shell.yaml b/field-tests/annotations/blur-my-shell.yaml
Lines changed: 84 additions & 0 deletions
diff --git a/field-tests/annotations/clipboard-indicator.yaml b/field-tests/annotations/clipboard-indicator.yaml
Lines changed: 85 additions & 0 deletions
@@ -6,3 +6,5 @@ node_modules/
 .claude/
 tests/fixtures/*/node_modules/
 tests/assertions/local-regression.sh
+field-tests/cache/
+field-tests/results/
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 ## Project Overview
 
-Claude Code plugin for GNOME Shell extension EGO (extensions.gnome.org) review compliance. It provides five skills (`ego-lint`, `ego-review`, `ego-scaffold`, `ego-simulate`, `ego-submit`). This is **not** a GNOME extension itself — it's a set of tools that validate GNOME extensions against EGO submission requirements. Load it with `claude --plugin-dir <path-to-this-repo>`.
+Claude Code plugin for GNOME Shell extension EGO (extensions.gnome.org) review compliance. It provides six skills (`ego-lint`, `ego-review`, `ego-scaffold`, `ego-simulate`, `ego-submit`, `ego-field-test`). This is **not** a GNOME extension itself — it's a set of tools that validate GNOME extensions against EGO submission requirements. Load it with `claude --plugin-dir <path-to-this-repo>`.
 
 ## Running ego-lint
 
@@ -77,7 +77,7 @@ Additional tooling:
 ### Three-tier rule system
 
 - **Tier 1 (patterns.yaml)**: 124 regex rules in YAML, processed by `apply-patterns.py`. Covers web APIs, deprecated APIs, security (telemetry, curl/gsettings spawn, base64), logging, import segregation, AI slop signals, subprocess safety, i18n, GSettings bind flags, GNOME 44-50 migration, code quality advisories. Add new rules by editing `rules/patterns.yaml`. Advanced fields: `min-version`/`max-version` (version-gating), `guard-pattern` + `guard-window` (line-level suppression with configurable lookback) + `guard-window-forward` (forward peeking), `replacement-pattern` (file-level suppression), `exclude-dirs`, `skip-comments` (comment-aware matching).
-- **Tier 2 (scripts)**: 17 structural heuristic check scripts in Python/bash. `check-quality.py` (AI slop heuristics), `check-init.py` (init-time safety), `check-lifecycle.py` (enable/disable symmetry + timeout verification), `check-resources.py` + `build-resource-graph.py` (cross-file resource tracking), `check-disclosures.py` (clipboard/network disclosure), `check-polkit.py` (polkit policy validation), `check-schema-usage.py` (unused/undefined schema keys), `check-accessibility.py` (a11y checks), plus metadata, prefs, GObject, async, CSS, imports, schema, and package checks. `ego-lint.sh` also has an inline minified JS check, code metrics, and a provenance-gated post-filter that suppresses R-SLOP-01/02 JSDoc warnings when `quality/code-provenance` score >= 4.
+- **Tier 2 (scripts)**: 17 structural heuristic check scripts in Python/bash. `check-quality.py` (AI slop heuristics), `check-init.py` (init-time safety), `check-lifecycle.py` (enable/disable symmetry + timeout verification), `check-resources.py` + `build-resource-graph.py` (cross-file resource tracking), `check-disclosures.py` (clipboard/network disclosure), `check-polkit.py` (polkit policy validation), `check-schema-usage.py` (unused/undefined schema keys), `check-accessibility.py` (a11y checks), plus metadata, prefs, GObject, async, CSS, imports, schema, and package checks. `ego-lint.sh` also has an inline minified JS check, code metrics, and a provenance-gated post-filter that suppresses R-SLOP-01/02 JSDoc warnings when `quality/code-provenance` score >= 3.
 - **Tier 3 (checklists)**: 6 semantic review checklists in `skills/ego-review/references/`: lifecycle, security, code-quality (with 10 additional quality items), ai-slop (46-item scoring model), licensing, accessibility (7 items). Applied by Claude during `ego-review` phases.
 
 ### ego-review internals
@@ -154,6 +154,59 @@ test(ego-lint): add fixture for deprecated ByteArray usage
 - **PR closes issue**: Include `Closes #N` in the PR description to auto-close the issue on merge
 - **Tests before PR**: Run `bash tests/run-tests.sh` and verify all assertions pass before pushing
 
+## Field Testing
+
+Batch ego-lint runner for regression testing across 10 real-world GNOME extensions.
+
+### Running field tests
+
+```bash
+bash scripts/field-test-runner.sh                    # lint all extensions
+bash scripts/field-test-runner.sh --extension NAME   # lint single extension
+bash scripts/field-test-runner.sh --update-baselines # save current as golden
+bash scripts/field-test-runner.sh --no-fetch         # skip git clones, use cache
+bash scripts/field-test-runner.sh --review --no-fetch          # lint + review all
+bash scripts/field-test-runner.sh --review --review-exclude X  # review all except X
+bash scripts/field-test-runner.sh --review-dry-run             # print prompts only
+```
+
+### Pipeline structure
+
+- `field-tests/manifest.yaml` — Extension source manifest (local paths, GitHub repos)
+- `field-tests/baselines/` — Golden JSON snapshots (committed)
+- `field-tests/annotations/` — Per-extension finding classifications: tp, fp, borderline, expected (committed)
+- `field-tests/history.jsonl` — Append-only trend data (committed)
+- `field-tests/cache/` — Downloaded extensions (gitignored)
+- `field-tests/results/` — Timestamped run output (gitignored), includes `.review.md` reports
+- `field-tests/reports/` — Regression/synthesis reports (committed)
+- `scripts/field-test-runner.sh` — Bash orchestrator (lint + optional review phase)
+- `scripts/parse-manifest.py` — Manifest YAML → JSON (inline parser, no PyYAML)
+- `scripts/parse-lint-results.py` — ego-lint stdout → structured JSON
+- `scripts/diff-baselines.py` — Baseline comparison + annotation-aware filtering
+- `scripts/review-prompt.md` — Review prompt template (incremental Write strategy)
+- `scripts/hydrate-review-prompt.py` — Template hydration with lint/diff/annotation data
+- `skills/ego-field-test/SKILL.md` — Claude Code skill for full pipeline (classification, synthesis, issue creation)
+
+### Calibration cycle
+
+1. Make a code change (guard pattern, threshold tweak, new rule)
+2. Run `bash scripts/field-test-runner.sh --no-fetch` — see impact across all extensions
+3. Classify new unannotated findings in `field-tests/annotations/`
+4. If FPs are found, create issues and fix them
+5. Run with `--update-baselines` to snapshot the improved state
+
+### Review phase
+
+The `--review` flag runs headless `claude -p` sessions after lint. Each session uses `scripts/review-prompt.md` (hydrated with lint results, diff, and annotations). Key flags:
+
+- `--review` — review all extensions; `--review-changed` — only changed ones
+- `--review-exclude NAME` — skip specific extensions from review (repeatable); `--exclude` skips from both lint and review
+- `--budget AMOUNT` — max USD per review session (default: 4.00)
+- `--parallel N` — max concurrent sessions (default: 3)
+- `--review-dry-run` — write hydrated prompts without invoking claude
+
+Reports are written incrementally (section by section) so that partial output survives budget exhaustion. Review findings use a `review/` prefix in annotation files to distinguish them from lint findings.
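As an illustration of the "baseline comparison + annotation-aware filtering" step and of step 3 of the calibration cycle, here is a minimal Python sketch. It is not the code in `scripts/diff-baselines.py`; the function name, the `TRIAGED` set, and the flat `id -> classification` mapping are assumptions made for the example. The idea is that only findings that are both new relative to the baseline and not yet classified need attention.

```python
from typing import Dict, Iterable, List

# Classifications treated as "already triaged" in this sketch (assumption,
# taken from the labels listed for field-tests/annotations/).
TRIAGED = {"tp", "fp", "borderline", "expected"}


def unannotated_new_findings(
    baseline_ids: Iterable[str],
    current_ids: Iterable[str],
    annotations: Dict[str, str],
) -> List[str]:
    """Return finding IDs that are new relative to the baseline and unclassified."""
    baseline = set(baseline_ids)
    # Keep the order of the current run so reports stay stable.
    new_ids = [fid for fid in current_ids if fid not in baseline]
    return [fid for fid in new_ids if annotations.get(fid) not in TRIAGED]


# Hypothetical data for illustration only.
baseline = ["R-DEPR-09::var usage"]
current = [
    "R-DEPR-09::var usage",
    "R-SLOP-38::descriptive identifier",
    "R-SEC-06::run_dispose",
]
annotations = {"R-SLOP-38::descriptive identifier": "fp"}

print(unannotated_new_findings(baseline, current, annotations))
```

In this example only `R-SEC-06::run_dispose` would be reported, since the `var` finding is already in the baseline and the `R-SLOP-38` finding already carries a classification.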
+
 ## Releasing
 
 release-please automates versioning, CHANGELOG updates, git tags, and GitHub Releases:
 
@@ -0,0 +1,2 @@
+cache/
+results/
@@ -0,0 +1,98 @@
+# Classified findings for AppIndicator/KStatusNotifierItem Support
+# Source: docs/internal/field-test-appindicator.md
+findings:
+  # FAILs — True Positives
+  - id: "R-DEPR-04::legacy imports.gi syntax"
+    classification: tp
+    notes: "3 instances in indicatorStatusIcon.js, interfaces.js, appIndicator.js — GNOME 45+ should use ESM"
+  - id: "R-VER44-01::Meta.later_add dead code"
+    classification: tp
+    notes: "promiseUtils.js — API removed in GNOME 44, extension targets 45+"
+  - id: "R-VER44-02::Meta.later_remove dead code"
+    classification: tp
+    notes: "promiseUtils.js — same dead code issue"
+  - id: "metadata/future-shell-version::GNOME 50"
+    classification: tp
+    notes: "shell-version includes 50 which is newer than current stable"
+  - id: "no-deprecated-modules::imports.byteArray"
+    classification: tp
+    notes: "interfaces.js uses deprecated imports.byteArray"
+  - id: "non-gjs-scripts::ksni.py"
+    classification: borderline
+    notes: "indicator-test-tool/ksni.py is a developer test tool, not part of extension proper"
+  - id: "metadata/shell-version-range::6 versions"
+    classification: tp
+    notes: "45-50 exceeds max 4 allowed"
+  # FAILs — False Positives
+  - id: "R-SLOP-16::GLib.file_get_contents"
+    classification: fp
+    notes: "Rule claims API doesn't exist in GJS, but it does — valid GI binding for g_file_get_contents()"
+  - id: "R-VER46-01::add_actor runtime-guarded"
+    classification: fp
+    notes: "Code has if (obj.add_actor) guard — runtime feature detection. Fixed: guard-pattern."
+  - id: "R-VER46-02::remove_actor runtime-guarded"
+    classification: fp
+    notes: "Same runtime guard pattern as add_actor. Fixed: guard-pattern."
+  - id: "init/shell-modification::non-Extension constructors"
+    classification: fp
+    notes: "3 FPs — GLib.Error, Gio.Cancellable in constructors of runtime-only classes. Fixed: scoped to extension.js."
+  # WARNs — True Positives
+  - id: "R-SEC-06::run_dispose"
+    classification: tp
+    notes: "statusNotifierWatcher.js — needs justification"
+  - id: "R-LOG-03::print/printerr in dev tool"
+    classification: tp
+    notes: "11 instances in indicator-test-tool/"
+  - id: "R-QUAL-26::custom Logger class"
+    classification: tp
+    notes: "Logger wraps GLib.log_structured; console.debug preferred"
+  - id: "R-QUAL-33::Gio._promisify module-scope"
+    classification: tp
+    notes: "4 files — correct advisory"
+  - id: "quality/module-state::mutable module-level let"
+    classification: tp
+    notes: "settingsManager.js — intentional singleton but valid concern"
+  - id: "quality/mock-in-production::test files"
+    classification: tp
+    notes: "indicator-test-tool/testTool.js shouldn't ship"
+  - id: "gobject/missing-gtypename::collision risk"
+    classification: tp
+    notes: "5 instances"
+  - id: "async/missing-cancellable::async without cancellable"
+    classification: tp
+    notes: "dbusMenu.js, appIndicator.js"
+  - id: "disclosure/private-api::undisclosed"
+    classification: tp
+    notes: "Main.layoutManager access not disclosed"
+  - id: "disclosure/file-io::undisclosed"
+    classification: tp
+    notes: "File I/O not disclosed in metadata"
+  # WARNs — False Positives
+  - id: "R-SLOP-13::this instanceof in factory"
+    classification: fp
+    notes: "3 FPs — methods in MenuItemFactory bound to different shellItem types via connectSmart. Fixed: guard-pattern."
+  - id: "R-SLOP-35::Object.freeze enum"
+    classification: fp
+    notes: "3 FPs — standard JS enum pattern (SNICategory, SNIStatus, SNIconType). Fixed: guard-pattern."
+  - id: "R-SLOP-38::domain-specific identifiers"
+    classification: fp
+    notes: "4 FPs — brightnessContrastEffect and similar are standard Clutter API names. Fixed: threshold raised."
+  - id: "R-QUAL-31::_onDestroy signal handler"
+    classification: fp
+    notes: "7 FPs — _onDestroy is PanelMenu.Button signal handler convention. Fixed: guard-pattern."
+  - id: "lifecycle/connectObject-migration::connectSmart equivalent"
+    classification: fp
+    notes: "6 FPs — connectSmart provides equivalent auto-cleanup. Fixed: recognized in check-lifecycle.py."
+  - id: "lifecycle/signal-balance::connectSmart not counted"
+    classification: fp
+    notes: "66 connects vs 18 disconnects — doesn't account for connectSmart auto-disconnect. Fixed."
+  # WARNs — Mixed
+  - id: "quality/constructor-resources::runtime-only constructors"
+    classification: borderline
+    notes: "8 hits — extension.js:36 is TP (Extension ctor), others are FP (runtime-only, cleaned via destroy/connectSmart)"
+  - id: "lifecycle/untracked-timeout::GSource-based promise"
+    classification: borderline
+    notes: "promiseUtils.js is FP (GSource-based promise with _cleanup); indicator-test-tool entries are TP but irrelevant"
+  - id: "quality/redundant-cleanup::verbose destroy guards"
+    classification: borderline
+    notes: "4 instances — if (x) x.destroy() vs x?.destroy() is style preference"
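Several of the false positives above are marked "Fixed: guard-pattern". The sketch below shows one way a `guard-pattern` with a `guard-window` lookback could suppress a match, using the `if (obj.add_actor)` runtime check as the example. It is an assumption about the general technique, not the logic in `apply-patterns.py`; the function name, the regexes, and the window semantics (same line plus N preceding lines) are hypothetical.

```python
import re
from typing import List


def find_unguarded(lines: List[str], pattern: str, guard: str, window: int = 3) -> List[int]:
    """Return 1-based line numbers where `pattern` matches and no `guard`
    match appears on the same line or within `window` preceding lines."""
    rule = re.compile(pattern)
    guard_re = re.compile(guard)
    hits = []
    for i, line in enumerate(lines):
        if not rule.search(line):
            continue
        start = max(0, i - window)
        # Look back over the window, including the matching line itself.
        if any(guard_re.search(prev) for prev in lines[start:i + 1]):
            continue
        hits.append(i + 1)
    return hits


# Hypothetical extension source for illustration only.
source = [
    "if (obj.add_actor)",
    "    obj.add_actor(child);",
    "else",
    "    obj.add_child(child);",
    "",
    "",
    "box.add_actor(icon);",
]

print(find_unguarded(source, r"\.add_actor\(", r"if\s*\(\s*\w+\.add_actor\s*\)", window=2))
```

Here the call on line 2 is suppressed because the feature-detection guard sits one line above it, while the unguarded call on line 7 is still reported.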
@@ -0,0 +1,84 @@
+# Classified findings for Blur my Shell
+# Source: docs/internal/field-test-blur-my-shell.md
+findings:
+  # FAILs — Fixed False Positives
+  - id: "init/shell-modification::GObject.registerClass at module scope"
+    classification: fp
+    notes: "13 FPs — GObject.registerClass returns a class constructor, not an instance. Fixed: exemption in check-init.py."
+  - id: "file-structure/extension.js::src/ layout"
+    classification: fp
+    notes: "Extension uses src/ subdirectory layout. Fixed: src/ fallback in ego-lint.sh."
+  # FAILs — True Positives
+  - id: "R-DEPR-06::Tweener usage"
+    classification: tp
+    notes: "appfolders.js uses imports.tweener.tweener — would crash on GNOME 46+. 4 line hits."
+  - id: "R-VER47-01::Clutter.Color"
+    classification: borderline
+    notes: "appfolders.js has ternary runtime guard (Clutter.Color ? ... : Cogl.Color), but file crashes before reaching this code due to Tweener import"
+  # WARNs — False Positives
+  - id: "resource-tracking/destroy-not-called::disable() not recognized"
+    classification: fp
+    notes: "63 hits — components use .disable() not .destroy() as cleanup. Resource graph only recognizes destroy()."
+  - id: "quality/constructor-resources::pipeline instances"
+    classification: fp
+    notes: "17 hits — mostly FP, pipeline instances managed via parent lifecycle"
+  - id: "resource-tracking/no-destroy-method::utility classes"
+    classification: fp
+    notes: "10 hits — utility classes use disconnect_all(), remove() instead"
+  - id: "R-SLOP-38::descriptive identifier"
+    classification: fp
+    notes: "dash_not_already_destroyed is descriptive, not AI verbosity. Fixed: guard-pattern."
+  - id: "R-SLOP-24::non-extension schema"
+    classification: fp
+    notes: "new Gio.Settings({schema: 'org.gnome.mutter'}) correctly accesses system schema. Fixed: guard-pattern."
+  # WARNs — True Positives
+  - id: "lifecycle/prototype-override::UnlockDialog overrides"
+    classification: tp
+    notes: "6 instances — correctly flags lockscreen UnlockDialog overrides"
+  - id: "R-I18N-01::template literals in _()"
+    classification: tp
+    notes: "4 instances — breaks xgettext extraction"
+  - id: "R-SLOP-16::GLib.file_get_contents synchronous"
+    classification: tp
+    notes: "Synchronous file read advisory"
+  - id: "R-SLOP-03::version field deprecated"
+    classification: tp
+    notes: "Deprecated for GNOME 45+"
+  - id: "R-SEC-09::Main.extensionManager access"
+    classification: tp
+    notes: "Extension system interference for Dash to Panel compat"
+  - id: "R-DEPR-09::var usage"
+    classification: tp
+    notes: "var x, y; should use let"
+  - id: "quality/private-api::Main.overview._overview"
+    classification: tp
+    notes: "Private API access — correct advisory"
+  - id: "quality/module-state::module vars not reset"
+    classification: tp
+    notes: "sigma and brightness not reset in disable"
+  - id: "quality/empty-catch::empty catch block"
+    classification: tp
+    notes: "paint_signals.js — empty catch"
+  - id: "lifecycle/signal-balance::125 vs 28"
+    classification: tp
+    notes: "By design — Connections class auto-cleans, but signal-balance heuristic can't verify"
+  - id: "lifecycle/async-destroyed-guard::await import"
+    classification: tp
+    notes: "Low risk — await import() in utils.js without guard"
+  # WARNs — Mixed
+  - id: "lifecycle/untracked-timeout::prefs auto-cleanup"
+    classification: borderline
+    notes: "4 hits — 2 TP, 2 FP (prefs.js timeouts auto-cleanup on window close)"
+  # ego-review advisory findings (not caught by ego-lint)
+  - id: "lifecycle::actor.destroy missing parentheses"
+    classification: tp
+    notes: "L-3: coverflow_alt_tab.js:69 — actor.destroy is property access, not function call. Undetectable by pattern matching."
+  - id: "lifecycle::splice wrong argument type"
+    classification: tp
+    notes: "L-10: window_list.js:111 — passes object instead of index to splice()"
+  - id: "lifecycle::setTimeout source ID not stored"
+    classification: tp
+    notes: "L-1: panel.js:70 — can fire after disable"
+  - id: "lifecycle::GLib.idle_add source ID not stored"
+    classification: tp
+    notes: "L-2: panel.js:91 — can fire after disable"
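The annotation files share a flat `id` / `classification` / `notes` shape, so per-extension tallies can be computed without a YAML library. The following Python sketch assumes that shape and uses a hypothetical helper name; it is not part of the pipeline scripts and would not handle nested or multi-line YAML values.

```python
import re
from collections import Counter

# Matches lines such as `    classification: fp` (assumed flat layout).
CLASSIFICATION_RE = re.compile(r"^\s*classification:\s*(\w+)\s*$")


def tally_classifications(text: str) -> Counter:
    """Count how many findings carry each classification label."""
    counts = Counter()
    for line in text.splitlines():
        match = CLASSIFICATION_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Abbreviated sample in the same shape as the annotation files.
sample = """\
findings:
  # FAILs
  - id: "R-DEPR-06::Tweener usage"
    classification: tp
    notes: "appfolders.js uses imports.tweener.tweener"
  - id: "R-VER47-01::Clutter.Color"
    classification: borderline
    notes: "ternary runtime guard"
  - id: "R-SLOP-38::descriptive identifier"
    classification: fp
    notes: "descriptive, not AI verbosity"
"""

counts = tally_classifications(sample)
total = sum(counts.values())
fp_share = counts["fp"] / total if total else 0.0
print(dict(counts), round(fp_share, 2))
```

A tally like this could feed the trend data in `field-tests/history.jsonl`, although the actual fields recorded there are not shown in this change.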
@@ -0,0 +1,85 @@
+# Classified findings for Clipboard Indicator
+# Source: docs/internal/field-test-clipboard-indicator.md
+findings:
+  # FAILs — Fixed False Positives
+  - id: "R-WEB-01::setTimeout"
+    classification: fp
+    notes: "GJS added native setTimeout in GNOME 45. Rule was unconditionally blocking. Fixed: max-version 44."
+  - id: "R-WEB-02::setInterval"
+    classification: fp
+    notes: "Same as R-WEB-01. Fixed: max-version 44."
+  - id: "R-WEB-10::clearTimeout"
+    classification: fp
+    notes: "Same as R-WEB-01. Fixed: max-version 44."
+  - id: "R-WEB-11::clearInterval"
+    classification: fp
+    notes: "Same as R-WEB-01. Fixed: max-version 44."
+  - id: "license::LICENSE.rst not recognized"
+    classification: fp
+    notes: "License check only recognized LICENSE/COPYING, not .rst/.md/.txt variants. Fixed."
+  - id: "metadata/uuid-matches-dir::cloned repo"
+    classification: fp
+    notes: "FAIL for cloned repos where directory != UUID. Fixed: downgraded to WARN."
+  # FAILs — True Positives
+  - id: "css/shell-class-override::.popup-menu-item"
+    classification: tp
+    notes: "Overrides Shell theme class without scoping — genuine issue"
+  - id: "R-DEPR-11::Shell.KeyBindingMode"
+    classification: tp
+    notes: "Dead code, removed before GNOME 40"
+  # WARNs — True Positives
+  - id: "css/important::!important usage"
+    classification: tp
+    notes: "Correct advisory"
+  - id: "R-DEPR-09::var declarations"
+    classification: tp
+    notes: "3 instances — should be const/let"
+  - id: "R-SEC-06::run_dispose"
+    classification: tp
+    notes: "run_dispose() on virtual keyboard device"
+  - id: "R-PREFS-04c::GTK layout widget"
+    classification: tp
+    notes: "Correct advisory"
+  - id: "R-VER48-04b::vertical property deprecated"
+    classification: tp
+    notes: "3 instances — correct advisory"
+  - id: "quality/module-state::module-level let"
+    classification: tp
+    notes: "5 module-level let variables — real enable/disable lifecycle concern"
+  - id: "quality/constructor-resources::connect in prefs"
+    classification: tp
+    notes: "2 instances — .connect() in prefs constructor"
+  - id: "quality/private-api::Main.panel"
+    classification: tp
+    notes: "Private API access — correct advisory"
+  - id: "lifecycle/signal-balance::17 vs 6"
+    classification: tp
+    notes: "Genuine signal balance concern"
+  - id: "lifecycle/async-destroyed-guard::no _destroyed guard"
+    classification: tp
+    notes: "No destroyed guard on async code"
+  - id: "lifecycle/clipboard-keybinding::security pattern"
+    classification: tp
+    notes: "Clipboard + keybinding pattern detected"
+  - id: "gobject/missing-gtypename::ConfirmDialog"
+    classification: tp
+    notes: "Missing GTypeName — collision risk"
+  - id: "async/no-cancellable::Gio async without cancellable"
+    classification: tp
+    notes: "Correct finding"
+  - id: "async/missing-cancellable::_async without cancellable"
+    classification: tp
+    notes: "Correct finding"
+  - id: "resource-tracking/no-destroy-method::registry.js"
+    classification: tp
+    notes: "Registry has no destroy() method"
+  - id: "resource-tracking/ownership::orphan detected"
+    classification: tp
+    notes: "1 orphan detected — correct"
+  # ego-review blocking issues
+  - id: "lifecycle::_historyLabel on global.stage never removed"
+    classification: tp
+    notes: "B1: Actor leak on every enable/disable cycle — ego-review finding"
+  - id: "lifecycle::_notifSource never destroyed in disable()"
+    classification: tp
+    notes: "B2: Notification source persists — ego-review finding"
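The `R-WEB-01` family of fixes relies on version gating (`max-version 44`). A minimal Python sketch of such a gate follows. The exact semantics in `apply-patterns.py` are not shown in this change, so the rule used here (a rule applies if any declared shell version falls inside its `min-version`/`max-version` range) is an assumption, as is the function name.

```python
from typing import Dict, Iterable, Optional


def rule_applies(rule: Dict[str, Optional[int]], shell_versions: Iterable[int]) -> bool:
    """Return True if any declared shell version lies within the rule's range.

    Missing bounds are treated as open-ended (assumption).
    """
    lower = rule.get("min-version")
    upper = rule.get("max-version")
    for version in shell_versions:
        if (lower is None or version >= lower) and (upper is None or version <= upper):
            return True
    return False


# Hypothetical rule mirroring the "max-version 44" fix for setTimeout.
set_timeout_rule = {"max-version": 44}

print(rule_applies(set_timeout_rule, [45, 46, 47]))  # extension targets 45+ only
print(rule_applies(set_timeout_rule, [43, 44, 45]))  # extension still supports 44
```

Under this assumption, an extension that declares only GNOME 45 and later is no longer flagged for `setTimeout`, while one that still lists 44 or earlier is.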