docs: add field-tests/README.md with extension catalog and results (#67)

ZviBaratz · claude · web-flow · commit 22fc27c559f9 · 2026-03-07T21:08:56.000+02:00
## Summary

- Add `field-tests/README.md` with a browsable overview of the 10 field test extensions: catalog with descriptions, code metrics, latest lint results (PASS/FAIL/WARN/SKIP), and annotation coverage (TP/FP/borderline/expected/unannotated)
- Add a new Step 6 to the `ego-field-test` SKILL.md that keeps the README tables current after each field test run (the former Steps 6 and 7 become Steps 7 and 8), and add the same step to the calibration cycle
- Point the field-testing link in the root `README.md` at `field-tests/` and update the extension count from 7 to 10
- Add `ego-field-test` to the allowed scopes in `.github/workflows/lint-pr.yml`

## Test plan

- [x] Verify `field-tests/README.md` renders correctly on GitHub
- [x] Verify lint results match the latest `history.jsonl` entries (2026-03-07 run)
- [x] Verify annotation counts match the `annotations/*.yaml` files
- [x] Run `bash tests/run-tests.sh` to confirm no regressions

Closes #66

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
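
The totals checks in the test plan can be partly automated. The following is a minimal sketch, not part of this repository: `check_totals` is a hypothetical helper that assumes the table layout used in `field-tests/README.md` (extension name in the first column, a bold `**Totals**` row last, thousands separators in numeric cells). It only confirms that each totals cell equals the sum of the rows above it; it does not compare the numbers against `history.jsonl` or `annotations/*.yaml`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): checks that the bold
# **Totals** row of a markdown table equals the column sums of the rows above it.
set -euo pipefail

check_totals() {
  awk -F'|' '
    # Drop padding, bold markers, and thousands separators: " **1,834** " -> "1834".
    function clean(s) { gsub(/[ *,]/, "", s); return s }

    # Header row: remember column names for error messages.
    NR == 1 {
      for (i = 3; i < NF; i++) { h = $i; gsub(/^ +| +$/, "", h); name[i] = h }
      next
    }

    # Skip the |---|---| separator row.
    /^\|---/ { next }

    # Totals row: compare each numeric cell with the accumulated column sum.
    /\*\*Totals\*\*/ {
      found = 1
      status = 0
      for (i = 3; i < NF; i++) {
        v = clean($i)
        if (v !~ /^[0-9]+$/) continue   # placeholder or text cell
        if (v + 0 != sum[i]) {
          printf "%s: totals row says %s, rows sum to %d\n", name[i], v, sum[i]
          status = 1
        }
      }
      exit status
    }

    # Data rows: $1 and $NF are empty, $2 is the extension name.
    /^\|/ {
      for (i = 3; i < NF; i++) {
        v = clean($i)
        if (v ~ /^[0-9]+$/) sum[i] += v
      }
    }

    END { if (!found) { print "no **Totals** row found"; exit 1 } }
  '
}

# Demo on the first two rows of the Annotation Coverage table,
# with a totals row summed by hand.
check_totals <<'EOF'
| Extension | TP | FP | Borderline | Expected | Classified | Unannotated |
|---|---|---|---|---|---|---|
| hara-hachi-bu | 7 | 10 | 0 | 1 | 18 | 32 |
| tiling-shell | 13 | 7 | 4 | 0 | 24 | 59 |
| **Totals** | **20** | **17** | **4** | **1** | **42** | **91** |
EOF
echo "excerpt totals OK"

# In a checkout of the repo, check both results tables in the README.
if [[ -f field-tests/README.md ]]; then
  for heading in "Latest Lint Results" "Annotation Coverage"; do
    sed -n "/^## ${heading}/,/^## /p" field-tests/README.md | grep '^|' | check_totals
  done
fi
```

Cells that are not plain numbers, such as the Verdict column and the placeholder in the totals row's Exit cell, are skipped, so the same function works on both the lint-results and annotation-coverage tables.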
diff --git a/.github/workflows/lint-pr.yml b/.github/workflows/lint-pr.yml
@@ -35,6 +35,7 @@ jobs:
             ego-scaffold
             ego-simulate
             ego-submit
+            ego-field-test
           requireScope: false
           subjectPattern: ^[a-z].+$
           subjectPatternError: |
diff --git a/README.md b/README.md
@@ -184,7 +184,7 @@ Key unwritten rules discovered:
 7. Subprocesses must have cancellation path in `disable()`
 8. No `convenience.js` patterns
 
-Full research: [docs/research/](docs/research/) | Coverage gaps: [docs/research/gap-analysis.md](docs/research/gap-analysis.md) | Field testing: [docs/internal/](docs/internal/) — 7 real-world extensions tested with TP/FP classification and calibration lessons
+Full research: [docs/research/](docs/research/) | Coverage gaps: [docs/research/gap-analysis.md](docs/research/gap-analysis.md) | Field testing: [field-tests/](field-tests/) — 10 real-world extensions tested with TP/FP classification and calibration lessons
 
 ## 🗺️ Roadmap
 
diff --git a/field-tests/README.md b/field-tests/README.md
@@ -0,0 +1,108 @@
+# Field Tests
+
+Regression tests that run ego-lint in batch against 10 real-world GNOME Shell extensions. The results are used to calibrate rules, catch false positives, and track lint accuracy over time.
+
+## Extension Catalog
+
+| Extension | Description | EGO Approved |
+|---|---|---|
+| [hara-hachi-bu](https://github.com/ZviBaratz/hara-hachi-bu) | Power profile and battery charge limit control from Quick Settings (polkit, clipboard) | No |
+| [tiling-shell](https://github.com/domferr/tilingshell) | Advanced tiling window management with Snap Assistant and custom layouts (compiled TypeScript) | Yes |
+| [v-shell](https://github.com/G-dH/vertical-workspaces) | Customizable horizontal/vertical workspace layout and Shell UX tweaks | Yes |
+| [gsconnect](https://github.com/GSConnect/gnome-shell-extension-gsconnect) | KDE Connect implementation for GNOME — device sharing, SMS, remote control (D-Bus daemon) | Yes |
+| [appindicator](https://github.com/ubuntu/gnome-shell-extension-appindicator) | AppIndicator, KStatusNotifierItem, and legacy tray icon support | Yes |
+| [clipboard-indicator](https://github.com/Tudmotu/gnome-shell-extension-clipboard-indicator) | Clipboard manager with history | Yes |
+| [blur-my-shell](https://github.com/aunetx/blur-my-shell) | Blur effects for top panel, dash, and overview | Yes |
+| [dash-to-panel](https://github.com/home-sweet-gnome/dash-to-panel) | Icon taskbar combining dash and system tray into the main panel | Yes |
+| [media-controls](https://github.com/cliffniff/Media-Controls) | Currently playing media controls and info in the panel | Yes |
+| [just-perfection](https://github.com/jrahmatzadeh/just-perfection) | Tweak tool to customize Shell behavior and disable UI elements | Yes |
+
+## Code Metrics
+
+| Extension | JS Files | Total Lines | Largest File (lines) | CSS Lines | Schema Keys |
+|---|---|---|---|---|---|
+| hara-hachi-bu | 18 | 8,898 | prefs.js (1,973) | 126 | 26 |
+| tiling-shell | 1 | 14 | monitorDescription.js (14) | 0 | 61 |
+| v-shell | 28 | 19,201 | prefs.js (2,507) | 396 | 152 |
+| gsconnect | 65 | 24,680 | messaging.js (1,325) | 127 | 48 |
+| appindicator | 17 | 5,655 | appIndicator.js (1,604) | 0 | 10 |
+| clipboard-indicator | 6 | 2,486 | extension.js (1,430) | 75 | 31 |
+| blur-my-shell | 49 | 7,743 | extension.js (602) | 572 | 95 |
+| dash-to-panel | 18 | 16,583 | prefs.js (4,052) | 251 | 247 |
+| media-controls | 17 | 5,064 | PanelButton.js (1,236) | 100 | 29 |
+| just-perfection | 7 | 7,490 | API.js (3,663) | 732 | 73 |
+
+> **Note**: tiling-shell metrics reflect the compiled release zip, not the TypeScript source, which is much larger. Its high SKIP count in the lint results (51) comes from checks that don't apply to bundled output.
+
+## Latest Lint Results (2026-03-07)
+
+ego-lint version: `ae650be`
+
+| Extension | Exit Code | PASS | FAIL | WARN | SKIP | Verdict |
+|---|---|---|---|---|---|---|
+| hara-hachi-bu | 0 | 207 | 0 | 9 | 23 | Pass |
+| tiling-shell | 1 | 138 | 4 | 4 | 51 | Fail |
+| v-shell | 1 | 188 | 1 | 91 | 17 | Fail |
+| gsconnect | 1 | 171 | 12 | 144 | 17 | Fail |
+| appindicator | 1 | 187 | 11 | 57 | 14 | Fail |
+| clipboard-indicator | 1 | 196 | 3 | 24 | 17 | Fail |
+| blur-my-shell | 1 | 190 | 4 | 43 | 17 | Fail |
+| dash-to-panel | 1 | 171 | 11 | 66 | 17 | Fail |
+| media-controls | 1 | 188 | 6 | 28 | 17 | Fail |
+| just-perfection | 1 | 198 | 4 | 10 | 12 | Fail |
+| **Totals** | — | **1,834** | **56** | **476** | **202** | — |
+
+## Annotation Coverage
+
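+Each extension has a classification file in `annotations/` that labels findings as true positive (tp), false positive (fp), borderline, or expected. Classified is the sum of those four labels; Unannotated counts findings from the latest run that have not been classified yet.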
+
+| Extension | TP | FP | Borderline | Expected | Classified | Unannotated |
+|---|---|---|---|---|---|---|
+| hara-hachi-bu | 7 | 10 | 0 | 1 | 18 | 32 |
+| tiling-shell | 13 | 7 | 4 | 0 | 24 | 59 |
+| v-shell | 8 | 5 | 1 | 0 | 14 | 48 |
+| gsconnect | 12 | 8 | 8 | 0 | 28 | 76 |
+| appindicator | 23 | 12 | 3 | 0 | 38 | 51 |
+| clipboard-indicator | 28 | 8 | 0 | 0 | 36 | 42 |
+| blur-my-shell | 23 | 11 | 4 | 0 | 38 | 50 |
+| dash-to-panel | 18 | 10 | 3 | 0 | 31 | 68 |
+| media-controls | 16 | 3 | 2 | 0 | 21 | 37 |
+| just-perfection | 2 | 0 | 0 | 0 | 2 | 26 |
+| **Totals** | **150** | **74** | **25** | **1** | **250** | **489** |
+
+## Directory Structure
+
+```
+field-tests/
+  manifest.yaml          # Extension source manifest
+  baselines/             # Golden JSON snapshots (committed)
+  annotations/           # Per-extension finding classifications (committed)
+  history.jsonl          # Append-only trend data (committed)
+  reports/               # Regression/synthesis reports (committed)
+  cache/                 # Downloaded extensions (gitignored)
+  results/               # Timestamped run output (gitignored)
+```
+
+## Quick Start
+
+```bash
+# Run lint on all extensions
+bash scripts/field-test-runner.sh
+
+# Run lint on a single extension
+bash scripts/field-test-runner.sh --extension blur-my-shell
+
+# Skip git fetches, use cached copies
+bash scripts/field-test-runner.sh --no-fetch
+
+# Update baselines after confirming results
+bash scripts/field-test-runner.sh --update-baselines
+
+# Run lint + ego-review on all extensions
+bash scripts/field-test-runner.sh --review --no-fetch
+
+# Review with exclusions and custom budget
+bash scripts/field-test-runner.sh --review --review-exclude gsconnect --budget 6.00
+```
+
+See the [calibration cycle](../CLAUDE.md#calibration-cycle) in CLAUDE.md for the full workflow.
diff --git a/skills/ego-field-test/SKILL.md b/skills/ego-field-test/SKILL.md
@@ -94,7 +94,16 @@ Produce `field-tests/reports/<date>-regression.md` with:
 5. **High-priority FP candidates**: Rules that fire as FP on 2+ approved extensions
 6. **Gaps**: Findings ego-review caught that ego-lint missed (only if `--review`)
 
-### Step 6: Issue Creation (if FPs confirmed)
+### Step 6: Update field-tests/README.md
+
+Update the "Latest Lint Results" and "Annotation Coverage" tables in `field-tests/README.md` to reflect the current run:
+
+1. **Latest Lint Results**: Update the run date in the section heading, then replace the ego-lint version, the per-extension rows (exit code, PASS/FAIL/WARN/SKIP counts, verdict), and the totals row with values from the latest `history.jsonl` entries
+2. **Annotation Coverage**: Update the TP/FP/borderline/expected/classified/unannotated counts and the totals row from `annotations/*.yaml` and the diff output
+
+Keep the "Extension Catalog" and "Code Metrics" sections unchanged unless a new extension was added to the manifest.
+
+### Step 7: Issue Creation (if FPs confirmed)
 
 For new false positives on EGO-approved extensions that are confirmed FP (not borderline):
 
@@ -103,7 +112,7 @@ Create a GitHub issue:
 - Title: `False positive: R-XXXX-NN on <extension>`
 - Body: Rule ID, file:line, why it's FP, which other extensions are affected, suggested fix
 
-### Step 7: Update Baselines (if `--update-baselines`)
+### Step 8: Update Baselines (if `--update-baselines`)
 
 ```bash
 bash scripts/field-test-runner.sh --update-baselines --no-fetch
@@ -148,6 +157,7 @@ findings:
 1. Make a code change (guard pattern, threshold tweak, new rule)
 2. Run `/ego-field-test` — see immediate impact across all extensions
 3. Classify new unannotated findings
-4. If FPs found, create issues and fix them
-5. Run `/ego-field-test --update-baselines` to snapshot improved state
-6. Repeat
+4. Update `field-tests/README.md` results and annotation tables
+5. If FPs found, create issues and fix them
+6. Run `/ego-field-test --update-baselines` to snapshot improved state
+7. Repeat