
Commit 22fc27c

ZviBaratz and claude authored
docs: add field-tests/README.md with extension catalog and results (#67)
## Summary

- Add `field-tests/README.md` with a browsable overview of the 10 field test extensions: catalog with descriptions, code metrics, latest lint results (PASS/FAIL/WARN/SKIP), and annotation coverage (TP/FP/borderline/unannotated)
- Update `ego-field-test` SKILL.md to include a new Step 6 that keeps the README tables current after each field test run

## Test plan

- [x] Verify `field-tests/README.md` renders correctly on GitHub
- [x] Verify lint results match latest `history.jsonl` entries (2026-03-07 run)
- [x] Verify annotation counts match `annotations/*.yaml` files
- [x] Run `bash tests/run-tests.sh` to confirm no regressions

Closes #66

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 8263cbc commit 22fc27c


4 files changed (+125, -6 lines)


.github/workflows/lint-pr.yml

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ jobs:
 ego-scaffold
 ego-simulate
 ego-submit
+ego-field-test
 requireScope: false
 subjectPattern: ^[a-z].+$
 subjectPatternError: |

README.md

Lines changed: 1 addition & 1 deletion
@@ -184,7 +184,7 @@ Key unwritten rules discovered:
 7. Subprocesses must have cancellation path in `disable()`
 8. No `convenience.js` patterns
 
-Full research: [docs/research/](docs/research/) | Coverage gaps: [docs/research/gap-analysis.md](docs/research/gap-analysis.md) | Field testing: [docs/internal/](docs/internal/) — 7 real-world extensions tested with TP/FP classification and calibration lessons
+Full research: [docs/research/](docs/research/) | Coverage gaps: [docs/research/gap-analysis.md](docs/research/gap-analysis.md) | Field testing: [field-tests/](field-tests/) — 10 real-world extensions tested with TP/FP classification and calibration lessons
 
 ## 🗺️ Roadmap
 
field-tests/README.md

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
+# Field Tests
+
+Batch ego-lint runner for regression testing across 10 real-world GNOME Shell extensions. Used to calibrate rules, catch false positives, and track lint accuracy over time.
+
+## Extension Catalog
+
+| Extension | Description | EGO Approved |
+|---|---|---|
+| [hara-hachi-bu](https://github.com/ZviBaratz/hara-hachi-bu) | Power profile and battery charge limit control from Quick Settings (polkit, clipboard) | No |
+| [tiling-shell](https://github.com/domferr/tilingshell) | Advanced tiling window management with Snap Assistant and custom layouts (compiled TypeScript) | Yes |
+| [v-shell](https://github.com/G-dH/vertical-workspaces) | Customizable horizontal/vertical workspace layout and Shell UX tweaks | Yes |
+| [gsconnect](https://github.com/GSConnect/gnome-shell-extension-gsconnect) | KDE Connect implementation for GNOME — device sharing, SMS, remote control (D-Bus daemon) | Yes |
+| [appindicator](https://github.com/ubuntu/gnome-shell-extension-appindicator) | AppIndicator, KStatusNotifierItem, and legacy tray icon support | Yes |
+| [clipboard-indicator](https://github.com/Tudmotu/gnome-shell-extension-clipboard-indicator) | Clipboard manager with history | Yes |
+| [blur-my-shell](https://github.com/aunetx/blur-my-shell) | Blur effects for top panel, dash, and overview | Yes |
+| [dash-to-panel](https://github.com/home-sweet-gnome/dash-to-panel) | Icon taskbar combining dash and system tray into the main panel | Yes |
+| [media-controls](https://github.com/cliffniff/Media-Controls) | Currently playing media controls and info in the panel | Yes |
+| [just-perfection](https://github.com/jrahmatzadeh/just-perfection) | Tweak tool to customize Shell behavior and disable UI elements | Yes |
+
+## Code Metrics
+
+| Extension | JS Files | Total Lines | Largest File | CSS Lines | Schema Keys |
+|---|---|---|---|---|---|
+| hara-hachi-bu | 18 | 8,898 | prefs.js (1,973) | 126 | 26 |
+| tiling-shell | 1 | 14 | monitorDescription.js (14) | 0 | 61 |
+| v-shell | 28 | 19,201 | prefs.js (2,507) | 396 | 152 |
+| gsconnect | 65 | 24,680 | messaging.js (1,325) | 127 | 48 |
+| appindicator | 17 | 5,655 | appIndicator.js (1,604) | 0 | 10 |
+| clipboard-indicator | 6 | 2,486 | extension.js (1,430) | 75 | 31 |
+| blur-my-shell | 49 | 7,743 | extension.js (602) | 572 | 95 |
+| dash-to-panel | 18 | 16,583 | prefs.js (4,052) | 251 | 247 |
+| media-controls | 17 | 5,064 | PanelButton.js (1,236) | 100 | 29 |
+| just-perfection | 7 | 7,490 | API.js (3,663) | 732 | 73 |
+
+> **Note**: tiling-shell metrics reflect the compiled release zip — the TypeScript source is much larger. The high SKIP count (51) is due to checks that don't apply to bundled output.
+
+## Latest Lint Results (2026-03-07)
+
+ego-lint version: `ae650be`
+
+| Extension | Exit | PASS | FAIL | WARN | SKIP | Verdict |
+|---|---|---|---|---|---|---|
+| hara-hachi-bu | 0 | 207 | 0 | 9 | 23 | Pass |
+| tiling-shell | 1 | 138 | 4 | 4 | 51 | Fail |
+| v-shell | 1 | 188 | 1 | 91 | 17 | Fail |
+| gsconnect | 1 | 171 | 12 | 144 | 17 | Fail |
+| appindicator | 1 | 187 | 11 | 57 | 14 | Fail |
+| clipboard-indicator | 1 | 196 | 3 | 24 | 17 | Fail |
+| blur-my-shell | 1 | 190 | 4 | 43 | 17 | Fail |
+| dash-to-panel | 1 | 171 | 11 | 66 | 17 | Fail |
+| media-controls | 1 | 188 | 6 | 28 | 17 | Fail |
+| just-perfection | 1 | 198 | 4 | 10 | 12 | Fail |
+| **Totals** || **1,834** | **56** | **476** | **202** ||
+
+## Annotation Coverage
+
+Each extension has a classification file in `annotations/` where findings are labeled as true positive (tp), false positive (fp), borderline, or expected.
+
+| Extension | TP | FP | Borderline | Expected | Classified | Unannotated |
+|---|---|---|---|---|---|---|
+| hara-hachi-bu | 7 | 10 | 0 | 1 | 18 | 32 |
+| tiling-shell | 13 | 7 | 4 | 0 | 24 | 59 |
+| v-shell | 8 | 5 | 1 | 0 | 14 | 48 |
+| gsconnect | 12 | 8 | 8 | 0 | 28 | 76 |
+| appindicator | 23 | 12 | 3 | 0 | 38 | 51 |
+| clipboard-indicator | 28 | 8 | 0 | 0 | 36 | 42 |
+| blur-my-shell | 23 | 11 | 4 | 0 | 38 | 50 |
+| dash-to-panel | 18 | 10 | 3 | 0 | 31 | 68 |
+| media-controls | 16 | 3 | 2 | 0 | 21 | 37 |
+| just-perfection | 2 | 0 | 0 | 0 | 2 | 26 |
+| **Totals** | **150** | **74** | **25** | **1** | **250** | **489** |
+
+## Directory Structure
+
+```
+field-tests/
+manifest.yaml # Extension source manifest
+baselines/ # Golden JSON snapshots (committed)
+annotations/ # Per-extension finding classifications (committed)
+history.jsonl # Append-only trend data (committed)
+reports/ # Regression/synthesis reports (committed)
+cache/ # Downloaded extensions (gitignored)
+results/ # Timestamped run output (gitignored)
+```
+
+## Quick Start
+
+```bash
+# Run lint on all extensions
+bash scripts/field-test-runner.sh
+
+# Run lint on a single extension
+bash scripts/field-test-runner.sh --extension blur-my-shell
+
+# Skip git fetches, use cached copies
+bash scripts/field-test-runner.sh --no-fetch
+
+# Update baselines after confirming results
+bash scripts/field-test-runner.sh --update-baselines
+
+# Run lint + ego-review on all extensions
+bash scripts/field-test-runner.sh --review --no-fetch
+
+# Review with exclusions and custom budget
+bash scripts/field-test-runner.sh --review --review-exclude gsconnect --budget 6.00
+```
+
+See the [calibration cycle](../CLAUDE.md#calibration-cycle) in CLAUDE.md for the full workflow.
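The bold **Totals** row under "Latest Lint Results" is the column-wise sum of the ten per-extension rows, and the new Step 6 asks for it to be regenerated after every run. Below is a minimal sketch of that arithmetic. It assumes `field-tests/README.md` keeps the exact pipe-delimited layout added in this commit, with the columns `Extension | Exit | PASS | FAIL | WARN | SKIP | Verdict`. `lint_totals` is a hypothetical helper and is not one of the repository's scripts. It reads the Markdown table instead of `history.jsonl` because that file's schema is not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of scripts/): sum the PASS, FAIL, WARN and SKIP
# columns of the "Latest Lint Results" table in field-tests/README.md.
# Assumes columns: Extension | Exit | PASS | FAIL | WARN | SKIP | Verdict
# Reads the files given as arguments, or stdin if none are given.
lint_totals() {
  awk -F'|' '
    # Only consider lines between this heading and the next "## " heading.
    /^## Latest Lint Results/ { sect = 1; next }
    /^## /                    { sect = 0 }
    # Data rows split into 9 fields ($1 and $9 are empty); $4..$7 = PASS..SKIP.
    # The numeric test on $4 skips the header, separator and bold Totals rows.
    sect && NF == 9 && $4 ~ /^ *[0-9]+ *$/ {
      pass += $4; fail += $5; warn += $6; skip += $7
    }
    END { printf "PASS=%d FAIL=%d WARN=%d SKIP=%d\n", pass, fail, warn, skip }
  ' "$@"
}
```

Run against this commit's README, `lint_totals field-tests/README.md` should print `PASS=1834 FAIL=56 WARN=476 SKIP=202`, which matches the bold Totals row. Any difference after an update means a per-extension row and the totals drifted apart.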

skills/ego-field-test/SKILL.md

Lines changed: 15 additions & 5 deletions
@@ -94,7 +94,16 @@ Produce `field-tests/reports/<date>-regression.md` with:
 5. **High-priority FP candidates**: Rules that fire as FP on 2+ approved extensions
 6. **Gaps**: Findings ego-review caught that ego-lint missed (only if `--review`)
 
-### Step 6: Issue Creation (if FPs confirmed)
+### Step 6: Update field-tests/README.md
+
+Update the "Latest Lint Results" and "Annotation Coverage" tables in `field-tests/README.md` to reflect the current run:
+
+1. **Latest Lint Results**: Replace the ego-lint version, per-extension PASS/FAIL/WARN/SKIP counts, and totals row with values from the latest `history.jsonl` entries
+2. **Annotation Coverage**: Update TP/FP/borderline/expected/classified/unannotated counts from `annotations/*.yaml` and the diff output
+
+Keep the "Extension Catalog" and "Code Metrics" sections unchanged unless a new extension was added to the manifest.
+
+### Step 7: Issue Creation (if FPs confirmed)
 
 For new false positives on EGO-approved extensions that are confirmed FP (not borderline):
 
@@ -103,7 +112,7 @@ Create a GitHub issue:
 - Title: `False positive: R-XXXX-NN on <extension>`
 - Body: Rule ID, file:line, why it's FP, which other extensions are affected, suggested fix
 
-### Step 7: Update Baselines (if `--update-baselines`)
+### Step 8: Update Baselines (if `--update-baselines`)
 
 ```bash
 bash scripts/field-test-runner.sh --update-baselines --no-fetch
@@ -148,6 +157,7 @@ findings:
 1. Make a code change (guard pattern, threshold tweak, new rule)
 2. Run `/ego-field-test` — see immediate impact across all extensions
 3. Classify new unannotated findings
-4. If FPs found, create issues and fix them
-5. Run `/ego-field-test --update-baselines` to snapshot improved state
-6. Repeat
+4. Update `field-tests/README.md` results and annotation tables
+5. If FPs found, create issues and fix them
+6. Run `/ego-field-test --update-baselines` to snapshot improved state
+7. Repeat
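In the "Annotation Coverage" table, Classified is the sum of the four label columns. For hara-hachi-bu, for example, 7 + 10 + 0 + 1 = 18. When Step 6 refreshes those counts from `annotations/*.yaml`, checking this invariant is a cheap sanity test. The sketch below assumes the column order used in the README added by this commit. `check_classified` is a hypothetical helper. It checks only the table's internal arithmetic and does not compare the table against the YAML files.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of scripts/): verify that every row of the
# "Annotation Coverage" table satisfies Classified = TP + FP + Borderline + Expected.
# Assumes columns: Extension | TP | FP | Borderline | Expected | Classified | Unannotated
# Prints each inconsistent row and returns non-zero if any were found.
check_classified() {
  awk -F'|' '
    /^## Annotation Coverage/ { sect = 1; next }
    /^## /                    { sect = 0 }
    sect && NF == 9 {
      row = $0
      # Strip bold markers, thousands separators and spaces; awk re-splits $0.
      gsub(/[*, ]/, "")
      # $3..$7 = TP, FP, Borderline, Expected, Classified; skip non-numeric rows.
      if ($3 !~ /^[0-9]+$/) next
      if ($3 + $4 + $5 + $6 != $7) { print "mismatch: " row; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }
  ' "$@"
}
```

Every row in this commit's table passes, including Totals (150 + 74 + 25 + 1 = 250). So `check_classified field-tests/README.md && echo ok` is a quick check to run after editing the counts.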
