
Commit 6e6c3fc

feat: add 4 agentic code quality workflows
- duplicate-code-detector.md: weekly scan for near-duplicate code blocks, files issues for high-impact deduplication opportunities
- test-coverage-reporter.md: weekly + on push to main, reports coverage trends as GitHub Discussions
- refactoring-scanner.md: weekly scan for oversized/mixed-responsibility files, files refactoring issues
- export-audit.md: on push to main, detects unused exports, naming inconsistencies, circular deps, wrong test imports

All compiled to .lock.yml via gh aw compile and post-processed.

Agent-Logs-Url: https://github.com/github/gh-aw-firewall/sessions/28e98601-c6b6-425c-9141-a0d0c455ec04
1 parent 2328504 commit 6e6c3fc

8 files changed

Lines changed: 5395 additions & 0 deletions

.github/workflows/duplicate-code-detector.lock.yml

Lines changed: 1061 additions & 0 deletions
Some generated files are not rendered by default.
Lines changed: 222 additions & 0 deletions
@@ -0,0 +1,222 @@
---
description: |
  Weekly workflow that scans the codebase for duplicate and near-duplicate code blocks,
  copy-paste patterns, and repeated logic sequences in TypeScript source and JavaScript
  container code. Files actionable issues for high-impact deduplication opportunities
  to prevent technical debt from accumulating silently.

on:
  schedule: weekly
  workflow_dispatch:

permissions:
  contents: read
  issues: read

sandbox:
  agent:
    version: v0.25.29
network:
  allowed:
    - node

tools:
  github:
    toolsets: [issues]
  bash: true

safe-outputs:
  threat-detection:
    enabled: false
  create-issue:
    title-prefix: "[Duplicate Code] "
    labels: [code-quality, refactoring]
    max: 5
    expires: 30d

timeout-minutes: 20
---

# Duplicate Code Detector

You are a code quality engineer analyzing the `${{ github.repository }}` codebase for duplicated and near-duplicate code. Your mission is to surface high-impact deduplication opportunities that will reduce maintenance burden and improve consistency.

## Repository Context

This is **gh-aw-firewall**, a network firewall for GitHub Copilot CLI. The most important source files for duplication analysis are:

- `src/docker-manager.ts`: 3,900+ lines; container lifecycle, env-var construction, volume mounts
- `src/cli.ts`: 1,700+ lines; argument parsing, orchestration, config merging
- `containers/api-proxy/server.js`: provider-agnostic proxy server
- `containers/api-proxy/providers/*.js`: per-provider adapter modules

## Phase 1: Gather Codebase Metrics

Run these commands to understand the scope before diving into duplication:

```bash
# File sizes and line counts
wc -l src/*.ts src/**/*.ts containers/api-proxy/*.js containers/api-proxy/providers/*.js 2>/dev/null | sort -rn | head -30

# Total files and lines
echo "=== TypeScript source ==="
find src -name "*.ts" ! -name "*.test.ts" | xargs wc -l 2>/dev/null | sort -rn | head -20
echo "=== Container JS ==="
find containers -name "*.js" | xargs wc -l 2>/dev/null | sort -rn | head -20
```

## Phase 2: Detect Structural Duplication

Install and run the `jscpd` (JavaScript Copy/Paste Detector) tool to find literal code duplication:

```bash
# Install jscpd
npm install -g jscpd 2>&1 | tail -3

# Run duplicate detection on TypeScript source
jscpd src --min-lines 10 --min-tokens 50 --reporters json --output /tmp/jscpd-src 2>&1 | tail -20

# Run on container JS
jscpd containers --min-lines 10 --min-tokens 50 --reporters json --output /tmp/jscpd-containers 2>&1 | tail -20

# Show summary
cat /tmp/jscpd-src/jscpd-report.json 2>/dev/null | node -e "
const d = JSON.parse(require('fs').readFileSync('/dev/stdin', 'utf8'));
const clones = d.duplicates || [];
console.log('Total duplicates found:', clones.length);
clones.slice(0, 10).forEach(c => {
  const f1 = c.firstFile?.name?.replace(process.cwd() + '/', '') || 'unknown';
  const f2 = c.secondFile?.name?.replace(process.cwd() + '/', '') || 'unknown';
  console.log(\`  \${f1}:\${c.firstFile?.start}-\${c.firstFile?.end} <-> \${f2}:\${c.secondFile?.start}-\${c.secondFile?.end} (\${c.fragment?.split('\\n').length || 0} lines)\`);
});
" || echo "(jscpd report not available)"
```
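
The summary step above reads only the `/tmp/jscpd-src` report, although a second report is written to `/tmp/jscpd-containers`. The following is a minimal sketch of a summary that covers both. It assumes the same report fields the inline script already relies on (`duplicates`, `firstFile`, `secondFile`, `fragment`); the helper names `loadReport` and `summarizeClones` are hypothetical and not part of the workflow.

```javascript
const fs = require('node:fs');

// Read a jscpd JSON report; return null if it is missing or unparsable.
function loadReport(path) {
  try {
    return JSON.parse(fs.readFileSync(path, 'utf8'));
  } catch {
    return null;
  }
}

// Turn the first `limit` clone entries into one-line descriptions.
// Assumed entry shape: { firstFile: {name, start, end}, secondFile: {...}, fragment }.
function summarizeClones(report, limit = 10) {
  const clones = Array.isArray(report?.duplicates) ? report.duplicates : [];
  return clones.slice(0, limit).map((c) => {
    const f1 = c.firstFile?.name ?? 'unknown';
    const f2 = c.secondFile?.name ?? 'unknown';
    const lines = c.fragment ? c.fragment.split('\n').length : 0;
    return `${f1}:${c.firstFile?.start}-${c.firstFile?.end} <-> ` +
      `${f2}:${c.secondFile?.start}-${c.secondFile?.end} (${lines} lines)`;
  });
}

// Summarize both output directories used by the commands above.
for (const dir of ['/tmp/jscpd-src', '/tmp/jscpd-containers']) {
  const report = loadReport(`${dir}/jscpd-report.json`);
  console.log(`=== ${dir} ===`);
  if (report === null) {
    console.log('(jscpd report not available)');
  } else {
    summarizeClones(report).forEach((line) => console.log('  ' + line));
  }
}
```

Returning `null` for a missing report keeps the "jscpd unavailable" case distinct from "jscpd ran and found nothing".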

## Phase 3: Detect Pattern-Level Duplication

Use grep to find repeated code patterns that jscpd may not catch (semantic duplication):

```bash
echo "=== Env-var reading/trimming patterns ==="
grep -rn "process\.env\." src/ --include="*.ts" | grep -v "test" | head -40

echo "=== Docker exec/run command construction patterns ==="
grep -n "execa\|execaSync\|docker.*run\|docker.*exec" src/docker-manager.ts | head -30

echo "=== Config/validation patterns in config-file.ts and schema-validator.ts ==="
grep -n "throw\|error\|invalid\|validate" src/config-file.ts | head -20
grep -n "throw\|error\|invalid\|validate" src/schema-validator.ts 2>/dev/null | head -20

echo "=== Repeated try/catch error handling patterns ==="
grep -n -A 3 "catch (e" src/docker-manager.ts | head -60

echo "=== Provider adapter patterns in api-proxy ==="
for f in containers/api-proxy/providers/*.js; do
  echo "--- $f ---"
  grep -n "function\|const.*=.*(" "$f" | head -10
done

echo "=== Repeated log construction patterns ==="
grep -rn "logger\.\(debug\|info\|warn\|error\)" src/ --include="*.ts" | \
  sed 's/.*logger\.\(debug\|info\|warn\|error\)(\(.*\))/\2/' | \
  sort | uniq -d | head -20
```
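
The `sort | uniq -d` step above only catches lines that are textually identical. One possible way to also group lines that differ only in literal values is sketched below. This normalization approach is an assumption added for illustration, not something the workflow prescribes, and `normalizeLine` and `findRepeatedPatterns` are hypothetical names.

```javascript
// Collapse string and numeric literals so that lines differing only in
// literal values normalize to the same key.
function normalizeLine(line) {
  return line
    .trim()
    .replace(/(['"`])(?:\\.|(?!\1).)*\1/g, 'STR') // quoted string literals
    .replace(/\b\d+(?:\.\d+)?\b/g, 'NUM')         // standalone numbers
    .replace(/\s+/g, ' ');                        // whitespace runs
}

// Group lines by normalized form and keep groups seen at least `minCount` times.
// Returns [{ pattern, lines }] where `lines` holds 1-based line numbers.
function findRepeatedPatterns(lines, minCount = 2) {
  const groups = new Map();
  lines.forEach((text, index) => {
    const key = normalizeLine(text);
    if (key.length === 0) return; // skip blank lines
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(index + 1);
  });
  return [...groups.entries()]
    .filter(([, where]) => where.length >= minCount)
    .map(([pattern, where]) => ({ pattern, lines: where }));
}
```

Matches found this way are only candidates; each group still needs a manual look to decide whether the repetition is real duplication or a cosmetic similarity.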

## Phase 4: Analyze Specific Known Duplication Areas

Based on codebase knowledge, deeply analyze the most likely duplication hotspots:

```bash
echo "=== docker-manager.ts: env-var construction ==="
grep -n "env\[.*\]\s*=\|envVars\.\|\.trim()\|process\.env\." src/docker-manager.ts | head -50

echo "=== docker-manager.ts: repeated docker compose args patterns ==="
grep -n "composeArgs\|dockerArgs\|\-f.*compose\|--project-name" src/docker-manager.ts | head -30

echo "=== cli.ts: option handling patterns ==="
grep -n "\.option\|options\.\|program\." src/cli.ts | head -50

echo "=== API proxy provider similarity (getConfig patterns) ==="
for f in containers/api-proxy/providers/openai.js containers/api-proxy/providers/anthropic.js containers/api-proxy/providers/gemini.js containers/api-proxy/providers/copilot.js containers/api-proxy/providers/opencode.js; do
  if [ -f "$f" ]; then
    echo "--- $f: exported functions ---"
    grep -n "^function\|^const.*=\s*function\|^module\.exports\|^exports\." "$f" | head -10
  fi
done

echo "=== proxy-utils.js: shared utilities ==="
cat containers/api-proxy/proxy-utils.js 2>/dev/null | head -60
```

## Phase 5: Check for Existing Issues

Before filing new issues, check what's already been reported:

1. Search for open issues with `[Duplicate Code]` prefix using the GitHub toolset
2. Also search for issues with labels `code-quality` or `refactoring` that describe duplication
3. Skip any finding that already has an open tracking issue
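
The skip rule in step 3 could be approximated as below. This is a deliberately simple sketch that compares normalized titles only; in practice deciding whether an existing issue "describes" the same duplication needs judgment. The `title` and `state` fields and the function names are illustrative assumptions.

```javascript
const TITLE_PREFIX = '[Duplicate Code] ';

// Strip the workflow's title prefix and normalize case and spacing.
function normalizeTitle(title) {
  return title.replace(TITLE_PREFIX, '').trim().toLowerCase().replace(/\s+/g, ' ');
}

// Keep only findings whose normalized title does not match an open issue.
// Assumed shapes: findings [{ title }], issues [{ title, state }].
function filterUntracked(findings, issues) {
  const tracked = new Set(
    issues
      .filter((issue) => issue.state === 'open')
      .map((issue) => normalizeTitle(issue.title))
  );
  return findings.filter((finding) => !tracked.has(normalizeTitle(finding.title)));
}
```

Closed issues are ignored on purpose, so a duplication that was fixed and later reintroduced can be reported again.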

## Phase 6: Prioritize and Report Findings

Based on your analysis, identify the **top duplications by impact** using this scoring:

| Factor | Points |
|--------|--------|
| >20 duplicate lines | +3 |
| Affects security-critical path | +3 |
| In file >1000 lines (maintenance burden) | +2 |
| More than 2 copies | +2 |
| Easy to extract (no complex dependencies) | +1 |

Report only findings with score ≥ 4.
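
As a worked illustration of the scoring table, the sketch below adds up the points for one finding and applies the threshold of 4. The point values come from the table; the field names (`duplicateLines`, `securityCritical`, `fileLines`, `copies`, `easyToExtract`) are hypothetical.

```javascript
const REPORT_THRESHOLD = 4;

// Sum the points from the scoring table for a single finding.
function scoreFinding(finding) {
  let score = 0;
  if (finding.duplicateLines > 20) score += 3; // >20 duplicate lines
  if (finding.securityCritical) score += 3;    // affects security-critical path
  if (finding.fileLines > 1000) score += 2;    // in file >1000 lines
  if (finding.copies > 2) score += 2;          // more than 2 copies
  if (finding.easyToExtract) score += 1;       // easy to extract
  return score;
}

// A finding is reported only when its score reaches the threshold.
function shouldReport(finding) {
  return scoreFinding(finding) >= REPORT_THRESHOLD;
}
```

For example, a 25-line clone with two copies inside a 3,900-line file scores 3 + 2 = 5 and is reported, while a 12-line clone with three copies that is easy to extract scores 2 + 1 = 3 and is not.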

### For each high-impact finding, create an issue with this format:

**Title**: `[Duplicate Code] <brief description of what is duplicated>`

**Body**:
```markdown
## Duplicate Code Opportunity

### Summary
- **Pattern**: Brief description of what is being duplicated
- **Locations**: File(s) and line ranges containing duplicates
- **Impact**: Lines saved / maintenance burden reduction

### Evidence

<Show the specific duplicated code blocks side by side>

### Suggested Refactoring

Describe the shared utility or abstraction that would eliminate the duplication.
For example:
- Extract a `parseEnvVars(obj)` helper in `src/env-utils.ts`
- Create a base class or mixin for provider adapters
- Add a `buildDockerArgs(config)` factory function

### Affected Files
- `path/to/file.ts`: lines X-Y
- `path/to/other.ts`: lines A-B

### Effort Estimate
Low / Medium / High

---
*Detected by Duplicate Code Detector workflow. Run date: $(date -u +"%Y-%m-%d")*
```
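
The template's footer uses shell syntax for the run date, which is not expanded inside a Markdown body, so the date has to be filled in when the issue text is produced. A minimal sketch of filling the template from a finding object follows; `renderIssueBody` and every field on `finding` are hypothetical names chosen for illustration.

```javascript
// Fill the issue template for one finding.
// Assumed shape: { pattern, impact, evidence, suggestion, effort,
//                  locations: [{ file, start, end }] }.
function renderIssueBody(finding, now = new Date()) {
  // toISOString() is always UTC, matching `date -u +"%Y-%m-%d"`.
  const runDate = now.toISOString().slice(0, 10);
  const affected = finding.locations
    .map((loc) => `- \`${loc.file}\`: lines ${loc.start}-${loc.end}`)
    .join('\n');
  return [
    '## Duplicate Code Opportunity',
    '',
    '### Summary',
    `- **Pattern**: ${finding.pattern}`,
    `- **Locations**: ${finding.locations.map((loc) => loc.file).join(', ')}`,
    `- **Impact**: ${finding.impact}`,
    '',
    '### Evidence',
    '',
    finding.evidence,
    '',
    '### Suggested Refactoring',
    '',
    finding.suggestion,
    '',
    '### Affected Files',
    affected,
    '',
    '### Effort Estimate',
    finding.effort,
    '',
    '---',
    `*Detected by Duplicate Code Detector workflow. Run date: ${runDate}*`,
  ].join('\n');
}
```

Passing `now` as a parameter keeps the output reproducible, which makes the rendering easy to check against a fixed date.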

## Guidelines

- **Be specific**: Always include file paths and line numbers in the evidence section
- **Be actionable**: Each issue should have a clear, implementable suggestion
- **Avoid noise**: Only file issues for genuine duplication with real maintenance impact, not cosmetic similarities
- **No duplicates**: Check existing open issues before creating new ones
- **Security awareness**: Flag duplicated security-critical logic (domain validation, ACL rules, capability management) with higher urgency
- **Cap at 5 issues**: File at most 5 issues per run to avoid flooding the tracker
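
The cap of 5 matches `max: 5` under `create-issue` in the front matter. When more than five findings qualify, one reasonable way to choose among them is to file the highest-scoring ones first, as sketched below. The ordering rule is an assumption (the guidelines only state the cap), and `selectFindingsToFile` and the `score` field are hypothetical.

```javascript
// Pick which findings to file: drop those below the reporting threshold,
// order by score (highest first), then keep at most `max`.
function selectFindingsToFile(findings, { minScore = 4, max = 5 } = {}) {
  return findings
    .filter((finding) => finding.score >= minScore) // filter() copies, so the input is not mutated
    .sort((a, b) => b.score - a.score)
    .slice(0, max);
}
```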

## Edge Cases

- **No significant duplication found**: Exit gracefully without creating issues; print a summary to the log
- **jscpd unavailable**: Fall back to grep-based pattern analysis only
- **All findings already tracked**: Skip creation and log that existing issues cover the findings
