Commit d471b17

zhmiao and Copilot committed
chore(repo-hygiene): enforce dev-artifact denylist + migrate rp11 handoff + add MDv6 example
Three coupled changes to keep the public PW (sparrow-engine code) repo free of internal dev/AI narrative artifacts:

1. Remove leaked dev artifact:
   - docs/rp11_handoff.md (introduced in PW SHA b8c2158) is an /implement skill handoff doc that should never have landed in this repo. Migrated verbatim to the internal sparrow-engine-dev companion at docs/implement/phase-e-nvjpeg-dlopen/, alongside a provenance note citing the original PW SHA.

2. Add tracked pre-commit hook to prevent recurrence:
   - .githooks/pre-commit: small whitelist (user-manual.md, install.md, README.md, docs/images/, docs/assets/) plus a broad denylist (docs/{design,research,review,implement,explain,tech_report,changelog}/**; top-level dev files like lessons.md / master_plan.md / ideas.md; handoff/report/round artifacts anywhere; prompt_logs/**; agent-instruction dumps; SESSION_LEDGER + COVERAGE_LOG + completion-sentinel + MEMORY.md; migration-provenance markers).
   - .githooks/install.sh: one-time activator that sets core.hooksPath=.githooks and makes the hook scripts executable.
   - Smoke-tested 18/18 cases (14 denylist blocks + 4 whitelist passes).
   - Escape hatch: git commit --no-verify for justified exceptions.

3. Add user-manual section 6.6: runnable MegaDetector v6 example using the Python wheel. Verified against the actual API (DetectResult shape, list[tuple[path, result]] for visualize + export, export accepted-values list, default model dir ~/.sparrow-engine/models/). All cited line numbers are live as of this commit.

Rationale: the global instructions already documented the sparrow-x-dev / sparrow-x split rule; one leak slipped through because the agent doc-rule alone was not enforceable at commit time. The tracked hook closes that loophole.

After fresh clone, contributors run once:

    bash .githooks/install.sh

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
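
The allow-list-then-deny-list order described in item 2 can be illustrated with a minimal sketch. The table entries below are a small hand-picked subset written directly as regexes, and `classify` plus the sample paths (`docs/images/pipeline_report.md`, `src/lib.rs`) are hypothetical names used only for illustration. The point it shows is that an allow-listed path passes even when a deny pattern would also match it.

```shell
#!/usr/bin/env bash
# Sketch only: tiny stand-in tables, not the hook's full lists.
ALLOW_FILES=("docs/user-manual.md" "docs/README.md")   # exact paths
ALLOW_DIRS=("docs/images/")                            # directory prefixes
DENY_REGEXES=(
  '^docs/design/.*$'
  '^.*/[^/]*_handoff\.md$'
  '^.*/[^/]*_report\.md$'
)

# classify PATH -> prints "allowed", "blocked", or "unlisted".
# (Hypothetical helper; the allow-list is consulted before the deny-list.)
classify() {
  local path="$1" entry regex
  for entry in "${ALLOW_FILES[@]}"; do
    if [[ "$path" == "$entry" ]]; then echo "allowed"; return 0; fi
  done
  for entry in "${ALLOW_DIRS[@]}"; do
    # Quoted prefix followed by an unquoted * gives a prefix match.
    if [[ "$path" == "$entry"* ]]; then echo "allowed"; return 0; fi
  done
  for regex in "${DENY_REGEXES[@]}"; do
    if [[ "$path" =~ $regex ]]; then echo "blocked"; return 0; fi
  done
  echo "unlisted"   # neither list applies, so the path would pass
}

for p in docs/user-manual.md docs/images/pipeline_report.md \
         docs/rp11_handoff.md src/lib.rs; do
  printf '%-32s %s\n' "$p" "$(classify "$p")"
done
```

In this sketch `docs/images/pipeline_report.md` is reported as allowed although it also matches the `_report.md` regex, while `docs/rp11_handoff.md` is blocked and `src/lib.rs` is unlisted.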
1 parent 5c7b5f1 commit d471b17

4 files changed

Lines changed: 314 additions & 120 deletions

File tree

.githooks/install.sh

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+#!/usr/bin/env bash
+# .githooks/install.sh: One-time activation of repo-tracked git hooks.
+#
+# Why: a tracked hook file (`.githooks/pre-commit`) does nothing until git
+# is told to look in `.githooks/` instead of the per-clone `.git/hooks/`.
+# This script sets that config in the local clone (no commit needed).
+#
+# What: runs `git config core.hooksPath .githooks` for the current clone
+# and reports the current hook source so you can verify it took effect.
+#
+# How: run once after `git clone`. Idempotent, so it is safe to re-run.
+
+set -euo pipefail
+
+cd "$(git rev-parse --show-toplevel)"
+
+echo "[install-hooks] setting core.hooksPath = .githooks ..."
+git config core.hooksPath .githooks
+
+# Make every hook in .githooks/ executable.
+echo "[install-hooks] chmod +x .githooks/*"
+chmod +x .githooks/*
+
+echo
+echo "[install-hooks] done. Active hook source:"
+echo " $(git config --get core.hooksPath)"
+echo
+echo "[install-hooks] Available hooks:"
+for h in .githooks/*; do
+  [[ -f "$h" && -x "$h" && "$(basename "$h")" != "install.sh" ]] || continue
+  printf ' %s\n' "$(basename "$h")"
+done
+echo
+echo "[install-hooks] To bypass a hook for one commit (use sparingly):"
+echo " git commit --no-verify -m '...'"
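
To make the activation step concrete, here is a minimal sketch that reproduces the effect of `core.hooksPath` in a throwaway repository. Everything in it is a stand-in: the temporary repo, the demo identity, and the always-failing `pre-commit` are assumptions for illustration, not files from this commit. It assumes `git` is installed and that no global hook or commit-signing configuration interferes.

```shell
#!/usr/bin/env bash
# Sketch: a tracked hook is inert until core.hooksPath points at its directory.
start_dir="$PWD"
demo_repo="$(mktemp -d)"
cd "$demo_repo" || exit 1

git init -q .
git config user.name  "Demo User"              # throwaway identity
git config user.email "demo@example.invalid"

# A stand-in hook that always rejects the commit.
mkdir .githooks
printf '#!/usr/bin/env bash\necho "hook ran" >&2\nexit 1\n' > .githooks/pre-commit
chmod +x .githooks/pre-commit

echo hello > a.txt
git add a.txt

# 1. Hook file exists but git still looks in .git/hooks/: commit succeeds.
if git commit -q -m "first" 2>/dev/null; then before="committed"; else before="rejected"; fi

echo more >> a.txt
git add a.txt
git config core.hooksPath .githooks            # same setting install.sh applies

# 2. Hook is now active and exits 1: commit is rejected.
if git commit -q -m "second" 2>/dev/null; then after="committed"; else after="rejected"; fi

# 3. --no-verify skips the pre-commit hook for this one commit.
if git commit -q --no-verify -m "second" 2>/dev/null; then bypass="committed"; else bypass="rejected"; fi

echo "before=$before after=$after bypass=$bypass"

cd "$start_dir" && rm -rf "$demo_repo"
```

Under those assumptions the script prints `before=committed after=rejected bypass=committed`, which is the behavior the install script relies on.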

.githooks/pre-commit

Lines changed: 215 additions & 0 deletions
@@ -0,0 +1,215 @@
+#!/usr/bin/env bash
+# .githooks/pre-commit: Block dev/AI artifacts from leaking into the public
+# Pytorch-Wildlife (sparrow-engine code) repo.
+#
+# Why: per sparrow-engine-dev/docs/rules.md § Dev-companion split, dev/AI
+# narrative artifacts (audit-fix rounds, design rounds, /implement skill
+# outputs, prompt logs, scope ledgers, etc.) live in the internal
+# `sparrow-engine-dev` companion repo, NOT here. One historical leak
+# (`docs/rp11_handoff.md` in commit b8c2158) prompted this enforcement.
+#
+# What: scans the staged file list. If any path matches a denylist pattern
+# AND is not on the small whitelist, the commit is rejected with a pointer
+# to the rule + the migration path.
+#
+# How: install with `bash .githooks/install.sh` (one-time per fresh clone).
+# That sets `git config core.hooksPath .githooks` so this script runs.
+#
+# Escape hatches:
+# - Commit with `--no-verify` to bypass (use ONLY for explicit, justified
+#   exceptions; do NOT use to land routine dev artifacts).
+# - To allow a new public-docs file, add it to the WHITELIST_FILES /
+#   WHITELIST_DIRS arrays below and document the rationale in the commit
+#   message.
+set -euo pipefail
+
+# -----------------------------------------------------------------------------
+# Allow-list (always permitted)
+# -----------------------------------------------------------------------------
+WHITELIST_FILES=(
+  "docs/user-manual.md"
+  "docs/install.md"
+  "docs/README.md"
+)
+
+# Directory prefixes whose contents are always allowed (relative to repo root,
+# no leading slash). Add only public-docs subdirs here.
+WHITELIST_DIRS=(
+  "docs/images/"
+  "docs/assets/"
+)
+
+# -----------------------------------------------------------------------------
+# Deny-list patterns (any match fails the commit unless allow-listed above)
+#
+# Glob-style; matched against `git diff --cached --name-only` (forward slashes,
+# repo-relative). Use `*` for any chars in one segment, `**` for any number of
+# segments.
+# -----------------------------------------------------------------------------
+DENYLIST_PATTERNS=(
+  # docs/ dev-narrative subtrees
+  "docs/design/**"
+  "docs/research/**"
+  "docs/review/**"
+  "docs/implement/**"
+  "docs/implementation/**"
+  "docs/explain/**"
+  "docs/tech_report/**"
+  "docs/changelog/**"
+
+  # docs/ top-level dev-narrative files
+  "docs/plan.md"
+  "docs/plan*.md"
+  "docs/master_plan.md"
+  "docs/master_plan*.md"
+  "docs/changelog.md"
+  "docs/changelog*.md"
+  "docs/lessons.md"
+  "docs/bugs.md"
+  "docs/ideas.md"
+  "docs/decisions.md"
+  "docs/rules.md"
+  "docs/benchmarks.md"
+
+  # Handoff / report / round artifacts anywhere in the tree
+  "**/*_handoff.md"
+  "**/*-handoff.md"
+  "**/*_report.md"
+  "**/*-report.md"
+  "**/round_*/**"
+
+  # Skill artifacts
+  "**/SESSION_LEDGER*.json"
+  "**/COVERAGE_LOG*.jsonl"
+  "**/CONVERGED.sentinel"
+  "**/MEMORY.md"
+  "**/scope_check.sh"
+
+  # Agent instruction dumps
+  "CLAUDE.md"
+  "AGENTS.md"
+  "**/CLAUDE.md"
+  "**/AGENTS.md"
+
+  # Prompt logs
+  "prompt_logs/**"
+  "**/prompt_logs/**"
+
+  # Audit-fix / doc-fix marker files
+  "**/MIGRATED_FROM_PUBLIC.md"
+)
+
+# -----------------------------------------------------------------------------
+# Helpers
+# -----------------------------------------------------------------------------
+# Convert a glob-style pattern (with `**`) into an extended regex.
+# `**` → `.*`, `*` → `[^/]*`, literal dots escaped.
+glob_to_regex() {
+  local pat="$1"
+  # Escape the regex specials our patterns use (`.` and `+`); `*` is handled below.
+  # NOTE: do NOT use ${pat//?/...}: bash treats `?` as a glob (any single
+  # char) in the pattern slot, which would replace every character.
+  pat="${pat//./\\.}"
+  pat="${pat//+/\\+}"
+  # `**` placeholder so the single-* pass doesn't eat it.
+  pat="${pat//\*\*/__DOUBLE_STAR__}"
+  pat="${pat//\*/[^/]*}"
+  pat="${pat//__DOUBLE_STAR__/.*}"
+  echo "^${pat}\$"
+}
+
+is_whitelisted() {
+  local path="$1"
+  local w
+  for w in "${WHITELIST_FILES[@]}"; do
+    [[ "$path" == "$w" ]] && return 0
+  done
+  for w in "${WHITELIST_DIRS[@]}"; do
+    [[ "$path" == "$w"* ]] && return 0
+  done
+  return 1
+}
+
+matches_denylist() {
+  local path="$1"
+  local pat regex
+  for pat in "${DENYLIST_PATTERNS[@]}"; do
+    regex="$(glob_to_regex "$pat")"
+    if [[ "$path" =~ $regex ]]; then
+      return 0
+    fi
+  done
+  return 1
+}
+
+# -----------------------------------------------------------------------------
+# Main
+# -----------------------------------------------------------------------------
+# Initial commit safety: HEAD may not exist (DIFF_TARGET is non-empty either way).
+if git rev-parse --verify HEAD >/dev/null 2>&1; then
+  DIFF_TARGET="HEAD"
+else
+  DIFF_TARGET="--cached" # falls back to staged tree
+fi
+
+# Staged files (added/copied/modified/renamed; --diff-filter excludes deleted).
+mapfile -t STAGED < <(git diff --cached --name-only --diff-filter=ACMR ${DIFF_TARGET:+--ignore-submodules} 2>/dev/null || true)
+
+if (( ${#STAGED[@]} == 0 )); then
+  exit 0
+fi
+
+VIOLATIONS=()
+for path in "${STAGED[@]}"; do
+  [[ -z "$path" ]] && continue
+  if is_whitelisted "$path"; then
+    continue
+  fi
+  if matches_denylist "$path"; then
+    VIOLATIONS+=("$path")
+  fi
+done
+
+if (( ${#VIOLATIONS[@]} > 0 )); then
+  cat >&2 <<'EOF'
+
+╔══════════════════════════════════════════════════════════════════════════════╗
+║  COMMIT BLOCKED: dev/AI artifact detected                                    ║
+╚══════════════════════════════════════════════════════════════════════════════╝
+
+The Pytorch-Wildlife repo (sparrow-engine code) MUST NOT carry dev/AI
+narrative artifacts. They belong in the internal `sparrow-engine-dev`
+companion repo (typically at ../sparrow-engine-dev or via $SPARROW_X_DEV).
+
+Files blocked by this hook:
+EOF
+  for v in "${VIOLATIONS[@]}"; do
+    printf ' • %s\n' "$v" >&2
+  done
+  cat >&2 <<'EOF'
+
+What to do:
+
+  1. Move each file into the sparrow-engine-dev companion under the
+     equivalent docs/ subpath (or docs/implement/<phase>/ for implement
+     skill artifacts).
+  2. Drop a MIGRATED_FROM_PUBLIC.md provenance note in the destination
+     directory citing this would-be PW commit's intended SHA + subject.
+  3. `git restore --staged <path>` to unstage, then `git rm` if the file
+     was already committed.
+  4. Re-run `git commit`.
+
+Rule citation: sparrow-engine-dev/docs/rules.md § Dev-companion split
+Hook source: .githooks/pre-commit (this file)
+
+If you have a justified exception (e.g., a new genuinely-public docs file),
+either:
+  a) Add the path to WHITELIST_FILES / WHITELIST_DIRS in this hook and
+     commit the change with a justification, or
+  b) Bypass once with `git commit --no-verify` (logs the exception in
+     your reflog).
+EOF
+  exit 1
+fi
+
+exit 0
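
As a way to check how the glob-to-regex mapping behaves on concrete paths, the following sketch restates the documented rules (`**` to `.*`, `*` to `[^/]*`, dots escaped) in a condensed form. `glob_to_regex_sketch`, `path_matches`, and the sample paths are hypothetical names introduced here for illustration. It is not a substitute for the hook's own helper.

```shell
#!/usr/bin/env bash
# Sketch: condensed restatement of the documented glob-to-regex rules.
glob_to_regex_sketch() {
  local pat="$1"
  pat="${pat//./\\.}"                 # escape literal dots
  pat="${pat//\*\*/__DOUBLE_STAR__}"  # protect ** from the single-* pass
  pat="${pat//\*/[^/]*}"              # * stays inside one path segment
  pat="${pat//__DOUBLE_STAR__/.*}"    # ** may cross slashes
  printf '^%s$\n' "$pat"
}

# path_matches GLOB PATH -> exit status 0 when PATH matches GLOB.
path_matches() {
  local regex
  regex="$(glob_to_regex_sketch "$1")"
  [[ "$2" =~ $regex ]]
}

glob_to_regex_sketch 'docs/plan*.md'    # prints ^docs/plan[^/]*\.md$
glob_to_regex_sketch 'docs/design/**'   # prints ^docs/design/.*$
glob_to_regex_sketch '**/MEMORY.md'     # prints ^.*/MEMORY\.md$

report() {  # print the verdict for one glob/path pair
  if path_matches "$1" "$2"; then
    printf 'match     %-16s %s\n' "$1" "$2"
  else
    printf 'no match  %-16s %s\n' "$1" "$2"
  fi
}

report 'docs/plan*.md'  'docs/plan_v2.md'     # match: * covers "_v2"
report 'docs/plan*.md'  'docs/plan/x.md'      # no match: * cannot cross "/"
report 'docs/design/**' 'docs/design/a/b.md'  # match: ** crosses "/"
report '**/MEMORY.md'   'notes/MEMORY.md'     # match
report '**/MEMORY.md'   'MEMORY.md'           # no match: regex requires a "/"
```

The last case shows that, under this mapping, a `**/name` pattern needs at least one directory component. That is consistent with the denylist listing both `CLAUDE.md` and `**/CLAUDE.md`.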

docs/rp11_handoff.md

Lines changed: 0 additions & 120 deletions
This file was deleted.
