feat: Mode B codebase ingestion, drift lint, and commit-triggered sync #81

Open
RasenGUY wants to merge 1 commit into AgriciDaniel:main from RasenGUY:feat/mode-b-code-ingest

RasenGUY commented Jun 9, 2026

Summary

Makes Mode B (GitHub / Repository) a real, end-to-end feature instead of a folder template. A codebase can now be ingested into the wiki as modules / flows / dependencies / decisions pages from deterministic signals (gitignore-aware repo walk, parsed dependency manifests, git anchors, regex import edges); wiki-lint can detect drift when a page's source changes via git content-hash anchors; and a watched repo can be kept in sync on every commit (enqueue-only git hook → in-session drain by default, opt-in autonomous headless drain). Language-agnostic by construction (git content-addressing + extension maps), additive, and backward-compatible — no change to prose ingestion. Until now Mode B was a folder convention filled in by hand (skills/wiki/references/modes.md §Mode B); this wires it into the router, the schema, the lint, and the hooks.

Type

  • Bug fix (fix:)
  • New feature (feat:)
  • Documentation (docs:)
  • Refactor (refactor:)
  • Test coverage (test:)
  • Chore / build / maintenance (chore:)

Related issue

None yet. The work arrived complete, so it skipped the issue-first flow in CONTRIBUTING §1 — happy to open a feature-request issue retroactively if you'd prefer it tracked there.

Changes

Code ingestion (new)

  • skills/wiki-code-ingest/SKILL.md, agents/wiki-code-ingest.md, commands/wiki-code-ingest.md — guided ingest skill (mirrors wiki-ingest structure) + one-page-per-module parallel sub-agent + slash command. Whole-repo bootstrap, single-path ingest, and --sync modes.
  • scripts/code-scan.py — tree / language / LOC snapshot. Gitignore guarantee: enumerates via git ls-files --cached --others --exclude-standard; ignored/vendored/build-output files are never indexed.
  • scripts/code-manifests.py — detect + parse npm/pypi/go/cargo/rubygems/composer manifests (stdlib only; drawn from the gitignore-aware set, so no vendored manifests).
  • scripts/code-signals.py — git anchors (blob/tree SHA per path via ls-tree), churn, best-effort regex import edges. Degrades cleanly on non-git dirs.
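
The gitignore guarantee described for scripts/code-scan.py could be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual code: the names `parse_nul_separated` and `list_indexable_files` are hypothetical, and only the `git ls-files` flags come from the description above (`-z` is added here so that unusual filenames survive parsing).

```python
import subprocess
from pathlib import Path


def parse_nul_separated(raw: bytes) -> list[str]:
    """Split NUL-separated `git ls-files -z` output into sorted, unique paths."""
    names = {
        part.decode("utf-8", errors="replace")
        for part in raw.split(b"\0")
        if part
    }
    return sorted(names)


def list_indexable_files(repo: Path) -> list[str]:
    """Return tracked plus untracked-but-not-ignored paths, relative to `repo`.

    Hypothetical helper: git itself applies the standard exclude sources
    (.gitignore, .git/info/exclude, the global excludes file), so untracked
    files matching an ignore rule are left out of the listing.
    """
    result = subprocess.run(
        ["git", "-C", str(repo), "ls-files",
         "--cached", "--others", "--exclude-standard", "-z"],
        check=True,
        capture_output=True,
    )
    return parse_nul_separated(result.stdout)
```

Delegating enumeration to git, rather than re-implementing ignore matching, is what would keep node_modules, build output, and similar paths out of the index without any per-language rules.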

Router + schema + drift lint

  • scripts/wiki-mode.py — module|component|dependency|flow|decision added to VALID_TYPES, the generic DEFAULT_CONFIG folders, and the generic + PARA routing maps (LYT/Zettelkasten route them automatically). Mirrored in skills/wiki-mode/SKILL.md.
  • skills/wiki/references/frontmatter.md + references/modes.md — code-page schema: source_type: code, source_paths, code_anchors (flat "path@sha" strings — honors the no-nested-YAML rule #1), ingest_commit, ingested_at.
  • scripts/code-anchor-check.py + skills/wiki-lint/SKILL.md (check #11) + agents/wiki-lint.md (check #9) — read-only git drift lint. Flags DRIFTED / MOVED / UNTRACKED pages; exits 10/11 to skip cleanly when git/repo absent (drift itself is a finding at exit 0, mirroring tiling).
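
One way to picture the drift check over flat "path@sha" anchors is the sketch below. It is not the logic of scripts/code-anchor-check.py: the function names, the `current` mapping (path to blob SHA at HEAD, which something like `git ls-tree -r HEAD` could supply), the extra "OK" status, and the exact meaning given to each of the three flagged statuses are all assumptions made for illustration. It also assumes anchors store full, unabbreviated SHAs.

```python
def split_anchor(anchor: str) -> tuple[str, str]:
    """Split a flat "path@sha" anchor; rpartition tolerates '@' inside the path."""
    path, sep, sha = anchor.rpartition("@")
    if not sep or not path or not sha:
        raise ValueError(f"malformed anchor: {anchor!r}")
    return path, sha


def classify_anchor(anchor: str, current: dict[str, str]) -> str:
    """Compare one anchor against `current`, a path -> blob SHA map for HEAD.

    The status meanings below are assumptions made for this sketch.
    """
    path, sha = split_anchor(anchor)
    if current.get(path) == sha:
        return "OK"            # content unchanged since ingest
    if path in current:
        return "DRIFTED"       # same path, different content hash
    if sha in current.values():
        return "MOVED"         # same content now lives at another path
    return "UNTRACKED"         # neither the path nor the content is present


# Usage with a hypothetical HEAD snapshot:
head = {"src/a.py": "aaa111", "src/c.py": "bbb222"}
statuses = [classify_anchor(a, head) for a in ("src/a.py@aaa111", "src/b.py@bbb222")]
```

Because the comparison relies only on git's content hashes, the same check would apply to any file type, which fits the language-agnostic claim in the summary.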

Commit-triggered auto-sync

  • bin/setup-code-watch.sh — installs enqueue-only post-commit / post-merge / post-rewrite hooks into a watched repo (chains existing hooks; refuses the vault's own repo to prevent a commit→sync→commit loop).
  • scripts/code-sync-check.py — queue/state manager + SessionStart surfacer. Untrusted git strings sanitized via safe_display() before reaching session context.
  • bin/code-sync-launch.sh — opt-in debounced, single-flighted, detached headless claude -p drain. Requires ANTHROPIC_API_KEY; degrades to enqueue-only without it.
  • hooks/hooks.json + hooks/README.md + skills/wiki/SKILL.md — new SessionStart hook (no-op unless a sync queue exists) + routing + docs.
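
The sanitization step mentioned for scripts/code-sync-check.py might look something like this sketch. It is a hypothetical stand-in that reuses the `safe_display` name only for readability; the real function's signature and rules are not shown in this PR description, and the `max_len` parameter and truncation behavior are assumptions.

```python
import unicodedata


def safe_display(text: str, max_len: int = 200) -> str:
    """Illustrative stand-in: drop control/format characters and cap the length.

    Unicode category "Cc" covers control characters (ESC, newline, tab);
    "Cf" covers invisible format characters such as bidi overrides.
    """
    cleaned = "".join(
        ch for ch in text
        if unicodedata.category(ch) not in ("Cc", "Cf")
    )
    if len(cleaned) > max_len:
        cleaned = cleaned[:max_len] + "..."
    return cleaned


# A commit subject carrying an escape sequence and a trailing newline:
subject = safe_display("fix\x1b[31m bug\n")
```

Stripping these characters before a path or commit message is surfaced keeps untrusted git strings from injecting terminal escapes or hidden text into session context; it does not by itself neutralize instructions written in plain text, which is presumably why the PR also frames the data as untrusted in the headless prompt.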

Tests / build

  • 6 new hermetic suites + Makefile targets; .gitignore rules for the new runtime artifacts.

Six-cut self-review

  • Read every file before changing it
  • New identifiers named for the next reader
  • Smallest unit that works (guided ingest = deterministic signals + LLM synthesis; no AST/tree-sitter, no parser deps)
  • Deletions kept up with additions where applicable (purely additive; nothing superseded)
  • New behavior has hermetic test coverage
  • New failure modes have explicit handling + undo plan (documented exit codes; --unwatch; loop guard; ANTHROPIC_API_KEY/claude/non-git degradation)

Testing

make test
All tests passed.

15 hermetic suites (9 pre-existing + 6 new: test-code-scan, -manifests, -signals, -code-drift, -code-sync, -code-watch). All hermetic — temp git repos, no network/ollama/Anthropic API. The auto-sync test drives a real git commit → queue enqueue end-to-end and asserts existing-hook chaining, the loop guard, and --unwatch cleanup. Each new signal/drift behavior (incl. the gitignore-exclusion guarantee) has a dedicated assertion.

Verifier

Dispatched agents/verifier.md on the staged diff (two passes).

  • Verdict: HOLD-FIX-FIRST (BLOCKER: 0 / HIGH: 1 / MEDIUM: 0 / LOW: 2). All findings were resolved in this branch before opening:
    • HIGH — the new runtime artifacts (.vault-meta/code-sync-queue.jsonl, code-sync-state.json, .code-sync*.lock, .raw/code/) weren't gitignored, so the existing PostToolUse vault auto-commit (git add -- .vault-meta/ .raw/) would have committed them on every interaction. Fixed: added a .gitignore block (verified each path with git check-ignore).
    • LOW ×2 — clarity comments (post-finally variable use in action_unregister; trust note on $REPO/$HEAD interpolation in code-sync-launch.sh). Fixed.
    • A prior pass also flagged a prompt-injection surface (untrusted git data — paths/commit messages — reaching surfaced context / the autonomous agent). Hardened (safe_display() control-char stripping + untrusted-data framing in the headless prompt + a documented trust boundary in commands/wiki-code-watch.md) and CLEARED by the verifier. The unwatch/action_unregister flow was also traced and CLEARED.

CHANGELOG

  • Added an entry under ## [Unreleased] in CHANGELOG.md

plugin.json / marketplace.json are intentionally left at 1.9.2 and the changelog under ## [Unreleased] per CONTRIBUTING §8 — version assignment is yours at release time.

Screenshots / output

Real-repo smoke test (run against this plugin repo itself):

code-scan:      228 files, 45945 LOC, primary=markdown
code-signals:   git=True head=cb93ff6 anchors=275 edges=207
gitignore check: ignored leaked: NONE (clean)   # no .git/, node_modules, or __pycache__ indexed

Notes for reviewer

  • Scope decision: guided-ingest (deterministic signals → LLM synthesis), deliberately not AST/tree-sitter — smallest unit that works, language-agnostic, zero parser dependencies. Import edges are intentionally best-effort regex; ambiguity is surfaced with > [!gap] callouts rather than faked precision.
  • Autonomous drain is opt-in and gated behind ANTHROPIC_API_KEY; the default is in-session with a human in the loop (SessionStart surfaces drift and offers /wiki-code-ingest --sync).
  • Two hook systems are involved by necessity (documented in hooks/README.md): declarative Claude plugin hooks ship with the plugin; the commit-detecting git hooks are installed per-repo by /wiki-code-watch because plugins can't run install-time scripts and don't know which repo you'll map.
  • Code comments anticipate v1.10 in the project's version-tagged-comment style; rename if you'd assign a different version.
