fix(wiki): clear stale articles before regenerating to prevent orphan accumulation #558

Open
szsip239 wants to merge 1 commit into safishamsi:v5 from szsip239:fix/wiki-orphan-cleanup

Problem

to_wiki() writes a fresh set of community + god-node articles on each call but never deletes files left by previous runs. Because community labels are LLM-generated and non-deterministic across rebuilds (per skill.md Step 5), the same conceptual community is often named differently each time, and its previous file is left behind as an orphan. After N rebuilds, wiki/ contains roughly N times the active article count, while index.md references only the most recent run's labels.

Repro

graphify <path> --wiki              # generates ~100 articles + index
graphify <path> --update --wiki     # generates ~100 NEW articles (LLM picks different names)
ls graphify-out/wiki/*.md | wc -l   # ~200 files, index references ~100
# Repeat → unbounded growth

Real-world: a knowledge corpus accumulated 822 wiki .md files over 5 rebuilds, of which only 111 were referenced by index.md (710 orphans).
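
To check an existing wiki directory for the same symptom, something like the following sketch could list the orphans. The helper name find_orphans is hypothetical, and the sketch assumes index.md mentions each active article by its file stem (for example inside a link); the real index format may differ, and plain substring matching can miss an orphan whose stem happens to appear inside a longer label.

```python
from pathlib import Path


def find_orphans(wiki_dir: Path) -> list[Path]:
    """Return top-level .md files whose stem never appears in index.md."""
    index = wiki_dir / "index.md"
    index_text = index.read_text(encoding="utf-8") if index.exists() else ""
    orphans = []
    for article in sorted(wiki_dir.glob("*.md")):
        if article.name == "index.md":
            continue  # the index itself is not an article
        # Assumption: the index references an article by its file stem.
        if article.stem not in index_text:
            orphans.append(article)
    return orphans
```

Running it against graphify-out/wiki before and after a rebuild gives a quick count of how many files the index no longer points to.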

Fix

Clear *.md files in the output directory at the start of to_wiki(). This is consistent with its existing fully-regenerative behavior — it always writes the full set of articles + index, never partial updates. Subdirectories and non-.md files are preserved (only top-level .md is touched), so any user-placed auxiliary assets survive.

out.mkdir(parents=True, exist_ok=True)

# NEW: prevent orphan accumulation
for old_article in out.glob("*.md"):
    old_article.unlink()
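
For readers who want to try the cleanup in isolation, here is a minimal self-contained sketch of the same idea. The function name clear_stale_articles and its return value are illustrative and not part of this PR; the is_file() guard and the list() around the glob are extra precautions added for the sketch, not something the change above includes.

```python
from pathlib import Path


def clear_stale_articles(out: Path) -> int:
    """Delete top-level *.md files in `out`; return how many were removed."""
    out.mkdir(parents=True, exist_ok=True)
    removed = 0
    # glob("*.md") is non-recursive, so files inside subdirectories are never matched.
    # list() takes a snapshot first so deletion does not race the directory scan.
    for old_article in list(out.glob("*.md")):
        if old_article.is_file():  # skip a directory that happens to end in .md
            old_article.unlink()
            removed += 1
    return removed
```

Because the pattern is evaluated only against the output directory itself, index.md is removed along with the articles, which is harmless when the caller rewrites the index on every run.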

Tests

Two new regression tests in tests/test_wiki.py:

  1. test_to_wiki_clears_stale_articles — calls to_wiki() twice with different community labels, asserts old files are gone and new files exist.
  2. test_to_wiki_preserves_non_md_files — places PNG/JSON/subdirectory content in the wiki dir, asserts they survive cleanup.

All 17 tests in test_wiki.py pass (15 existing + 2 new).
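
The shape of the two regression tests can be sketched roughly as follows. This is not the code in tests/test_wiki.py: fake_to_wiki is a hypothetical stand-in that only mimics the clear-then-write behavior, since the real to_wiki() signature is not shown here, and the labels and file names are invented for illustration.

```python
from pathlib import Path


def fake_to_wiki(out: Path, labels: list[str]) -> None:
    """Hypothetical stand-in for to_wiki(): clear, then write articles + index."""
    out.mkdir(parents=True, exist_ok=True)
    for old_article in list(out.glob("*.md")):
        old_article.unlink()
    for label in labels:
        (out / f"{label}.md").write_text(f"# {label}\n", encoding="utf-8")
    index_lines = [f"- [[{label}]]" for label in labels]
    (out / "index.md").write_text("\n".join(index_lines) + "\n", encoding="utf-8")


def test_clears_stale_articles(tmp_path: Path) -> None:
    wiki = tmp_path / "wiki"
    fake_to_wiki(wiki, ["Auth Layer", "Storage"])
    fake_to_wiki(wiki, ["Login Stack", "Storage"])  # same community, new label
    assert not (wiki / "Auth Layer.md").exists()
    assert (wiki / "Login Stack.md").exists()
    assert (wiki / "index.md").exists()


def test_preserves_non_md_files(tmp_path: Path) -> None:
    wiki = tmp_path / "wiki"
    (wiki / "assets").mkdir(parents=True)
    (wiki / "diagram.png").write_bytes(b"\x89PNG")
    (wiki / "meta.json").write_text("{}", encoding="utf-8")
    (wiki / "assets" / "notes.md").write_text("keep me", encoding="utf-8")
    fake_to_wiki(wiki, ["Storage"])
    assert (wiki / "diagram.png").exists()
    assert (wiki / "meta.json").exists()
    assert (wiki / "assets" / "notes.md").exists()
```

Under pytest, tmp_path is the built-in per-test temporary directory fixture, so each test starts from an empty wiki directory.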

Compatibility

  • No CLI / API change
  • No behavioral change for first-time wiki generation (empty dir → no .md files to clear)
  • Custom .md files placed at top level of wiki/ (not a documented workflow) would be removed on next --wiki. Subdirectories and non-.md files are unaffected.

Related work

Existing wiki improvements addressed adjacent concerns but not orphan cleanup:

rosschurchill added a commit to rosschurchill/graphify-super that referenced this pull request Apr 27, 2026
to_wiki() now globs and unlinks all *.md in the output dir before
writing fresh articles, preventing orphan accumulation across rebuilds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>