feat: add graphify dry-run command #157
nuthalapativarun wants to merge 3 commits into `safishamsi:v3`
`graphify dry-run [path]` scans the corpus with `detect()` and prints a file-count table with corpus-health warnings, without writing any output files or building the graph.
Hey @safishamsi, just checking in on this one. Happy to rebase or make any adjustments if needed. Let me know!
Hi,
Severity: action required | Category: correctness
How to fix: Make `detect()` side-effect-free
Agent prompt to fix - you can give this to your LLM of choice:
Found by Qodo code review
`detect()` now accepts `write_sidecars=False`; when disabled, Office files are counted directly without calling `convert_office_file()` or touching `graphify-out/converted/`. The dry-run CLI branch passes this flag so the no-write promise holds even for .docx/.xlsx corpora. Adds `test_dry_run_office_no_sidecar_written` to assert `convert_office_file` is never called during dry-run.
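A minimal Python sketch of the `write_sidecars` pattern described in this commit. `detect_sketch`, `stub_convert`, `refuse_convert`, the suffix sets and the returned dict layout are hypothetical simplifications, not graphify's actual `detect()`. The converter is injected as a callable so the demo can show it never runs during a dry run, in the spirit of `test_dry_run_office_no_sidecar_written`.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical suffix groups; graphify's real detection rules are richer.
OFFICE_SUFFIXES = {".docx", ".xlsx"}
TEXT_SUFFIXES = {".md", ".txt"}


def detect_sketch(
    root: Path,
    convert: Callable[[Path, Path], Path],
    write_sidecars: bool = True,
) -> Dict[str, List[str]]:
    """Hypothetical, simplified stand-in for graphify's detect()."""
    found: Dict[str, List[str]] = {"document": [], "office": []}
    sidecar_dir = root / "graphify-out" / "converted"
    for path in sorted(root.rglob("*")):  # list is materialized before any mkdir
        if not path.is_file() or sidecar_dir in path.parents:
            continue
        suffix = path.suffix.lower()
        if suffix in OFFICE_SUFFIXES:
            if write_sidecars:
                # Real run: convert to a Markdown sidecar on disk.
                sidecar_dir.mkdir(parents=True, exist_ok=True)
                found["document"].append(str(convert(path, sidecar_dir)))
            else:
                # Dry run: count the office file directly and write nothing.
                found["office"].append(str(path))
        elif suffix in TEXT_SUFFIXES:
            found["document"].append(str(path))
    return found


def stub_convert(path: Path, out_dir: Path) -> Path:
    # Hypothetical converter: writes a placeholder Markdown sidecar.
    sidecar = out_dir / (path.stem + ".md")
    sidecar.write_text("converted")
    return sidecar


def refuse_convert(path: Path, out_dir: Path) -> Path:
    # Plays the role of a patched convert_office_file(): must never be called.
    raise AssertionError(f"convert called for {path}")


with tempfile.TemporaryDirectory() as tmp:
    corpus = Path(tmp)
    (corpus / "notes.md").write_text("hello world")
    (corpus / "report.docx").write_bytes(b"not a real docx")

    dry = detect_sketch(corpus, refuse_convert, write_sidecars=False)
    dry_counts = {kind: len(files) for kind, files in dry.items()}
    out_created = (corpus / "graphify-out").exists()

    detect_sketch(corpus, stub_convert, write_sidecars=True)
    sidecar_written = (corpus / "graphify-out" / "converted" / "report.md").exists()
```

Injecting the converter keeps the flag's contract testable without touching real .docx parsing: the dry run leaves no `graphify-out/` behind, while the normal run still produces its sidecar.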
Good catch @qodo-ai-reviewer, fixed in 0b3e6eb.
Added
Hi, in dry-run mode, .docx/.xlsx files are counted via `count_words()` without creating sidecars, but if the optional Office libraries are missing, the count silently drops to 0 words with no skipped/warning signal. Dry-run can therefore report a “healthy” corpus that the real run would not process correctly.
Severity: action required | Category: correctness
How to fix: Warn/skip office without deps
Agent prompt to fix - you can give this to your LLM of choice:
We noticed a couple of other issues in this PR as well - happy to share if helpful.
Found by Qodo code review
In `write_sidecars=False` mode, probe Office files via `docx_to_markdown`/`xlsx_to_markdown` (which return `''` on ImportError). An empty result means the real run would also extract nothing, so the file is added to the skipped list with an install hint instead of being silently counted as 0 words. `__main__.py` surfaces a dedicated 'Skipped (office deps missing)' line with a pip install hint and suppresses 'Corpus looks healthy' when Office files were skipped. Adds `test_dry_run_office_missing_deps_warns` to assert the warning and install hint appear when `docx_to_markdown` is patched to return `''`. Closes feedback from qodo-ai-reviewer on PR safishamsi#157.
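The probe-and-skip approach from this commit can be sketched as follows. It relies on the behavior the commit message states, that the Markdown converters return `''` when their optional library is missing. `probe_office`, `summarize`, the two stub converters and `INSTALL_HINT` are hypothetical names; the real install hint text is not shown in the PR, so a placeholder is used.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# Hypothetical placeholder: the PR's actual install hint text is not shown.
INSTALL_HINT = "pip install <office extras>"


def probe_office(
    paths: List[Path],
    converters: Dict[str, Callable[[Path], str]],
) -> Tuple[Dict[str, int], List[str]]:
    """Word-count office files via their Markdown converter; skip empty results."""
    word_counts: Dict[str, int] = {}
    skipped: List[str] = []
    for path in paths:
        text = converters[path.suffix.lower()](path)
        if not text.strip():
            # '' means the real run would extract nothing (e.g. library missing).
            skipped.append(str(path))
        else:
            word_counts[str(path)] = len(text.split())
    return word_counts, skipped


def summarize(word_counts: Dict[str, int], skipped: List[str]) -> List[str]:
    lines = [f"Office files counted: {len(word_counts)}"]
    if skipped:
        lines.append(f"Skipped (office deps missing): {len(skipped)}")
        lines.append(f"  Install with: {INSTALL_HINT}")
    else:
        # Only report a healthy corpus when no office file was skipped.
        lines.append("Corpus looks healthy")
    return lines


def docx_without_library(path: Path) -> str:
    return ""  # simulates a converter that hit ImportError


def xlsx_with_library(path: Path) -> str:
    return "quarterly revenue table"


counts, skipped = probe_office(
    [Path("a.docx"), Path("b.xlsx")],
    {".docx": docx_without_library, ".xlsx": xlsx_with_library},
)
report = summarize(counts, skipped)
print("\n".join(report))
```

Like the commit message, the sketch treats any empty conversion result as "would extract nothing", so a genuinely empty document is also reported as skipped rather than counted as 0 words.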
Fixed in feb29c3. In the
Added
Summary
Adds a `graphify dry-run [path]` CLI command that scans the corpus and prints a file-count/health summary without writing any output files or building the graph.

This is a safe preview step: it validates what graphify sees before committing to a full extraction run that may consume LLM tokens.
Usage
With a large corpus:
Implementation
- `graphify/__main__.py`: new `elif cmd == "dry-run"` branch + help-text entry
- Reuses `detect.detect()` entirely; no new detection logic
- `graphify-out/` is never created or touched

Test plan
- `test_dry_run_prints_summary`: file-count table appears in output
- `test_dry_run_no_files_written`: `graphify-out/` is not created
- `test_dry_run_default_path`: defaults to the current directory when the path is omitted
- `test_dry_run_missing_path`: exits non-zero for a missing path
- `test_dry_run_no_graphify_out_written`: "No files were written" appears in output
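To illustrate how an `elif cmd == "dry-run"` branch lines up with the behaviors in the test plan, here is a self-contained sketch. `fake_detect` is a hypothetical stand-in for `detect.detect()`, `main` returns an exit code instead of calling `sys.exit()`, and the table layout and error message are assumptions. Only the default-to-current-directory behavior, the non-zero exit on a missing path, and the "No files were written" message come from the PR.

```python
import contextlib
import io
import os
import sys
import tempfile
from pathlib import Path
from typing import Dict, List


def fake_detect(root: Path) -> Dict[str, List[str]]:
    """Hypothetical stand-in for detect.detect(): groups files by suffix, read-only."""
    found: Dict[str, List[str]] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            found.setdefault(path.suffix or "(none)", []).append(str(path))
    return found


def main(argv: List[str]) -> int:
    cmd = argv[0] if argv else "help"
    if cmd == "dry-run":
        # Default to the current directory when no path is given.
        target = Path(argv[1]) if len(argv) > 1 else Path(".")
        if not target.exists():
            print(f"error: path not found: {target}", file=sys.stderr)
            return 1  # non-zero exit for a missing path
        counts = fake_detect(target)
        print(f"{'type':<10}{'files':>6}")
        for kind, files in sorted(counts.items()):
            print(f"{kind:<10}{len(files):>6}")
        print("No files were written.")  # nothing under graphify-out/ is created
        return 0
    print("usage: dry-run [path]", file=sys.stderr)
    return 2


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "notes.md").write_text("hello")

    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        ok_code = main(["dry-run", tmp])
    dry_output = buf.getvalue()

    missing_code = main(["dry-run", str(Path(tmp, "missing"))])

    # Omitting the path scans the current directory.
    previous_cwd = os.getcwd()
    os.chdir(tmp)
    try:
        with contextlib.redirect_stdout(io.StringIO()):
            default_code = main(["dry-run"])
    finally:
        os.chdir(previous_cwd)

    out_dir_created = Path(tmp, "graphify-out").exists()
```

Returning an exit code from `main` keeps each test-plan case a plain function call; a real CLI entry point would pass that value to `sys.exit()`.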