feat: add graphify dry-run command by nuthalapativarun · Pull Request #157 · safishamsi/graphify

nuthalapativarun · 2026-04-09T17:23:55Z

Summary

Adds a graphify dry-run [path] CLI command that scans the corpus and prints a file-count/health summary without writing any output files or building the graph.

This is a safe preview step — useful for validating what graphify sees before committing to a full extraction run that may consume LLM tokens.

Usage

$ graphify dry-run ./my-project
Corpus scan: /abs/path/my-project

  Code files          23
  Documents            7
  Total               30  (~84,200 words)

Corpus looks healthy — no warnings.

No files were written. Run without dry-run to build the graph.

With a large corpus:

warning: Large corpus: 312 files · ~620,000 words. Semantic extraction
will be expensive (many Claude tokens). Consider running on a subfolder,
or use --no-semantic to run AST-only.

Implementation

graphify/__main__.py — new elif cmd == "dry-run" branch + help text entry
Reuses detect.detect() entirely — no new detection logic
graphify-out/ is never created or touched
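For illustration, here is a minimal sketch of how a dry-run branch could turn a scan result into the table from the Usage example. It is not the PR's code. `detect()` is replaced by a caller-supplied `scan` callable, and the result keys (`files["code"]`, `files["document"]`, `total_words`, `warning`) are assumptions about the return shape.

```python
import sys
from pathlib import Path
from typing import Callable, Optional


def print_dry_run_summary(root: Path, result: dict) -> None:
    """Render a file-count table like the one in the Usage example."""
    code = len(result["files"]["code"])
    docs = len(result["files"]["document"])
    print(f"Corpus scan: {root.resolve()}")
    print()
    print(f"  {'Code files':<18}{code:>4}")
    print(f"  {'Documents':<18}{docs:>4}")
    print(f"  {'Total':<18}{code + docs:>4}  (~{result['total_words']:,} words)")
    print()
    warning: Optional[str] = result.get("warning")
    if warning:
        print(f"warning: {warning}")
    else:
        print("Corpus looks healthy — no warnings.")
    print()
    print("No files were written. Run without dry-run to build the graph.")


def dry_run(path: Optional[str], scan: Callable[[Path], dict]) -> int:
    """Hypothetical body of the dry-run CLI branch; returns the exit code."""
    root = Path(path) if path else Path(".")  # default: current directory
    if not root.exists():
        print(f"error: path not found: {root}", file=sys.stderr)
        return 1
    # This function writes nothing itself; the no-write promise also
    # depends on scan() being free of side effects.
    print_dry_run_summary(root, scan(root))
    return 0
```

The exit-code return mirrors the test plan's "exits non-zero for a missing path" case. A real CLI would pass it to `sys.exit()`.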

Test plan

test_dry_run_prints_summary — file-count table appears in output
test_dry_run_no_files_written — graphify-out/ is not created
test_dry_run_default_path — defaults to current directory when path omitted
test_dry_run_missing_path — exits non-zero for a missing path
test_dry_run_no_graphify_out_written — "No files were written" in output

graphify dry-run [path] scans the corpus with detect() and prints a file-count table with corpus health warnings without writing any output files or building the graph.

nuthalapativarun · 2026-04-25T18:15:37Z

Hey @safishamsi — just checking in on this one. Happy to rebase or make any adjustments if needed. Let me know!

Qodo-Free-For-OSS · 2026-04-26T07:26:06Z

Hi, graphify dry-run calls graphify.detect.detect(), but detect() can create graphify-out/converted/*.md sidecar files when it encounters .docx/.xlsx files. This violates the dry-run promise and can cause unexpected filesystem writes during what is advertised as a no-write preview step.

Severity: action required | Category: correctness

How to fix: Make detect side-effect-free

Agent prompt to fix - you can give this to your LLM of choice:

Issue description

graphify dry-run must not write any files, but it currently calls graphify.detect.detect() which may write office conversion sidecars into graphify-out/converted/.

Issue Context

graphify/__main__.py dry-run branch calls _detect(root) and prints “No files were written”.

graphify/detect.py converts .docx/.xlsx by writing markdown sidecars.

Fix Focus Areas

Add a dry_run/write_sidecars/convert_office boolean parameter to graphify.detect.detect() (default preserving current behavior).

Ensure that when the flag is disabled, detect() does not create directories or write any files (skip conversion, and optionally count words directly from the office file or as 0).

Call detect(..., write_sidecars=False) (or equivalent) from the dry-run CLI branch.

References

graphify/__main__.py[794-823]

graphify/detect.py[347-376]

graphify/detect.py[187-213]

Found by Qodo code review

detect() now accepts write_sidecars=False; when disabled, office files are counted directly without calling convert_office_file() or touching graphify-out/converted/. The dry-run CLI branch passes this flag so the no-write promise holds even for .docx/.xlsx corpora. Adds test_dry_run_office_no_sidecar_written to assert convert_office_file is never called during dry-run.

nuthalapativarun · 2026-04-26T17:07:59Z

Good catch @qodo-ai-reviewer — fixed in 0b3e6eb.

detect() now accepts a write_sidecars=False keyword argument. When disabled, office files (.docx/.xlsx) are counted directly without calling convert_office_file() or touching graphify-out/converted/. The dry-run CLI branch passes this flag, so the no-write promise holds even for corpora containing office files.
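A minimal sketch of the `write_sidecars` switch follows. It assumes a simplified `detect()` that only buckets files by extension. The extension sets, the returned dict, and the stand-in `convert_office_file()` (which writes a stub markdown file instead of really converting) are hypothetical, not graphify's implementation.

```python
from pathlib import Path

CODE_EXTS = {".py", ".js", ".ts", ".go", ".rs"}  # assumed, not graphify's list
DOC_EXTS = {".md", ".txt", ".rst"}
OFFICE_EXTS = {".docx", ".xlsx"}


def convert_office_file(path: Path, out_dir: Path) -> Path:
    """Hypothetical stand-in: writes a stub markdown sidecar to disk."""
    out_dir.mkdir(parents=True, exist_ok=True)
    sidecar = out_dir / (path.name + ".md")
    sidecar.write_text(f"# {path.name}\n", encoding="utf-8")
    return sidecar


def detect(root: Path, *, write_sidecars: bool = True) -> dict:
    files = {"code": [], "document": []}
    # sorted() materialises the listing before any sidecar is written.
    for p in sorted(root.rglob("*")):
        if not p.is_file() or "graphify-out" in p.relative_to(root).parts:
            continue
        ext = p.suffix.lower()
        if ext in CODE_EXTS:
            files["code"].append(p)
        elif ext in DOC_EXTS:
            files["document"].append(p)
        elif ext in OFFICE_EXTS:
            if write_sidecars:
                # Default behaviour: convert and index the markdown sidecar.
                out_dir = root / "graphify-out" / "converted"
                files["document"].append(convert_office_file(p, out_dir))
            else:
                # Dry-run: count the office file itself and write nothing.
                files["document"].append(p)
    return {"files": files}
```

Defaulting the flag to `True` keeps existing callers unchanged, which matches the reviewer's "default preserving current behavior" suggestion.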

Added test_dry_run_office_no_sidecar_written which mocks convert_office_file and asserts it is never called during a dry-run invocation.
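The mock-based test could look roughly like the sketch below. To stay self-contained, it puts the converter on a hypothetical `OfficeHelpers` class and patches it with `mock.patch.object`. In the real suite, the patch target would presumably be the name where `detect()` looks it up (for example `graphify.detect.convert_office_file`, an assumption), because `unittest.mock` must replace a name in the namespace that uses it.

```python
import tempfile
from pathlib import Path
from unittest import mock


class OfficeHelpers:
    """Hypothetical namespace standing in for graphify.detect's office helpers."""

    @staticmethod
    def convert_office_file(path: Path, out_dir: Path) -> Path:
        out_dir.mkdir(parents=True, exist_ok=True)
        target = out_dir / (path.name + ".md")
        target.write_text("", encoding="utf-8")
        return target


def detect(root: Path, *, write_sidecars: bool = True) -> list:
    """Simplified detect(): only looks at .docx files."""
    found = []
    for p in sorted(root.rglob("*.docx")):
        if write_sidecars:
            out_dir = root / "graphify-out" / "converted"
            found.append(OfficeHelpers.convert_office_file(p, out_dir))
        else:
            found.append(p)
    return found


def test_dry_run_office_no_sidecar_written():
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "spec.docx").write_bytes(b"PK")
        with mock.patch.object(OfficeHelpers, "convert_office_file") as conv:
            detect(root, write_sidecars=False)
        conv.assert_not_called()
        assert not (root / "graphify-out").exists()
```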

Qodo-Free-For-OSS · 2026-04-28T08:10:13Z

Hi, in dry-run mode, .docx/.xlsx files are counted via count_words() without creating sidecars. If the optional office libraries are missing, this yields a silent 0-word count with no skipped/warning signal, so dry-run can report a “healthy” corpus that the real run would not process correctly.

Severity: action required | Category: correctness

How to fix: Warn/skip office without deps

Agent prompt to fix - you can give this to your LLM of choice:

Issue description

graphify dry-run calls detect(..., write_sidecars=False). For office files (.docx/.xlsx), this path currently counts words directly via count_words(p).

If optional office dependencies aren’t installed, docx_to_markdown() / xlsx_to_markdown() return an empty string, so count_words() returns 0 and dry-run prints “Corpus looks healthy — no warnings.” This hides that office content won’t actually be extracted/usable in non-dry-run runs.

Issue Context

The write_sidecars=True code path already treats office conversion failures as a “skipped” condition with an install hint. Dry-run should surface the same problem (or at least warn) rather than silently counting 0 words.

Fix Focus Areas

graphify/detect.py[302-410]

In the write_sidecars=False office branch, detect the “no office support” case (e.g., if conversion output is empty) and add an entry to skipped_sensitive (or a dedicated skipped_office) plus set a warning.

Consider not counting office files as successfully scanned if their text extraction failed.

graphify/__main__.py[794-823]

Optionally adjust dry-run messaging to explicitly mention office files skipped due to missing extras.

tests/test_dry_run.py[65-75]

Add a test that simulates missing python-docx / openpyxl (e.g., patch docx_to_markdown to return "") and asserts a warning/skipped message is surfaced.

We noticed a couple of other issues in this PR as well - happy to share if helpful.

Found by Qodo code review

In write_sidecars=False mode, probe office files via docx_to_markdown/xlsx_to_markdown (which return '' on ImportError). An empty result means the real run would also extract nothing, so add the file to the skipped list with an install hint instead of silently counting 0 words. __main__.py surfaces a dedicated 'Skipped (office deps missing)' line with a pip install hint, and suppresses 'Corpus looks healthy' when office files were skipped. Adds test_dry_run_office_missing_deps_warns to assert the warning and install hint appear when docx_to_markdown is patched to return ''. Closes feedback from qodo-ai-reviewer on PR safishamsi#157.

nuthalapativarun · 2026-04-28T15:36:58Z

Fixed in feb29c3.

In the write_sidecars=False branch, detect() now probes each office file by calling docx_to_markdown/xlsx_to_markdown in-memory (no writes). Both functions already return "" on ImportError, so an empty result means the real run would also extract nothing. Those files are added to skipped_sensitive with a pip install graphify[office] hint instead of being silently counted as 0 words.
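The probe-and-skip rule can be sketched as below. It is a sketch under assumptions, not the commit's code. The converters are injected as a mapping so the example runs without python-docx or openpyxl. In graphify they would be `docx_to_markdown` / `xlsx_to_markdown`, which the PR says return "" on ImportError. The string format of the skipped entries and the warning text are hypothetical. One consequence of this heuristic is that a genuinely empty document is also reported as "deps missing".

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple

OFFICE_INSTALL_HINT = "pip install graphify[office]"

Converter = Callable[[Path], str]


def probe_office_files(
    paths: List[Path], converters: Dict[str, Converter]
) -> Tuple[int, List[str], Optional[str]]:
    """Convert each office file in memory (no sidecar written).

    Returns (word_count, skipped_entries, warning).
    """
    words = 0
    skipped: List[str] = []
    for p in paths:
        convert = converters.get(p.suffix.lower())
        text = convert(p) if convert else ""
        if text.strip():
            words += len(text.split())
        else:
            # Empty output: a real run would extract nothing from this file.
            skipped.append(f"{p} (office deps missing: {OFFICE_INSTALL_HINT})")
    warning = (
        f"{len(skipped)} office file(s) could not be read; run {OFFICE_INSTALL_HINT}"
        if skipped
        else None
    )
    return words, skipped, warning
```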

__main__.py splits skipped entries into a dedicated "Skipped (office deps missing)" line with the install hint, and suppresses "Corpus looks healthy" when office files were skipped — so dry-run now accurately reflects what a real run would produce.
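A rough sketch of the output side follows. It splits skipped entries and suppresses the healthy message when office files were skipped. It assumes skipped entries are strings tagged with "office deps missing", as in the probe sketch for the previous change. The "Skipped (sensitive)" label and the exact line layout are hypothetical. The PR only states that office skips suppress "Corpus looks healthy", so how other skips affect that line here is an assumption.

```python
from typing import List, Optional

OFFICE_TAG = "office deps missing"
INSTALL_HINT = "pip install graphify[office]"


def render_health(skipped: List[str], warning: Optional[str]) -> List[str]:
    """Build the health lines printed after the file-count table."""
    office = [s for s in skipped if OFFICE_TAG in s]
    other = [s for s in skipped if OFFICE_TAG not in s]
    lines: List[str] = []
    if other:
        lines.append(f"  Skipped (sensitive)  {len(other)}")
    if office:
        lines.append(
            f"  Skipped (office deps missing)  {len(office)}  ({INSTALL_HINT})"
        )
    if warning:
        lines.append(f"warning: {warning}")
    # Only claim a healthy corpus when nothing office-related was dropped.
    if not office and not warning:
        lines.append("Corpus looks healthy — no warnings.")
    return lines
```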

Added test_dry_run_office_missing_deps_warns which patches docx_to_markdown to return "" and asserts the warning and install hint appear in output.

feat: add dry-run CLI command

0007278

nuthalapativarun wants to merge 3 commits into safishamsi:v3 from nuthalapativarun:feat/dry-run-command

2 participants
