
feat(tree): graphify tree — D3 v7 collapsible-tree HTML emitter#557

Open
dsremo wants to merge 2 commits into safishamsi:v5 from dsremo:feat/tree-html-d3

dsremo commented Apr 26, 2026

Summary

Adds a new `graphify tree` subcommand that emits a self-contained D3 v7 collapsible-tree HTML view of an existing `graph.json`.

The existing `graph.html` (force-directed) is great for finding hubs and unexpected connections. For code review and onboarding, however, a hierarchical tree-of-modules view is faster: you can collapse everything and expand only the package you care about, the depth-based colour palette gives instant orientation, and the layout mirrors the on-disk structure.

UX choices include expand-all / collapse-all / reset-view buttons, multi-line `wrapText` labels with separately coloured name and count, a depth-based palette, click-to-toggle subtrees, and a hover inspector that surfaces the top-K outbound edges per symbol.

What's added

| File | Change |
|---|---|
| `graphify/tree_html.py` | New file: 575 LOC, no new runtime deps (D3 v7 loaded from cdn.jsdelivr.net) |
| `graphify/__main__.py` | `tree` subcommand + help text, added after `check-update` |
| `CHANGELOG.md` | Entry in the Unreleased section |

CLI

```shell
graphify tree                        # graphify-out/graph.json → graphify-out/GRAPH_TREE.html
graphify tree --graph foo/g.json --output foo/tree.html --label MyProject
graphify tree --max-children 500 --top-k-edges 24
```

Config flags

  • `--graph PATH`: path to graph.json (default: graphify-out/graph.json)
  • `--output HTML`: output path (default: graphify-out/GRAPH_TREE.html)
  • `--root PATH`: filesystem root (default: longest common prefix of all source_files)
  • `--max-children N`: cap on visible children per node (default: 200)
  • `--top-k-edges N`: number of per-symbol outbound edges shown in the inspector (default: 12)
  • `--label NAME`: project label shown in the page header
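For reviewers who want to check the flag surface, here is a minimal sketch of the documented options expressed with the standard-library `argparse`. It is an illustration, not the PR's code: the way `__main__.py` actually parses arguments may differ, `build_tree_parser` and `DEFAULT_TOP_K_EDGES` are hypothetical names, and only the flag names and defaults come from the list above (`DEFAULT_MAX_CHILDREN` is the constant the PR exports from `graphify.tree_html`).

```python
import argparse

# Defaults copied from the documented flags; DEFAULT_TOP_K_EDGES is a hypothetical name.
DEFAULT_MAX_CHILDREN = 200
DEFAULT_TOP_K_EDGES = 12


def build_tree_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented `graphify tree` flags."""
    parser = argparse.ArgumentParser(
        prog="graphify tree",
        description="Emit a D3 v7 collapsible-tree HTML view of graph.json.",
    )
    parser.add_argument("--graph", default="graphify-out/graph.json",
                        help="path to graph.json")
    parser.add_argument("--output", default="graphify-out/GRAPH_TREE.html",
                        help="HTML output path")
    parser.add_argument("--root", default=None,
                        help="filesystem root (default: longest common prefix of all source_files)")
    parser.add_argument("--max-children", type=int, default=DEFAULT_MAX_CHILDREN,
                        help="cap on visible children per node")
    parser.add_argument("--top-k-edges", type=int, default=DEFAULT_TOP_K_EDGES,
                        help="per-symbol outbound edges shown in the inspector")
    parser.add_argument("--label", default=None,
                        help="project label shown in the page header")
    return parser
```

With this sketch, `build_tree_parser().parse_args(["--max-children", "500"])` yields `max_children == 500`; argparse converts the dashed flag names into underscore attributes.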

Test plan

  • Smoke test: `python -m graphify tree --help` prints the expected usage
  • Imports cleanly (`from graphify.tree_html import write_tree_html, DEFAULT_MAX_CHILDREN`)
  • Tested locally against a 17 641-node graph — emits a 4.9 MB HTML file that renders smoothly in Firefox / Chromium
  • Reviewer check: works against the maintainer's own graphs (any size / language)
  • Reviewer check: HTML is fully self-contained beyond the D3 CDN load (no other external assets)
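The last reviewer check can be partly automated. The sketch below uses the standard-library `HTMLParser` to collect every `src`/`href` attribute that names another host and reports any that are not on cdn.jsdelivr.net. It is a hypothetical helper, not part of the PR, and it only inspects tag attributes: URLs built in JavaScript or referenced from CSS `url(...)` would still need a manual look.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Only external host the PR description says the page loads from.
ALLOWED_HOSTS = {"cdn.jsdelivr.net"}


class ExternalRefCollector(HTMLParser):
    """Collect src/href attribute values that point at another host."""

    def __init__(self) -> None:
        super().__init__()
        self.external: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                # Relative URLs and #fragments have an empty netloc.
                if urlparse(value).netloc:
                    self.external.append(value)


def disallowed_external_refs(html_text: str) -> list[str]:
    """Return external references whose host is not in ALLOWED_HOSTS."""
    collector = ExternalRefCollector()
    collector.feed(html_text)
    collector.close()
    return [u for u in collector.external if urlparse(u).netloc not in ALLOWED_HOSTS]
```

Run against the emitted file, e.g. `disallowed_external_refs(Path("graphify-out/GRAPH_TREE.html").read_text(encoding="utf-8"))`, an empty list means no unexpected external assets were found in tag attributes.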

Notes for reviewer

  • No new dependencies. The HTML is self-contained beyond loading D3 v7 from the public CDN.
  • The hierarchy is built from the longest common prefix of the source_file paths, the same prefix grouping the existing report uses, so the tree's structure matches what users already see in GRAPH_REPORT.md.
  • The inspector pre-computes the top-K outbound edges per symbol at emit time, so the page is fully interactive offline once loaded.
  • Branched from upstream v5 so it merges cleanly against the current default branch.
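Pre-computing the inspector data amounts to grouping edges by source and keeping the K best per symbol. The sketch below is illustrative only: it assumes `graph.json` links are dicts with `source` and `target` ids and an optional numeric `weight` (treated as 1.0 when missing), and it ranks by weight with ties broken by target id. The PR does not say which ranking it uses, and `top_k_outbound` is a hypothetical name.

```python
import heapq
from collections import defaultdict


def top_k_outbound(links: list[dict], k: int) -> dict[str, list[dict]]:
    """Return, per source node, its k highest-ranked outbound links.

    Assumed (not confirmed by the PR): links carry "source" and "target"
    ids plus an optional numeric "weight"; a missing weight counts as 1.0.
    Higher weight ranks first, and ties are broken by target id.
    """
    by_source: dict[str, list[dict]] = defaultdict(list)
    for link in links:
        by_source[link["source"]].append(link)

    # nsmallest on (-weight, target) returns the k best links, already sorted.
    return {
        source: heapq.nsmallest(
            k, out_links,
            key=lambda link: (-float(link.get("weight", 1.0)), str(link["target"])),
        )
        for source, out_links in by_source.items()
    }
```

`heapq.nsmallest` avoids fully sorting each symbol's edge list, which matters little for small graphs but keeps emit time proportional to the edge count on large ones. With the default `--top-k-edges`, the call would be `top_k_outbound(links, 12)`, and the resulting dict can be serialised straight into the page's JSON.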

Adds a new `graphify tree` subcommand that emits a self-contained D3
v7 collapsible-tree HTML view of an existing graph.json.

Why
---
The existing `graph.html` (force-directed) is great for finding hubs
and unexpected connections.  But for code review and onboarding, a
hierarchical tree-of-modules view is much faster: you can collapse
everything and expand only the package you care about, the depth-
based colour palette gives instant orientation, and the layout
mirrors the on-disk structure.

UX choices include expand-all / collapse-all / reset-view buttons,
multi-line `wrapText` labels with separately-coloured name and count,
a depth-based palette, click-to-toggle subtrees, and a hover-inspector
that surfaces the top-K outbound edges per symbol.

Implementation
--------------
- `graphify/tree_html.py` (575 LOC, single file, no new runtime
  dependencies).  D3 v7 is loaded from cdn.jsdelivr.net at view time.
- Hierarchy is built from `source_file` longest-common-prefix;
  symbols are grouped by containing module so the tree mirrors the
  on-disk layout exactly.
- Inspector pre-computes top-K outbound edges per symbol so the page
  works fully offline once loaded.
- `__main__.py` adds the subcommand + help text after the
  `check-update` block.
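The grouping described above (paths made relative to a common root, symbols placed under their containing file) can be sketched as follows. This is not the code in `graphify/tree_html.py`: it assumes `graph.json` nodes are dicts with `id`, `label`, and `source_file` keys and POSIX-style paths, treats the default root as the longest shared prefix of the files' parent directories, and uses a hypothetical `+N more` placeholder to show one way `--max-children` could cap a node. The output uses the `{"name", "children"}` shape that `d3.hierarchy()` reads by default.

```python
from typing import Optional


def _parts(path: str) -> list[str]:
    """Split a POSIX-style path into components, dropping empty and '.' parts."""
    return [p for p in path.split("/") if p and p != "."]


def common_dir_prefix(files: list[str]) -> list[str]:
    """Longest common prefix of the files' parent-directory components."""
    dirs = [_parts(f)[:-1] for f in files]
    if not dirs:
        return []
    prefix = dirs[0]
    for d in dirs[1:]:
        n = 0
        while n < min(len(prefix), len(d)) and prefix[n] == d[n]:
            n += 1
        prefix = prefix[:n]
    return prefix


def _cap_children(node: dict, limit: int) -> None:
    """Keep at most `limit` children per node; summarise the rest (hypothetical)."""
    kids = node.get("children", [])
    if len(kids) > limit:
        node["children"] = kids[:limit] + [{"name": f"+{len(kids) - limit} more"}]
    for kid in node.get("children", []):
        _cap_children(kid, limit)


def build_hierarchy(nodes: list[dict], root: Optional[str] = None,
                    label: str = "project", max_children: int = 200) -> dict:
    files = [n["source_file"] for n in nodes if n.get("source_file")]
    root_parts = _parts(root) if root else common_dir_prefix(files)
    tree: dict = {"name": label, "children": []}
    index = {(): tree}  # path-component tuple -> directory/file node

    for node in nodes:
        src = node.get("source_file")
        if not src:
            continue  # nodes without a source file are skipped in this sketch
        parts = _parts(src)
        if parts[: len(root_parts)] == root_parts:
            parts = parts[len(root_parts):]  # make the path relative to the root
        # Create (or reuse) one tree node per directory and per file.
        for i in range(1, len(parts) + 1):
            key = tuple(parts[:i])
            if key not in index:
                child = {"name": parts[i - 1], "children": []}
                index[key[:-1]]["children"].append(child)
                index[key] = child
        # The symbol becomes a leaf under its containing file.
        index[tuple(parts)]["children"].append({"name": node.get("label", node.get("id"))})

    _cap_children(tree, max_children)
    return tree
```

For nodes under `/repo/src/pkg/a.py` and `/repo/src/cli.py`, the inferred root is `repo/src`, so the top level of the tree holds `pkg` and `cli.py`, matching the on-disk layout below that root.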

Configuration
-------------
- `--graph PATH`     path to graph.json (default: graphify-out/graph.json)
- `--output HTML`    output path (default: graphify-out/GRAPH_TREE.html)
- `--root PATH`      filesystem root (default: LCP of source_files)
- `--max-children N` cap visible children per node (default: 200)
- `--top-k-edges N`  per-symbol outbound edges in inspector (default: 12)
- `--label NAME`     project label shown in the page header

Tested locally on a 17 641-node graph — emits a 4.9 MB HTML file
that renders smoothly in Firefox / Chromium.
dsremo force-pushed the feat/tree-html-d3 branch from 891357f to c3ba79f on April 26, 2026 04:54
Qodo-Free-For-OSS commented

Hi, emit_html() injects title, header, and data_json directly into HTML and a <script> tag without escaping, allowing crafted graph labels/project_label to break out (e.g., via </script>) and execute script when the HTML is opened.

Severity: action required | Category: security

How to fix: Escape HTML and JS contexts

Agent prompt to fix - you can give this to your LLM of choice:

Issue description

graphify/tree_html.py emits unescaped values into HTML (<title>, <h1>) and into a <script> assignment (const initialJsonData = {data_json};). Malicious content in tree['name'] (from --label or graph node labels) can inject HTML/JS (notably via </script> sequences).

Issue Context

The repo already has a safe pattern in graphify/export.py (_js_safe() replacing </ with <\/ before embedding JSON in a script tag, plus html.escape() for HTML contexts).

Fix Focus Areas

  • graphify/tree_html.py[174-263]
  • graphify/tree_html.py[541-555]
  • (reference pattern) graphify/export.py[442-451]

Implementation notes

  • Use html.escape() for {title} and {header}.
  • Use a JS-safe JSON embedding helper similar to _js_safe() (json.dumps(...).replace("</", "<\\/")) for {data_json}.
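Both notes can be illustrated with a small sketch. `js_safe_json` follows the `json.dumps(...).replace("</", "<\\/")` pattern quoted above, and `render_page` is a hypothetical stand-in for `emit_html()` that shows where each escape applies; the real template in `tree_html.py` is much larger.

```python
import html
import json


def js_safe_json(value) -> str:
    """Serialise value as JSON that is safe to embed inside a <script> element.

    "</" can only occur inside JSON strings, and "<\\/" is a valid JSON/JS
    escape for "</", so a label such as "</script>" cannot close the tag early.
    """
    return json.dumps(value).replace("</", "<\\/")


def render_page(title: str, data: dict) -> str:
    """Hypothetical stand-in for emit_html(): escape per output context."""
    safe_title = html.escape(title)  # escapes & < > " ' for HTML text and attributes
    return (
        "<!doctype html>\n"
        f"<title>{safe_title}</title>\n"
        f"<h1>{safe_title}</h1>\n"
        f"<script>const initialJsonData = {js_safe_json(data)};</script>\n"
    )
```

The escapes are context-specific: `html.escape()` is right for `<title>` and `<h1>`, but inside `<script>` the content is parsed as JavaScript, so entity-escaping there would corrupt the data instead of protecting it.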

We noticed a couple of other issues in this PR as well - happy to share if helpful.


Found by Qodo code review. FYI, Qodo is free for open-source.

html.escape() the values that land in <title> and <h1>, and replace </
with <\/ in the JSON embedded inside <script> so crafted graph labels
or --label values cannot break out. Mirrors the _js_safe() pattern in
export.py.

Reported by Qodo on PR safishamsi#557.