[MAINTENANCE] dead code analyzer by joshua-stauffer · Pull Request #11760 · great-expectations/great_expectations

joshua-stauffer · 2026-04-01T10:35:01Z

Summary

Adds scripts/find_dead_code.py, an AST-based static analyzer that identifies modules, symbols, and test files not reachable from @public_api roots. It also adds scripts/dead_code_exceptions.json to configure known false-positive patterns, and an invoke dead-code task in tasks.py as a convenience wrapper.

Run it with:

invoke dead-code --verbose
python scripts/find_dead_code.py --json-output dead_code_report.json --layer all --verbose

How `find_dead_code.py` works

The script performs a multi-layer reachability analysis. The core idea: anything not reachable (by import chain or symbol reference) from a @public_api-decorated symbol is a dead code candidate.

Layer 1 — Module-level reachability (`ModuleGraphBuilder`)

This is the primary, high-confidence layer.

Discovery: Recursively finds every .py file under great_expectations/ and converts each to a dotted module name (e.g. great_expectations/profile/base.py → great_expectations.profile.base).
AST parsing: For each module, the file is parsed with Python's ast module. Imports inside if TYPE_CHECKING: blocks are intentionally excluded — those only exist for the type checker and don't create runtime dependencies.
Graph construction: Every import X and from X import Y statement (after resolving relative imports) becomes a directed edge: this module depends on X. from . import foo is treated speculatively as possibly importing a sibling submodule named foo.
Root identification: Root modules are any module containing at least one @public_api-decorated symbol, plus the top-level great_expectations package itself.
BFS reachability: A breadth-first search walks the dependency graph from all roots. When a module is visited, its ancestor __init__.py packages are also enqueued (Python loads all parent packages when importing a submodule). Any module not reached by this BFS is a dead module — reported with high confidence.
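The Layer 1 steps above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/find_dead_code.py: the names `RuntimeImportCollector` and `reachable` and the toy `pkg.*` graph are hypothetical, and relative-import resolution is left out.

```python
import ast
from collections import deque


class RuntimeImportCollector(ast.NodeVisitor):
    """Collect absolute module names imported at runtime (hypothetical helper)."""

    def __init__(self) -> None:
        self.modules: set[str] = set()

    def visit_If(self, node: ast.If) -> None:
        test = node.test
        is_guard = (isinstance(test, ast.Name) and test.id == "TYPE_CHECKING") or (
            isinstance(test, ast.Attribute) and test.attr == "TYPE_CHECKING"
        )
        if is_guard:
            # The guarded body never runs at runtime; only the else branch can.
            for stmt in node.orelse:
                self.visit(stmt)
        else:
            self.generic_visit(node)

    def visit_Import(self, node: ast.Import) -> None:
        self.modules.update(alias.name for alias in node.names)

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        # Relative imports (level > 0) need the current package to resolve; skipped here.
        if node.level == 0 and node.module:
            self.modules.add(node.module)


def reachable(graph: dict[str, set[str]], roots: set[str]) -> set[str]:
    """Breadth-first search over module edges, also enqueuing ancestor packages."""
    seen: set[str] = set()
    queue = deque(roots)
    while queue:
        mod = queue.popleft()
        if mod in seen or mod not in graph:  # ignore modules outside the project
            continue
        seen.add(mod)
        queue.extend(graph[mod])
        parts = mod.split(".")
        # Importing a.b.c also loads the packages a and a.b.
        queue.extend(".".join(parts[:i]) for i in range(1, len(parts)))
    return seen


source = (
    "from typing import TYPE_CHECKING\n"
    "import os\n"
    "if TYPE_CHECKING:\n"
    "    import pandas\n"
    "from pkg.core import engine\n"
)
collector = RuntimeImportCollector()
collector.visit(ast.parse(source))

# Toy dependency graph: module name -> modules it imports.
graph = {
    "pkg": set(),
    "pkg.api": {"pkg.core.engine"},
    "pkg.core": set(),
    "pkg.core.engine": set(),
    "pkg.legacy": {"pkg.core.engine"},
}
dead_modules = set(graph) - reachable(graph, {"pkg.api"})
```

In this toy graph `pkg.legacy` imports live code but nothing reachable imports it, so it is the only dead-module candidate.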

Layer 3 — Dynamic import detection (`DynamicImportDetector`)

Layer 3 runs before the BFS (despite the numbering) because its job is to augment the graph so the BFS is more accurate. It handles two GX-specific runtime patterns that pure import-statement analysis cannot see:

instantiate_class_from_config calls: Scans the AST of every module for calls to this function. When a config_defaults={"module_name": "great_expectations.some.module"} argument is found, it adds an edge from the calling module to the target. When the target is an unrecognized string prefix, it marks all matching submodules as reachable roots.
class_name string values in config dicts: Scans config-heavy modules (types, stores, abstract data context) for dict literals containing {"class_name": "SomeName"}. Looks up which module defines SomeName and adds an edge, preventing false-positive dead code reports for dynamically instantiated classes.

Layer 3 also loads dead_code_exceptions.json. Modules matching always_reachable_modules patterns are added as roots before the BFS. This handles cases like great_expectations.compatibility.*, which contains optional-dependency shims loaded conditionally at runtime in ways that can't be traced statically.
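As an illustration of the first pattern, the sketch below pulls `module_name` strings out of `config_defaults` dict literals passed to `instantiate_class_from_config`. The function name `config_default_modules` is hypothetical, and the real detector may handle more argument shapes (positional arguments, non-literal dicts, prefix strings) than this does.

```python
import ast


def config_default_modules(source: str) -> set[str]:
    """Return module_name strings found in config_defaults={...} keyword arguments."""
    targets: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Accept both a bare call and an attribute call such as util.instantiate_class_from_config(...).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "instantiate_class_from_config":
            continue
        for kw in node.keywords:
            if kw.arg != "config_defaults" or not isinstance(kw.value, ast.Dict):
                continue
            for key, value in zip(kw.value.keys, kw.value.values):
                if (
                    isinstance(key, ast.Constant)
                    and key.value == "module_name"
                    and isinstance(value, ast.Constant)
                    and isinstance(value.value, str)
                ):
                    targets.add(value.value)
    return targets


sample = (
    "store = instantiate_class_from_config(\n"
    "    config=cfg,\n"
    "    runtime_environment={},\n"
    '    config_defaults={"module_name": "great_expectations.some.module"},\n'
    ")\n"
)
found = config_default_modules(sample)
```

Each string returned would become an edge from the calling module to the named target before the BFS runs.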

Layer 2 — Symbol-level reachability (`SymbolGraphBuilder`)

This layer operates within the already-reachable module set to find dead functions and classes (reported with medium confidence, since there are more edge cases).

Symbol graph: For each reachable module, each top-level class, function, and async function becomes a node. The edges are references: if function A calls function B (as an ast.Name that resolves through the import map), there is an edge A → B.
Module body node: A synthetic <module_body> node is added for each module, representing code that runs at import time (not inside any function or class). This node is always a root since it executes unconditionally. It adds edges to any symbols it references.
Root symbols: Symbols decorated with @public_api, names exported in __init__.py __all__ lists (resolving through re-exports), and all <module_body> nodes.
BFS reachability: Same BFS as Layer 1, but over the symbol graph. Symbols not reachable from any root are reported as dead. Private symbols (starting with _) and common noise names (logger, T, P, etc.) are suppressed.
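A single-module sketch of the symbol layer is shown below. It is a simplification under stated assumptions: `symbol_graph` and `dead_symbols` are hypothetical names, cross-module resolution through the import map is omitted, and decorator references are attributed to the decorated symbol rather than to the module body.

```python
import ast
from collections import deque

MODULE_BODY = "<module_body>"
DEF_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def symbol_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level symbol (plus the module body) to the top-level symbols it references."""
    tree = ast.parse(source)
    defs = {node.name for node in tree.body if isinstance(node, DEF_TYPES)}
    graph: dict[str, set[str]] = {name: set() for name in defs}
    graph[MODULE_BODY] = set()
    for stmt in tree.body:
        # Statements outside any def or class belong to the synthetic module-body node.
        owner = stmt.name if isinstance(stmt, DEF_TYPES) else MODULE_BODY
        for child in ast.walk(stmt):
            if isinstance(child, ast.Name) and child.id in defs and child.id != owner:
                graph[owner].add(child.id)
    return graph


def dead_symbols(graph: dict[str, set[str]], extra_roots: frozenset = frozenset()) -> set[str]:
    """BFS from the module body and any extra roots; return unreached public symbols."""
    seen: set[str] = set()
    queue = deque([MODULE_BODY, *extra_roots])
    while queue:
        name = queue.popleft()
        if name in seen or name not in graph:
            continue
        seen.add(name)
        queue.extend(graph[name])
    # Private names are suppressed, mirroring the filtering described above.
    return {name for name in graph if name not in seen and not name.startswith("_")}


module_source = (
    "def helper():\n    return 1\n\n"
    "def used():\n    return helper()\n\n"
    "def orphan():\n    return helper()\n\n"
    "def api():\n    return used()\n\n"
    "DEFAULT = helper()\n"
)
graph = symbol_graph(module_source)
# Pretend `api` is decorated with @public_api and therefore a root.
dead = dead_symbols(graph, frozenset({"api"}))
```

Here `orphan` calls live code but is never referenced from a root, which is the shape of finding this layer reports with medium confidence.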

Layer 4 — Test file analysis (`TestAnalyzer`)

Scans tests/ for test_*.py files (skipping conftest and __init__) that import dead production modules. Each file is classified as:

High confidence (all imports dead): every GX import in the file points to a dead module — the test file exists solely to test removed code.
Medium confidence (some imports dead): the file imports a mix of live and dead modules — individual test functions may need removal.
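The classification rule can be sketched as below. `classify_test_file` is a hypothetical name, and the sketch only looks at the module part of each import, so `from great_expectations.profile import base` is counted as an import of `great_expectations.profile`; the real analyzer may resolve submodules more carefully.

```python
import ast
from typing import Optional


def classify_test_file(
    source: str, dead_modules: set[str], package: str = "great_expectations"
) -> Optional[str]:
    """Return 'high', 'medium', or None depending on how many package imports are dead."""
    imported: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            imported.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            imported.add(node.module)
    # Only imports from the production package count toward the classification.
    package_imports = {m for m in imported if m == package or m.startswith(package + ".")}
    dead = package_imports & dead_modules
    if not dead:
        return None  # the file does not touch dead code
    return "high" if dead == package_imports else "medium"


dead_set = {"great_expectations.profile.base"}
only_dead = "import pytest\nfrom great_expectations.profile.base import Profiler\n"
mixed = only_dead + "from great_expectations.core.batch import Batch\n"
unaffected = "from great_expectations.core.batch import Batch\n"
```

Third-party imports such as `pytest` are ignored, so a file whose only GX import is dead still classifies as high confidence.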

Output

The console summary prints module/symbol/test counts, dead module paths, dead symbol locations (filepath:line :: SymbolName), and test classifications. With --json-output, a machine-readable JSON report is written for downstream tooling (e.g. the dead-code-removal Claude skill reads this file to select removal batches).

`dead_code_exceptions.json`

A small config file for permanently excluding known false positives. always_reachable_modules accepts glob patterns matched against all discovered module names; matched modules are injected as roots before the BFS. ignore_patterns can suppress modules from the dead-module output even if they are unreachable.
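To make the pattern matching concrete, here is a small sketch that treats the patterns as shell-style globs via `fnmatch`. Whether the script itself uses `fnmatch` is an assumption, the JSON below is a sample modeled on the description rather than a copy of the committed file, and `matching_modules` is a hypothetical helper.

```python
import json
from fnmatch import fnmatchcase

# Sample config shaped like the description above (not the committed file).
EXCEPTIONS_JSON = """
{
    "always_reachable_modules": ["great_expectations.compatibility.*"],
    "ignore_patterns": []
}
"""


def matching_modules(patterns: list[str], modules: set[str]) -> set[str]:
    """Return the module names that match at least one glob pattern."""
    return {m for m in modules if any(fnmatchcase(m, p) for p in patterns)}


config = json.loads(EXCEPTIONS_JSON)
discovered = {
    "great_expectations.compatibility",
    "great_expectations.compatibility.sqlalchemy",
    "great_expectations.core.batch",
}
extra_roots = matching_modules(config["always_reachable_modules"], discovered)
```

Under glob semantics the pattern `great_expectations.compatibility.*` matches submodules but not the `great_expectations.compatibility` package itself, because the literal trailing dot must be present.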

`tasks.py`

Adds invoke dead-code as a thin wrapper around the script, supporting --json-output, --layer, and --verbose options.

netlify · 2026-04-01T10:35:07Z

✅ Deploy Preview for niobium-lead-7998 canceled.

Name	Link
🔨 Latest commit	`72ee912`
🔍 Latest deploy log	https://app.netlify.com/projects/niobium-lead-7998/deploys/69cf722fc38ddc00086e12b9


codecov · 2026-04-02T15:32:08Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.66%. Comparing base (65ee05f) to head (72ee912).
⚠️ Report is 80 commits behind head on develop.
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop   #11760   +/-   ##
========================================
  Coverage    84.66%   84.66%           
========================================
  Files          471      471           
  Lines        39170    39170           
========================================
  Hits         33165    33165           
  Misses        6005     6005

Flag	Coverage Δ
3.10	`73.56% <ø> (ø)`
3.10 athena	`?`
3.10 aws_deps	`?`
3.10 big	`?`
3.10 clickhouse	`?`
3.10 filesystem	`?`
3.10 mysql	`?`
3.10 openpyxl or pyarrow or project or sqlite or aws_creds	`?`
3.10 postgresql	`?`
3.10 spark	`?`
3.10 spark_connect	`?`
3.10 sql_server	`?`
3.10 trino	`?`
3.11	`73.60% <ø> (ø)`
3.11 athena	`?`
3.11 aws_deps	`?`
3.11 big	`?`
3.11 clickhouse	`?`
3.11 filesystem	`?`
3.11 mysql	`?`
3.11 openpyxl or pyarrow or project or sqlite or aws_creds	`?`
3.11 postgresql	`?`
3.11 spark	`?`
3.11 spark_connect	`?`
3.11 sql_server	`?`
3.11 trino	`?`
3.12	`73.59% <ø> (-0.02%)`	⬇️
3.12 athena	`?`
3.12 aws_deps	`?`
3.12 big	`?`
3.12 filesystem	`?`
3.12 mysql	`?`
3.12 openpyxl or pyarrow or project or sqlite or aws_creds	`?`
3.12 postgresql	`?`
3.12 spark	`?`
3.12 spark_connect	`?`
3.12 sql_server	`?`
3.12 trino	`?`
3.13	`73.61% <ø> (+0.01%)`	⬆️
3.13 athena	`41.93% <ø> (ø)`
3.13 aws_deps	`45.18% <ø> (ø)`
3.13 big	`55.27% <ø> (ø)`
3.13 bigquery	`51.25% <ø> (ø)`
3.13 clickhouse	`41.94% <ø> (ø)`
3.13 databricks	`53.06% <ø> (ø)`
3.13 filesystem	`64.37% <ø> (ø)`
3.13 gx-redshift	`51.41% <ø> (ø)`
3.13 mysql	`51.81% <ø> (ø)`
3.13 openpyxl or pyarrow or project or sqlite or aws_creds	`59.97% <ø> (ø)`
3.13 postgresql	`55.22% <ø> (ø)`
3.13 snowflake	`53.90% <ø> (+<0.01%)`	⬆️
3.13 spark	`55.92% <ø> (ø)`
3.13 spark_connect	`46.85% <ø> (ø)`
3.13 sql_server	`53.23% <ø> (ø)`
3.13 trino	`48.75% <ø> (ø)`
cloud	`0.00% <ø> (ø)`
docs-basic	`59.52% <ø> (ø)`
docs-creds-needed	`58.11% <ø> (ø)`
docs-spark	`57.57% <ø> (ø)`

Flags with carried forward coverage won't be shown.

Copilot

Pull request overview

Adds a dead-code analysis utility that statically traces reachability from @public_api roots, with optional dynamic-import heuristics and test-file classification, and wires it into the repo’s Invoke tasks for easy execution.

Changes:

Added scripts/find_dead_code.py, an AST-based reachability analyzer (modules, symbols, dynamic-import heuristics, tests) with optional JSON output.
Added scripts/dead_code_exceptions.json to configure always-reachable module patterns / suppressions.
Added an invoke dead-code wrapper task in tasks.py.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File	Description
tasks.py	Adds an Invoke task that runs the dead-code analyzer script with CLI flags.
scripts/find_dead_code.py	Implements the multi-layer AST reachability analysis and report generation/printing.
scripts/dead_code_exceptions.json	Provides initial exception patterns for modules that should always be treated as reachable.


Copilot · 2026-04-02T17:41:47Z

+    repo_root = pathlib.Path(__file__).parent
+    _exit_with_error_if_not_run_from_correct_dir(task_name="dead-code", correct_dir=repo_root)
+
+    cmd = f"{sys.executable} scripts/find_dead_code.py --json-output {json_output} --layer {layer}"
+    if verbose:
+        cmd += " --verbose"
+    ctx.run(cmd, echo=True)


ctx.run() executes via a shell here, but json_output and layer are interpolated into the command without quoting. This breaks for paths with spaces and also allows shell injection if someone passes a crafted value. Build the command using proper argument quoting (e.g., shlex.quote) or avoid the shell by invoking the script via subprocess.run([...], check=True) from the task.
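One way to apply the quoting suggestion is sketched below. `build_dead_code_command` is a hypothetical helper, not part of tasks.py; it relies on `shlex.join`, which quotes for POSIX shells only, so the `subprocess.run([...], check=True)` route may be preferable where other shells are in play.

```python
import shlex
import sys


def build_dead_code_command(json_output: str, layer: str, verbose: bool = False) -> str:
    """Build the analyzer command line with every argument shell-quoted."""
    args = [
        sys.executable,
        "scripts/find_dead_code.py",
        "--json-output",
        json_output,
        "--layer",
        layer,
    ]
    if verbose:
        args.append("--verbose")
    # shlex.join applies shlex.quote to each argument before joining with spaces.
    return shlex.join(args)


# A path with a space and a shell metacharacter stays a single argument.
cmd = build_dead_code_command("my report.json; echo injected", "all", verbose=True)
```

The resulting string could be passed to `ctx.run(cmd, echo=True)` unchanged, since each user-supplied value now reaches the script as exactly one argument.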

Copilot · 2026-04-02T17:41:47Z

+        module_body_fqn = f"{mod_name}.{self.MODULE_BODY}"
+        module_refs: set[str] = set()
+        for node in ast.iter_child_nodes(tree):
+            if id(node) in top_level_defs:


_process_module_body() skips all top-level FunctionDef/ClassDef nodes entirely, but decorators (and class bases/metaclass expressions) are evaluated at import time. This means symbol reachability can produce false positives for side-effectful decorators (e.g., great_expectations/_version.py uses @register_vcs_handler(...) to populate HANDLERS at import time). Consider including references found in decorator_list (and for classes, bases/keywords) in the module-body root traversal so import-time side effects are modeled.

Suggested change

-            if id(node) in top_level_defs:
+            if id(node) in top_level_defs:
+                # Top-level defs have bodies that are not executed at import time,
+                # but their decorators and (for classes) base/metaclass expressions
+                # are. Traverse those expressions to capture import-time references.
+                decorator_roots = []
+                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
+                    decorator_roots.extend(node.decorator_list)
+                    if isinstance(node, ast.ClassDef):
+                        decorator_roots.extend(node.bases)
+                        decorator_roots.extend(kw.value for kw in node.keywords)
+                for root in decorator_roots:
+                    for child in ast.walk(root):
+                        if isinstance(child, ast.Name):
+                            name = child.id
+                            if name in import_map:
+                                module_refs.add(import_map[name])
+                            elif name in info.defined_symbols:
+                                module_refs.add(f"{mod_name}.{name}")

Copilot · 2026-04-02T17:41:48Z

+    "ignore_patterns": [],
+    "notes": {
+        "great_expectations.compatibility.*": "Compatibility shims for optional dependencies; loaded conditionally at runtime"
+    }


This config includes a notes mapping, but DynamicImportDetector.load_exceptions() never reads it (it only emits generic "Pattern matched" notes). Either remove the unused notes field to avoid confusion, or update the loader/reporting to surface these per-pattern explanations.

Suggested change

-    "ignore_patterns": [],
-    "notes": {
-        "great_expectations.compatibility.*": "Compatibility shims for optional dependencies; loaded conditionally at runtime"
-    }
+    "ignore_patterns": []

Copilot · 2026-04-02T17:41:48Z

+    name="dead-code",
+    help={
+        "json_output": "Path for JSON report output (default: dead_code_report.json)",
+        "layer": "Which layers to run: 1 (modules), 2 (symbols), 4 (tests), all (default: all)",


The task help text for layer omits layer 3, but scripts/find_dead_code.py accepts --layer 3 (dynamic-import augmentation). Update the help text (and optionally validate layer against the script’s choices) or remove the unsupported option so the wrapper’s CLI is self-consistent.

Suggested change

-        "layer": "Which layers to run: 1 (modules), 2 (symbols), 4 (tests), all (default: all)",
+        "layer": "Which layers to run: 1 (modules), 2 (symbols), 3 (dynamic-import augmentation), 4 (tests), all (default: all)",

github-actions · 2026-05-22T01:07:34Z

Is this PR still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity.

It will be closed if no further activity occurs. Thank you for your contributions 🙇

dead code analyzer · 87f5808
[pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci) · 96765ec
joshua-stauffer changed the title from "dead code analyzer" to "[MAINTENANCE] dead code analyzer" · Apr 2, 2026
Merge branch 'develop' into m/_/dead_code_analyzer · 771c574
joshua-stauffer requested a review from tyler-hoffman · April 2, 2026 15:50
Merge branch 'develop' into m/_/dead_code_analyzer · 6f60523
Copilot AI review requested due to automatic review settings · April 2, 2026 17:35
Copilot started reviewing on behalf of joshua-stauffer · April 2, 2026 17:36
Copilot AI reviewed · Apr 2, 2026
Merge branch 'develop' into m/_/dead_code_analyzer · 72ee912
github-actions Bot added the stale label · May 22, 2026

joshua-stauffer wants to merge 5 commits into develop from m/_/dead_code_analyzer


