[WIP] Rewrite backends in Rust using Ruff's parser use parquet for storage and faster indexing by tonybaloney · Pull Request #238 · tonybaloney/wily

tonybaloney · 2025-11-25T04:34:13Z

Replaces Radon with a Rust-based harvester backend. The harvesters use Ruff's AST, lexer, and parser for better performance.

Removed a lot of old tooling and moved to more modern Python linters and formatters.

Added backward compatibility checks with radon to ensure metrics don't change between v1 and v2.


…erl modules The manylinux2014 container was missing openssl-devel and perl-Time-Piece, causing openssl-sys to fail when building OpenSSL from source. 🐨 Generated with Crush Assisted-by: Claude Opus 4.6 via Crush <crush@charm.land>

Copilot

Pull request overview

This PR is a major WIP v2 rewrite that replaces the legacy Radon-based Python analysis pipeline and JSON cache with a Rust (PyO3) backend using Ruff’s parser and Parquet storage, and updates the CLI, operators, and tests accordingly.

Changes:

Introduces a Rust backend (backend/) to compute metrics (raw, cyclomatic, Halstead, maintainability, cognitive) and store/query results in metrics.parquet.
Refactors Python commands (build, rank, graph, index, list-metrics, etc.) to read/write via wily.backend.WilyIndex and Rich-based table output.
Reworks unit/integration tests to align with Parquet storage and “index only changed files per revision” behavior; removes many legacy unit tests tied to Radon/JSON/tabulate.

Show a summary per file

File	Description
test/unit/util.py	Removed legacy unit-test utilities for mocked v1 State/index.
test/unit/test_rank_unit.py	Removed v1 rank command unit tests (tabulate/State-based).
test/unit/test_operators.py	Adds coverage for new cognitive operator/metric resolution.
test/unit/test_list_metrics_unit.py	Removed v1 list-metrics unit tests (tabulate output assertions).
test/unit/test_index_unit.py	Removed v1 index command unit tests (tabulate/State-based).
test/unit/test_helper.py	Removed v1 helper tests for tabulate wrapping/style selection.
test/unit/test_graph_unit.py	Removed v1 graph unit tests that assumed per-revision complete history.
test/unit/test_cyclomatic.py	Removed Radon-harvester “bad data” regression tests.
test/unit/test_cache.py	Simplifies cache tests to new cache model (no JSON index/versioning/store).
test/unit/test_build_unit.py	Removed v1 build unit tests (multiprocessing/operator classes).
test/unit/test_archivers.py	Updates archiver test expectations after git archiver changes.
test/integration/test_state.py	Removed v1 integration tests around State/index.json cache.
test/integration/test_report.py	Updates report tests for new “log instead of stdout” behaviors and path normalization.
test/integration/test_rank.py	Updates rank CLI tests; removes threshold tests; adjusts invocation formatting.
test/integration/test_list_metrics.py	Makes list-metrics assertions more flexible and adds cognitive operator expectation.
test/integration/test_ipynb.py	Normalizes notebook path handling and loosens commit-count assertions.
test/integration/test_index.py	Adjusts index CLI assertions for new output/content behavior.
test/integration/test_graph.py	Normalizes graph path handling and invocation formatting.
test/integration/test_complex_commits.py	Updates to Parquet + “only changed files per revision” semantics; uses WilyIndex to validate.
test/integration/test_build.py	Updates build tests to assert Parquet output via WilyIndex and adds directory build coverage.
test/integration/test_archiver.py	Adds a comprehensive git revisions field test; updates end-to-end expectations.
test/integration/test_all_operators.py	Adds cognitive operator coverage and updates operator combinations.
test/conftest.py	Updates fixtures to build cache via new build flow; removes separate `index` invocation in fixture.
src/wily/state.py	Removes v1 State/Index/IndexedRevision cache model.
src/wily/operators/raw.py	Removes v1 Radon-based raw operator implementation.
src/wily/operators/maintainability.py	Removes v1 Radon-based maintainability operator implementation.
src/wily/operators/halstead.py	Removes v1 Radon-based Halstead operator implementation.
src/wily/operators/cyclomatic.py	Removes v1 Radon-based cyclomatic operator implementation.
src/wily/operators/__init__.py	Removes v1 operator registry + Metric aggregation definitions.
src/wily/operators.py	Introduces new operator/metric registry model and resolution helpers (incl. cognitive).
src/wily/helper/custom_enums.py	Minor formatting change.
src/wily/helper/__init__.py	Replaces tabulate helpers with Rich table rendering and adds box style support.
src/wily/defaults.py	Switches default table styling constant from tabulate to Rich box style.
src/wily/config/types.py	Modernizes typing (py310+ unions/collections.abc) and config fields.
src/wily/config/__init__.py	Adds cognitive to default operators and reformats config parsing.
src/wily/commands/rank.py	Refactors rank to read from Parquet via WilyIndex; removes total/threshold logic; uses Rich tables.
src/wily/commands/list_metrics.py	Refactors list-metrics to use OPERATOR_METRICS + Rich tables.
src/wily/commands/index.py	Refactors index to scan Parquet and print revision history via Rich tables.
src/wily/commands/graph.py	Refactors graph to read Parquet via WilyIndex and build traces without State/JSON cache.
src/wily/commands/build.py	Refactors build to use Rich progress and WilyIndex.analyze_revision (Parquet).
src/wily/cache.py	Simplifies cache handling; removes JSON index/versioning and JSON per-revision storage API.
src/wily/backend.pyi	Adds stubs for Rust extension module APIs (WilyIndex, git helpers, file iteration).
src/wily/archivers/git.py	Switches git archiver logic to Rust backend for revisions/checkout/find.
src/wily/archivers/filesystem.py	Updates filesystem archiver to RevisionInfo and modern typing.
src/wily/archivers/__init__.py	Introduces RevisionInfo TypedDict and updates BaseArchiver signatures/types.
src/wily/__init__.py	Switches logging to RichHandler and bumps version to 2.0.0a1.
README.md	Updates operator list documentation to include cognitive/halstead.
pyproject.toml	Switches build backend to maturin; updates dependencies, Python version floor, and tooling config.
Makefile	Updates build/install/lint targets for maturin + ruff.
HISTORY.md	Adds 2.0.0a1 (Unreleased) notes describing Rust backend + Parquet + cognitive complexity.
docs/source/commands/build.rst	Updates build docs to reflect new default operators (incl. cognitive/halstead).
backend/src/raw.rs	Adds Rust implementation of raw metrics using Ruff tokenization.
backend/src/lib.rs	Adds PyO3 module registration.
backend/src/halstead.rs	Adds Rust Halstead metric computation (Ruff AST).
backend/src/files.rs	Adds Rust iter_filenames implementation (WalkDir + glob).
backend/src/cyclomatic.rs	Adds Rust cyclomatic complexity computation (Ruff AST).
backend/Cargo.toml	Introduces Rust crate dependencies (PyO3, Ruff crates, arrow/parquet, git2, rayon).
backend/benches/analyze_revision.rs	Adds Criterion benchmarks for backend analysis performance.
AGENTS.md	Adds contributor/agent documentation describing v2 architecture, commands, conventions.
.gitignore	Updates ignores for Rust artifacts and test output directories.
.github/workflows/ci.yml	Updates CI to uv + Rust toolchain; adds clippy/rustfmt/wheel builds and trusted publishing.

Copilot's findings


Files reviewed: 70/73 changed files
Comments generated: 6

Copilot · 2026-04-25T22:22:27Z

+def resolve_operators(operators: Iterable[Operator | str]) -> list[Operator]:
+    """
+    Resolve a list of operator names to their corresponding types.
+
+    Automatically includes 'raw' if 'maintainability' is requested, since
+    the maintainability index calculation depends on raw metrics.
+    """
+    resolved = [resolve_operator(operator) for operator in iter(operators)]
+    # Maintainability depends on raw metrics (LOC, SLOC, comments)
+    has_maintainability = any(op.name == "maintainability" for op in resolved)
+    has_raw = any(op.name == "raw" for op in resolved)
+    if has_maintainability and not has_raw:
+        resolved.insert(0, resolve_operator("raw"))
+    return resolved


resolve_operators() is annotated to accept Iterable[Operator | str], but it unconditionally calls resolve_operator(operator) which expects a string and will raise at runtime if an Operator instance is passed. Either restrict the parameter type to Iterable[str] or handle Operator inputs (e.g., pass through when already resolved) before calling resolve_operator().
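The pass-through option mentioned above could look roughly like the following. This is a minimal sketch, not the code in `src/wily/operators.py`: the `Operator` dataclass and the `_REGISTRY` dict are hypothetical stand-ins, and only the `name` attribute and the raw/maintainability dependency are taken from the diff.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """Hypothetical stand-in for wily's Operator type (only `name` is assumed)."""

    name: str


# Hypothetical registry for illustration; the real one is not shown in this PR excerpt.
_REGISTRY = {
    name: Operator(name)
    for name in ("raw", "cyclomatic", "halstead", "maintainability", "cognitive")
}


def resolve_operator(name: str) -> Operator:
    """Look up an operator by name, raising ValueError for unknown names."""
    try:
        return _REGISTRY[name.lower()]
    except KeyError:
        raise ValueError(f"Operator {name!r} not recognised") from None


def resolve_operators(operators: Iterable[Operator | str]) -> list[Operator]:
    """Resolve names to operators, passing already-resolved operators through."""
    resolved = [
        op if isinstance(op, Operator) else resolve_operator(op) for op in operators
    ]
    # Maintainability depends on raw metrics, so make sure raw is included.
    names = {op.name for op in resolved}
    if "maintainability" in names and "raw" not in names:
        resolved.insert(0, resolve_operator("raw"))
    return resolved
```

With this shape, a mixed input such as `[Operator("maintainability"), "cyclomatic"]` resolves without error and still gets `raw` prepended. The alternative in the review comment, narrowing the annotation to `Iterable[str]`, is the smaller change if no caller passes `Operator` instances.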

Copilot · 2026-04-25T22:22:28Z

@@ -45,6 +43,7 @@ def rank(
    :param threshold: For total values beneath the threshold return a non-zero exit code.
    :param descending: Rank in descending order
    :param wrap: Wrap output
+    :param table_style: Table box style


The rank() docstring still documents a threshold parameter (and describes non-zero exit behavior), but the function signature no longer accepts threshold and the logic was removed. Please update the docstring to match the current parameters/behavior.

Copilot · 2026-04-25T22:22:28Z

+//! - h1: unique operands
+//! - h2: unique operators
+//! - N1: total operands
+//! - N2: total operators


The module-level docs list Halstead fields as h1=unique operands, h2=unique operators, N1=total operands, N2=total operators, but HalsteadMetrics implements h1 as distinct operators, h2 as distinct operands, N1 as total operators, N2 as total operands. Please correct the documentation to match the implementation (and Radon’s definitions).

Suggested change

-//! - h1: unique operands
-//! - h2: unique operators
-//! - N1: total operands
-//! - N2: total operators
+//! - h1: unique operators
+//! - h2: unique operands
+//! - N1: total operators
+//! - N2: total operands
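The operator/operand order matters because the standard Halstead difficulty formula is not symmetric in the four counts. The sketch below uses the textbook definitions (the same ones Radon documents) with the corrected field meanings. `HalsteadCounts` is a hypothetical Python class for illustration and is not the Rust `HalsteadMetrics` struct from this PR.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HalsteadCounts:
    """Hypothetical container for the four base counts."""

    h1: int  # distinct operators
    h2: int  # distinct operands
    N1: int  # total operators
    N2: int  # total operands

    @property
    def vocabulary(self) -> int:
        return self.h1 + self.h2

    @property
    def length(self) -> int:
        return self.N1 + self.N2

    @property
    def volume(self) -> float:
        # V = N * log2(h); treated as 0 when the vocabulary is empty.
        return self.length * math.log2(self.vocabulary) if self.vocabulary else 0.0

    @property
    def difficulty(self) -> float:
        # D = (h1 / 2) * (N2 / h2); treated as 0 when there are no operands.
        return (self.h1 / 2) * (self.N2 / self.h2) if self.h2 else 0.0

    @property
    def effort(self) -> float:
        # E = D * V
        return self.difficulty * self.volume
```

For example, `HalsteadCounts(h1=2, h2=2, N1=3, N2=5)` has a difficulty of 2.5, while swapping the totals (`N1=5, N2=3`) gives 1.5. A reader who follows the mislabelled doc comment would therefore compute a different difficulty and effort from the same source file.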

Copilot · 2026-04-25T22:22:28Z

+            .filter(|row| row.path == path)
+            .cloned()
+            .collect();
+
+        // Sort by revision_date ascending (newest last)
+        matching_rows.sort_by_key(|a| a.revision_date);


WilyIndex.__getitem__'s doc comment says it matches rows where path equals or starts with the given prefix and that results are sorted by revision_date descending (newest first), but the implementation filters with row.path == path only and sorts by revision_date ascending. Either update the docs to reflect current behavior or adjust the implementation to provide the documented prefix-matching and ordering.

Suggested change

-            .filter(|row| row.path == path)
-            .cloned()
-            .collect();
-
-        // Sort by revision_date ascending (newest last)
-        matching_rows.sort_by_key(|a| a.revision_date);
+            .filter(|row| row.path == path || row.path.starts_with(&path))
+            .cloned()
+            .collect();
+
+        // Sort by revision_date descending (newest first)
+        matching_rows.sort_by(|a, b| b.revision_date.cmp(&a.revision_date));
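To make the difference between the two behaviours concrete, here is a small Python sketch that contrasts them on hypothetical rows. Only the `path` and `revision_date` field names come from the diff; the row values and both function names are invented for illustration.

```python
# Hypothetical rows; integer dates keep the example short.
rows = [
    {"path": "src/foo.py", "revision_date": 200},
    {"path": "src/foo.py", "revision_date": 100},
    {"path": "src/foo/bar.py", "revision_date": 300},
]


def exact_oldest_first(rows, path):
    """Current behaviour in the diff: exact path match, ascending by date."""
    matches = [r for r in rows if r["path"] == path]
    return sorted(matches, key=lambda r: r["revision_date"])


def prefix_newest_first(rows, path):
    """Documented behaviour: prefix match, descending by date."""
    matches = [r for r in rows if r["path"].startswith(path)]
    return sorted(matches, key=lambda r: r["revision_date"], reverse=True)
```

Here `exact_oldest_first(rows, "src/foo.py")` returns the two `src/foo.py` rows oldest first, while `prefix_newest_first(rows, "src/foo")` returns all three rows newest first. Note that a plain string prefix would also match a sibling such as `src/foobar.py`, and that an exact match is already covered by the prefix test, so the `row.path == path ||` part of the suggestion is redundant.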

Copilot · 2026-04-25T22:22:29Z

+def iter_filenames(targets: list[str], include_ipynb: bool = False) -> list[str]:
+    """Iterate over Python filenames in targets."""
+    ...
+
+def get_metrics_schema() -> list[tuple[str, str]]:
+    """Get the parquet schema as a list of (name, type) tuples."""
+    ...
+
+class WilyIndex:
+    """
+    Python context manager for efficient multi-revision parquet writes.
+
+    Usage:
+        with WilyIndex(output_path, operators) as index:
+            index.analyze_revision(paths, base_path, revision_key, ...)
+            index.analyze_revision(paths, base_path, revision_key, ...)
+        # File is written on exit
+
+    Querying:
+        with WilyIndex(output_path, operators) as index:
+            # Get all rows for a specific path
+            rows = index["src/foo.py"]
+
+            # Iterate over all rows
+            for row in index:
+                print(row)
+
+            # Get total row count
+            count = len(index)
+    """
+
+    def __init__(self, output_path: str, operators: list[str]) -> None: ...
+    def __enter__(self) -> WilyIndex: ...
+    def __getitem__(self, path: str) -> list[dict[str, Any]]:


The backend type stubs don’t match the Rust-exposed API: (1) iter_filenames is missing exclude/ignore parameters and its default include_ipynb differs from the Rust signature, and (2) WilyIndex.__init__ requires operators but the Rust constructor accepts operators=None. Please update backend.pyi to match the actual PyO3 signatures so type-checking and IDE assistance reflect runtime behavior.
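As an illustration of the second point, the sketch below shows the optional `operators` signature together with the context-manager and query protocol described in the stub's docstring. `InMemoryIndex` and `add_row` are hypothetical: this is a pure-Python stand-in, not `wily.backend.WilyIndex`. Treating `None` as an empty list is an assumption, since the review comment does not say what the Rust constructor does with it.

```python
from __future__ import annotations

from collections.abc import Iterator
from typing import Any


class InMemoryIndex:
    """Hypothetical stand-in that mirrors the protocol in the stub's docstring."""

    def __init__(self, output_path: str, operators: list[str] | None = None) -> None:
        self.output_path = output_path
        # Assumption: None falls back to an empty list in this sketch.
        self.operators = list(operators) if operators is not None else []
        self._rows: list[dict[str, Any]] = []

    def __enter__(self) -> InMemoryIndex:
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # The stub says the file is written on exit; this sketch writes nothing.
        return False

    def add_row(self, row: dict[str, Any]) -> None:
        """Hypothetical helper standing in for analyze_revision()."""
        self._rows.append(row)

    def __getitem__(self, path: str) -> list[dict[str, Any]]:
        return [row for row in self._rows if row["path"] == path]

    def __iter__(self) -> Iterator[dict[str, Any]]:
        return iter(self._rows)

    def __len__(self) -> int:
        return len(self._rows)
```

With the `| None = None` default in the annotation, a call such as `InMemoryIndex("metrics.parquet")` type-checks, which is the behaviour the review asks `backend.pyi` to reflect for the real class.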

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

tonybaloney · 2026-04-25T22:34:02Z

tonybaloney added 30 commits November 25, 2025 15:33

start rust conversion

aa9140e

Update CI

11617ab

simplify CI and reformat

3c38d8c

ignore artifacts

b53a933

further simplify

4127686

rust function returns tuple collection

211bc74

use rich progress bars

16666b0

fix variable name reuse

4a32e3b

Simplify rust module

e52db29

working prototype before crate denormalization

a671404

Use Ruff's API

79d2536

remove another radon dependency

3ed9616

cyclomatic harvestor

7a97eb7

remove use of builtin exit

9d742b5

halstead metrics

f7828e2

use stdlib mode function

942eae8

Implement Halstead harvesters

a0ec55a

update assertions

b6b0f1a

MI harvester

897dcb4

Create a new file iterator and remove radon

e964d1e

update lockfile

4a3d257

Format tests

de7e5a1

happier syntax with 3.10

9e92d1f

tidy up

c05834f

cleanup deps

cfc429d

better naming

6106e06

Use rust backend for processing

88afc23

ruff fixes

275a9d4

Diff uses new parallel function

9d30464

remove unused import

4022507

tonybaloney and others added 23 commits January 1, 2026 11:28

update benchmarks

ed3fa59

Use compact strings for operand hashmaps

4b363b9

Patch out diff for now

f94f40d

Use some constr

2c3c047

fix primitive sequence

f4e2f6c

cleanup old test

9c0b27f

only add rank when needed in diff

256782f

fix diff bug

91c3ba1

fully implement --no-detail flag

e34ac8e

allow diff against specific revision

3e82048

don't allow diff for non-indexed revisions

068e941

refactor diff

e59dd3a

Add cognitive complexity metric and update related components

8cc5573

Refactor CI workflow for release process and update versioning to 2.0…

6e519fa

….0-alpha.1; enhance cognitive complexity metrics and improve documentation in HISTORY.md

Clean up code by removing unnecessary blank lines and optimizing list…

5c00cbb

… comprehensions for better readability

Add before-script to install dependencies for Linux builds in CI

4e58a8e

Clean up crappy tests

c31a9a8

Update index tests to assert minimum occurrences of "An author" and c…

0c01fac

…lean up report test

Enhance before-script for Linux builds to support multiple package ma…

681af64

…nagers

Update CI before-script for Linux and add OpenSSL dependency in Cargo…

c452e16

… files

Drop macos x64 since it doesnt' exist anymore

6247bee

Update to new sort command

ff609cb

tonybaloney requested a review from Copilot April 25, 2026 22:17

Copilot started reviewing on behalf of tonybaloney April 25, 2026 22:17

Copilot AI reviewed Apr 25, 2026


Apply suggestions from code review

89efaad

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

tonybaloney merged commit b2f1c06 into master Apr 25, 2026
49 checks passed

tonybaloney deleted the v2 branch April 25, 2026 22:34

tonybaloney merged 141 commits into master from v2

4 participants
