[WIP] Rewrite backends in Rust using Ruff's parser use parquet for storage and faster indexing #238

Merged
tonybaloney merged 141 commits into master from v2 on Apr 25, 2026

@tonybaloney (Owner) commented Nov 25, 2025

Replaces radon with a Rust-based harvester backend. The harvesters use Ruff's AST, lexer and parser for better performance.

Removed a lot of old tooling and moved to more modern Python linters and formatters.

Added backward compatibility checks with radon to ensure metrics don't change between v1 and v2.

tonybaloney and others added 23 commits January 1, 2026 11:28
….0-alpha.1; enhance cognitive complexity metrics and improve documentation in HISTORY.md
…erl modules

The manylinux2014 container was missing openssl-devel and perl-Time-Piece,
causing openssl-sys to fail when building OpenSSL from source.

🐨 Generated with Crush

Assisted-by: Claude Opus 4.6 via Crush <crush@charm.land>

Copilot AI (Contributor) left a comment

Pull request overview

This PR is a major WIP v2 rewrite that replaces the legacy Radon-based Python analysis pipeline and JSON cache with a Rust (PyO3) backend using Ruff’s parser and Parquet storage, and updates the CLI, operators, and tests accordingly.

Changes:

  • Introduces a Rust backend (backend/) to compute metrics (raw, cyclomatic, Halstead, maintainability, cognitive) and store/query results in metrics.parquet.
  • Refactors Python commands (build, rank, graph, index, list-metrics, etc.) to read/write via wily.backend.WilyIndex and Rich-based table output.
  • Reworks unit/integration tests to align with Parquet storage and “index only changed files per revision” behavior; removes many legacy unit tests tied to Radon/JSON/tabulate.
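
One consequence of the "index only changed files per revision" behavior is that a full picture of the project at a given revision has to be rebuilt from earlier rows. The following is a minimal sketch of that idea in plain Python, not wily's implementation. The `path` and `revision_date` keys mirror the row fields quoted later in this review; the `loc` column, the `snapshot_at` helper, and the integer dates are hypothetical, and deleted files are not handled.

```python
from typing import Any


def snapshot_at(rows: list[dict[str, Any]], revision_date: int) -> dict[str, dict[str, Any]]:
    """Return the most recent row per path at or before `revision_date`.

    Hypothetical helper: assumes each row was written only for the
    revision in which its file changed.
    """
    latest: dict[str, dict[str, Any]] = {}
    for row in sorted(rows, key=lambda r: r["revision_date"]):
        if row["revision_date"] <= revision_date:
            # Later rows overwrite earlier ones for the same path.
            latest[row["path"]] = row
    return latest


rows = [
    {"path": "a.py", "revision_date": 1, "loc": 10},
    {"path": "b.py", "revision_date": 1, "loc": 5},
    {"path": "a.py", "revision_date": 2, "loc": 12},  # only a.py changed here
]
state = snapshot_at(rows, 2)
```

Here `state` holds the revision-2 row for `a.py` and carries forward the revision-1 row for `b.py`, which is the kind of lookup the removed "complete history per revision" tests no longer get for free.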

Summary per file

| File | Description |
| --- | --- |
| `test/unit/util.py` | Removed legacy unit-test utilities for mocked v1 State/index. |
| `test/unit/test_rank_unit.py` | Removed v1 rank command unit tests (tabulate/State-based). |
| `test/unit/test_operators.py` | Adds coverage for new cognitive operator/metric resolution. |
| `test/unit/test_list_metrics_unit.py` | Removed v1 list-metrics unit tests (tabulate output assertions). |
| `test/unit/test_index_unit.py` | Removed v1 index command unit tests (tabulate/State-based). |
| `test/unit/test_helper.py` | Removed v1 helper tests for tabulate wrapping/style selection. |
| `test/unit/test_graph_unit.py` | Removed v1 graph unit tests that assumed per-revision complete history. |
| `test/unit/test_cyclomatic.py` | Removed Radon-harvester “bad data” regression tests. |
| `test/unit/test_cache.py` | Simplifies cache tests to new cache model (no JSON index/versioning/store). |
| `test/unit/test_build_unit.py` | Removed v1 build unit tests (multiprocessing/operator classes). |
| `test/unit/test_archivers.py` | Updates archiver test expectations after git archiver changes. |
| `test/integration/test_state.py` | Removed v1 integration tests around State/index.json cache. |
| `test/integration/test_report.py` | Updates report tests for new “log instead of stdout” behaviors and path normalization. |
| `test/integration/test_rank.py` | Updates rank CLI tests; removes threshold tests; adjusts invocation formatting. |
| `test/integration/test_list_metrics.py` | Makes list-metrics assertions more flexible and adds cognitive operator expectation. |
| `test/integration/test_ipynb.py` | Normalizes notebook path handling and loosens commit-count assertions. |
| `test/integration/test_index.py` | Adjusts index CLI assertions for new output/content behavior. |
| `test/integration/test_graph.py` | Normalizes graph path handling and invocation formatting. |
| `test/integration/test_complex_commits.py` | Updates to Parquet + “only changed files per revision” semantics; uses WilyIndex to validate. |
| `test/integration/test_build.py` | Updates build tests to assert Parquet output via WilyIndex and adds directory build coverage. |
| `test/integration/test_archiver.py` | Adds a comprehensive git revisions field test; updates end-to-end expectations. |
| `test/integration/test_all_operators.py` | Adds cognitive operator coverage and updates operator combinations. |
| `test/conftest.py` | Updates fixtures to build cache via new build flow; removes separate index invocation in fixture. |
| `src/wily/state.py` | Removes v1 State/Index/IndexedRevision cache model. |
| `src/wily/operators/raw.py` | Removes v1 Radon-based raw operator implementation. |
| `src/wily/operators/maintainability.py` | Removes v1 Radon-based maintainability operator implementation. |
| `src/wily/operators/halstead.py` | Removes v1 Radon-based Halstead operator implementation. |
| `src/wily/operators/cyclomatic.py` | Removes v1 Radon-based cyclomatic operator implementation. |
| `src/wily/operators/__init__.py` | Removes v1 operator registry + Metric aggregation definitions. |
| `src/wily/operators.py` | Introduces new operator/metric registry model and resolution helpers (incl. cognitive). |
| `src/wily/helper/custom_enums.py` | Minor formatting change. |
| `src/wily/helper/__init__.py` | Replaces tabulate helpers with Rich table rendering and adds box style support. |
| `src/wily/defaults.py` | Switches default table styling constant from tabulate to Rich box style. |
| `src/wily/config/types.py` | Modernizes typing (py310+ unions/collections.abc) and config fields. |
| `src/wily/config/__init__.py` | Adds cognitive to default operators and reformats config parsing. |
| `src/wily/commands/rank.py` | Refactors rank to read from Parquet via WilyIndex; removes total/threshold logic; uses Rich tables. |
| `src/wily/commands/list_metrics.py` | Refactors list-metrics to use OPERATOR_METRICS + Rich tables. |
| `src/wily/commands/index.py` | Refactors index to scan Parquet and print revision history via Rich tables. |
| `src/wily/commands/graph.py` | Refactors graph to read Parquet via WilyIndex and build traces without State/JSON cache. |
| `src/wily/commands/build.py` | Refactors build to use Rich progress and WilyIndex.analyze_revision (Parquet). |
| `src/wily/cache.py` | Simplifies cache handling; removes JSON index/versioning and JSON per-revision storage API. |
| `src/wily/backend.pyi` | Adds stubs for Rust extension module APIs (WilyIndex, git helpers, file iteration). |
| `src/wily/archivers/git.py` | Switches git archiver logic to Rust backend for revisions/checkout/find. |
| `src/wily/archivers/filesystem.py` | Updates filesystem archiver to RevisionInfo and modern typing. |
| `src/wily/archivers/__init__.py` | Introduces RevisionInfo TypedDict and updates BaseArchiver signatures/types. |
| `src/wily/__init__.py` | Switches logging to RichHandler and bumps version to 2.0.0a1. |
| `README.md` | Updates operator list documentation to include cognitive/halstead. |
| `pyproject.toml` | Switches build backend to maturin; updates dependencies, Python version floor, and tooling config. |
| `Makefile` | Updates build/install/lint targets for maturin + ruff. |
| `HISTORY.md` | Adds 2.0.0a1 (Unreleased) notes describing Rust backend + Parquet + cognitive complexity. |
| `docs/source/commands/build.rst` | Updates build docs to reflect new default operators (incl. cognitive/halstead). |
| `backend/src/raw.rs` | Adds Rust implementation of raw metrics using Ruff tokenization. |
| `backend/src/lib.rs` | Adds PyO3 module registration. |
| `backend/src/halstead.rs` | Adds Rust Halstead metric computation (Ruff AST). |
| `backend/src/files.rs` | Adds Rust iter_filenames implementation (WalkDir + glob). |
| `backend/src/cyclomatic.rs` | Adds Rust cyclomatic complexity computation (Ruff AST). |
| `backend/Cargo.toml` | Introduces Rust crate dependencies (PyO3, Ruff crates, arrow/parquet, git2, rayon). |
| `backend/benches/analyze_revision.rs` | Adds Criterion benchmarks for backend analysis performance. |
| `AGENTS.md` | Adds contributor/agent documentation describing v2 architecture, commands, conventions. |
| `.gitignore` | Updates ignores for Rust artifacts and test output directories. |
| `.github/workflows/ci.yml` | Updates CI to uv + Rust toolchain; adds clippy/rustfmt/wheel builds and trusted publishing. |

Copilot's findings

  • Files reviewed: 70/73 changed files
  • Comments generated: 6

Comment thread src/wily/operators.py
Comment on lines +199 to +212
def resolve_operators(operators: Iterable[Operator | str]) -> list[Operator]:
    """
    Resolve a list of operator names to their corresponding types.

    Automatically includes 'raw' if 'maintainability' is requested, since
    the maintainability index calculation depends on raw metrics.
    """
    resolved = [resolve_operator(operator) for operator in iter(operators)]
    # Maintainability depends on raw metrics (LOC, SLOC, comments)
    has_maintainability = any(op.name == "maintainability" for op in resolved)
    has_raw = any(op.name == "raw" for op in resolved)
    if has_maintainability and not has_raw:
        resolved.insert(0, resolve_operator("raw"))
    return resolved

Copilot AI Apr 25, 2026

resolve_operators() is annotated to accept Iterable[Operator | str], but it unconditionally calls resolve_operator(operator) which expects a string and will raise at runtime if an Operator instance is passed. Either restrict the parameter type to Iterable[str] or handle Operator inputs (e.g., pass through when already resolved) before calling resolve_operator().
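
The second option (passing already-resolved operators through) could look roughly like the sketch below. It is self-contained for illustration only: the `Operator` dataclass, the `_REGISTRY` dictionary, and this `resolve_operator` are hypothetical stand-ins, since the real definitions in `src/wily/operators.py` are not shown in this thread.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """Hypothetical stand-in for wily's Operator; only `name` is assumed."""

    name: str


# Hypothetical registry keyed by operator name.
_REGISTRY = {
    name: Operator(name)
    for name in ("raw", "cyclomatic", "halstead", "maintainability", "cognitive")
}


def resolve_operator(name: str) -> Operator:
    """Look up an operator by name (assumed to accept strings only)."""
    return _REGISTRY[name.lower()]


def resolve_operators(operators: Iterable[Operator | str]) -> list[Operator]:
    """Resolve names, passing Operator instances through unchanged."""
    resolved = [
        op if isinstance(op, Operator) else resolve_operator(op)
        for op in operators
    ]
    names = {op.name for op in resolved}
    # Maintainability depends on raw metrics, so make sure raw runs first.
    if "maintainability" in names and "raw" not in names:
        resolved.insert(0, resolve_operator("raw"))
    return resolved
```

With this shape, `resolve_operators([Operator("maintainability"), "cyclomatic"])` accepts the mixed input that the annotation promises and still prepends `raw`.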

Comment thread src/wily/commands/rank.py
Comment on lines 35 to +46
@@ -45,6 +43,7 @@ def rank(
:param threshold: For total values beneath the threshold return a non-zero exit code.
:param descending: Rank in descending order
:param wrap: Wrap output
:param table_style: Table box style

Copilot AI Apr 25, 2026

The rank() docstring still documents a threshold parameter (and describes non-zero exit behavior), but the function signature no longer accepts threshold and the logic was removed. Please update the docstring to match the current parameters/behavior.

Comment thread backend/src/halstead.rs
Comment on lines +4 to +7
//! - h1: unique operands
//! - h2: unique operators
//! - N1: total operands
//! - N2: total operators

Copilot AI Apr 25, 2026

The module-level docs list Halstead fields as h1=unique operands, h2=unique operators, N1=total operands, N2=total operators, but HalsteadMetrics implements h1 as distinct operators, h2 as distinct operands, N1 as total operators, N2 as total operands. Please correct the documentation to match the implementation (and Radon’s definitions).

Suggested change
-//! - h1: unique operands
-//! - h2: unique operators
-//! - N1: total operands
-//! - N2: total operators
+//! - h1: unique operators
+//! - h2: unique operands
+//! - N1: total operators
+//! - N2: total operands
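
To make the corrected naming concrete, here is a small Python sketch of the four base counts and the standard Halstead quantities derived from them (vocabulary h = h1 + h2, length N = N1 + N2, volume V = N · log2 h). The `HalsteadCounts` class and `count` helper are hypothetical, and the operator and operand lists are supplied by hand; how the Rust backend or Radon classifies tokens is not shown in this thread.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HalsteadCounts:
    h1: int  # distinct operators
    h2: int  # distinct operands
    N1: int  # total operators
    N2: int  # total operands

    @property
    def vocabulary(self) -> int:
        return self.h1 + self.h2

    @property
    def length(self) -> int:
        return self.N1 + self.N2

    @property
    def volume(self) -> float:
        # V = N * log2(h); treated as 0 when the vocabulary is empty.
        return self.length * math.log2(self.vocabulary) if self.vocabulary else 0.0


def count(operators: list[str], operands: list[str]) -> HalsteadCounts:
    """Build the base counts from hand-classified token lists."""
    return HalsteadCounts(
        h1=len(set(operators)),
        h2=len(set(operands)),
        N1=len(operators),
        N2=len(operands),
    )


# `a + b + a`, classified by hand: two uses of one operator, three operands.
counts = count(operators=["+", "+"], operands=["a", "b", "a"])
```

For this input the counts come out as h1 = 1, h2 = 2, N1 = 2, N2 = 3, which makes it easy to see that swapping the operator and operand labels would change every derived metric that treats the two differently.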

Comment thread backend/src/storage.rs
Comment on lines +1109 to +1114
.filter(|row| row.path == path)
.cloned()
.collect();

// Sort by revision_date ascending (newest last)
matching_rows.sort_by_key(|a| a.revision_date);

Copilot AI Apr 25, 2026

The doc comment on `WilyIndex.__getitem__` says it matches rows where path equals or starts with the given prefix and that results are sorted by revision_date descending (newest first), but the implementation filters with row.path == path only and sorts by revision_date ascending. Either update the docs to reflect current behavior or adjust the implementation to provide the documented prefix-matching and ordering.

Suggested change
-.filter(|row| row.path == path)
-.cloned()
-.collect();
-// Sort by revision_date ascending (newest last)
-matching_rows.sort_by_key(|a| a.revision_date);
+.filter(|row| row.path == path || row.path.starts_with(&path))
+.cloned()
+.collect();
+// Sort by revision_date descending (newest first)
+matching_rows.sort_by(|a, b| b.revision_date.cmp(&a.revision_date));
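
The difference between the two behaviors is easy to see in a plain-Python model of the lookup. This is only an illustration of the semantics under discussion, not the Rust code: `rows_for` and its keyword arguments are hypothetical, rows are modelled as dictionaries, and `revision_date` is a bare integer.

```python
from typing import Any


def rows_for(
    rows: list[dict[str, Any]],
    path: str,
    *,
    prefix: bool = False,
    newest_first: bool = False,
) -> list[dict[str, Any]]:
    """Select rows by exact path or by string prefix, ordered by date."""
    if prefix:
        # A prefix match already includes the exact-match case.
        matching = [r for r in rows if r["path"].startswith(path)]
    else:
        matching = [r for r in rows if r["path"] == path]
    return sorted(matching, key=lambda r: r["revision_date"], reverse=newest_first)


rows = [
    {"path": "src/foo.py", "revision_date": 1},
    {"path": "src/foo.py", "revision_date": 3},
    {"path": "src/foo/bar.py", "revision_date": 2},
]
exact_oldest_first = rows_for(rows, "src/foo.py")
prefix_newest_first = rows_for(rows, "src/foo", prefix=True, newest_first=True)
```

Two details are worth weighing before adopting the prefix variant: a raw string prefix such as `src/foo` also matches `src/foobar.py`, and callers that rely on oldest-first ordering would see their results reversed.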

Comment thread src/wily/backend.pyi
Comment on lines +7 to +40
def iter_filenames(targets: list[str], include_ipynb: bool = False) -> list[str]:
    """Iterate over Python filenames in targets."""
    ...

def get_metrics_schema() -> list[tuple[str, str]]:
    """Get the parquet schema as a list of (name, type) tuples."""
    ...

class WilyIndex:
    """
    Python context manager for efficient multi-revision parquet writes.

    Usage:
        with WilyIndex(output_path, operators) as index:
            index.analyze_revision(paths, base_path, revision_key, ...)
            index.analyze_revision(paths, base_path, revision_key, ...)
        # File is written on exit

    Querying:
        with WilyIndex(output_path, operators) as index:
            # Get all rows for a specific path
            rows = index["src/foo.py"]

            # Iterate over all rows
            for row in index:
                print(row)

            # Get total row count
            count = len(index)
    """

    def __init__(self, output_path: str, operators: list[str]) -> None: ...
    def __enter__(self) -> WilyIndex: ...
    def __getitem__(self, path: str) -> list[dict[str, Any]]:

Copilot AI Apr 25, 2026

The backend type stubs don’t match the Rust-exposed API: (1) iter_filenames is missing exclude/ignore parameters and its default include_ipynb differs from the Rust signature, and (2) `WilyIndex.__init__` requires operators but the Rust constructor accepts operators=None. Please update backend.pyi to match the actual PyO3 signatures so type-checking and IDE assistance reflect runtime behavior.
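
Drift of this kind can be caught mechanically by comparing parameter names and defaults. The sketch below uses the standard library's `inspect.signature` on two ordinary Python functions; `signature_mismatches` and both example functions are hypothetical stand-ins, and the `exclude` parameter name and the `True` default are invented for illustration. Whether `inspect.signature` can introspect the compiled extension functions directly depends on the extension exposing a text signature, which is an assumption here.

```python
import inspect
from typing import Any, Callable


def signature_mismatches(stub: Callable[..., Any], runtime: Callable[..., Any]) -> list[str]:
    """List parameter-name and default-value differences between two callables."""
    stub_params = inspect.signature(stub).parameters
    runtime_params = inspect.signature(runtime).parameters
    stub_names, runtime_names = set(stub_params), set(runtime_params)

    problems = [f"missing from stub: {name}" for name in runtime_names - stub_names]
    problems += [f"not in runtime: {name}" for name in stub_names - runtime_names]
    for name in stub_names & runtime_names:
        if stub_params[name].default != runtime_params[name].default:
            problems.append(f"default differs: {name}")
    return sorted(problems)


# Hypothetical stand-ins for the stub and the runtime function.
def stub_iter_filenames(targets, include_ipynb=False): ...
def runtime_iter_filenames(targets, include_ipynb=True, exclude=None): ...


report = signature_mismatches(stub_iter_filenames, runtime_iter_filenames)
```

Run as a unit test against the real module, a check like this would fail as soon as the `.pyi` file and the compiled signatures disagree.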

Comment thread backend/src/raw.rs Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

@tonybaloney (Owner, Author)

:shipit:

@tonybaloney tonybaloney merged commit b2f1c06 into master Apr 25, 2026
49 checks passed
@tonybaloney tonybaloney deleted the v2 branch April 25, 2026 22:34
4 participants