
[analyze] too many roots causing some confusion in large repos #24182

@markjm


Summary

As you can probably tell from this issue and #24058, I am finding great value in this tool. I think it is very close to being an effective way to answer "what does my change impact?" for a given set of changes.

Problem

Currently, ruff scans the project for all directories containing an __init__.py file (packages) and adds them to src_roots, which are then searched when constructing the analyze graph.

In our codebase (14M LoC, 100k files, 12k __init__.py files 😢), this produced both unexpected edges in the graph and unexpected performance issues.

Example:

services/api/start.py

import graphql

... rest of file

This creates an edge to internal/linter/for/graphql.py, even though there is no way for the two files to "see" each other at runtime. Note that files like these could see each other if they were, for example, using uv workspaces.

Proposed Solution

I have put up a PR (#24183) with my proposal. In short:

  1. src_roots collection: Instead of adding all package roots, collect src paths from discovered configs using resolver.settings(). This uses ruff's hierarchical config discovery - each file's settings come from its closest pyproject.toml, and only explicitly configured src paths are used for resolution.

  2. module_path computation: Fix the path calculation in lib.rs to use package.parent() as the src_root when computing the module path. This ensures relative imports resolve correctly (the package directory must be included in the module path, not used as the strip prefix).
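The second point can be illustrated with a small sketch. This is Python rather than the Rust in lib.rs, and `module_path`, `resolve_relative`, and the `libs/pkg/...` paths are all hypothetical names chosen for illustration. It only shows why stripping the package directory itself loses the package name that relative imports need.

```python
from pathlib import PurePosixPath


def module_path(file: str, src_root: str) -> str:
    """Dotted module path of `file` relative to `src_root` (hypothetical helper)."""
    rel = PurePosixPath(file).relative_to(src_root).with_suffix("")
    # `pkg/__init__.py` is the module `pkg`, so drop the `__init__` component.
    return ".".join(p for p in rel.parts if p != "__init__")


def resolve_relative(current_module: str, level: int, name: str) -> str:
    """Resolve `from <level dots><name> import ...` for a regular (non-__init__) module."""
    base = current_module.split(".")[:-level]
    return ".".join(base + [name])


file = "libs/pkg/sub/mod.py"
package = PurePosixPath("libs/pkg")  # assumed outermost directory with an __init__.py

# Stripping the package directory drops `pkg` from the module path.
stripped = module_path(file, str(package))
# Using the package's parent as the root keeps it.
kept = module_path(file, str(package.parent))

print(stripped, "->", resolve_relative(stripped, 2, "util"))
print(kept, "->", resolve_relative(kept, 2, "util"))
```

In this model, `from ..util import x` inside `libs/pkg/sub/mod.py` resolves to `pkg.util` only when the module path was computed from the package's parent directory; with the package itself as the strip prefix it collapses to a bare `util`.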

With these 2 changes applied to our jumbo repo (not necessarily a representative sample of project types, but still a useful data point):

  1. All the "bad edges" described above are gone from our graph
  2. As an added benefit, removing the thousands of extra src_roots drastically improves resolution time in large projects:
     - Before: ~27 seconds
     - After: ~3.3 seconds (~8x faster)

Version

No response
