How to properly use shared mypy cache with daemon mode  #16784

Open
@MakonnenMak


I intend to implement a modified version of the suggestion here to streamline mypy type checking: run the mypy daemon (dmypy) to perform type checks without requiring users to handle complex daemon restarts or cache management. The primary goal is to cut dmypy startup time, which currently takes 10 minutes on CI and 3 minutes locally, both longer than we would like.

I tried using a zipped version of the cache on CI, and it brought the time down to 7 minutes, which is better.
Not the biggest difference, but I feel it adds up when you have many engineers working on a codebase over a week.

Workflow I'm considering (applicable to both CI and the developing engineer's local branch):

  1. Check for Zipped Cache Directory:

    • If a zipped cache directory exists, unzip it to create the mypy cache directory.
    • If it doesn't exist, create the cache directory using mypy --cache-fine-grained (basically a cold start).
  2. Run dmypy with Fine-Grained Cache:

    • Execute dmypy --cache-fine-grained to initiate the mypy server with fine-grained caching.
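
The two steps above can be sketched roughly as follows. This is a minimal sketch, not a tested setup: the archive name `mypy_cache.zip` and the helper names are hypothetical, `.mypy_cache` is mypy's default cache directory, and I'm assuming the daemon is started with `dmypy start -- --use-fine-grained-cache`, which is the form the mypy docs describe for loading a cache produced by `mypy --cache-fine-grained`.

```python
from __future__ import annotations

import zipfile
from pathlib import Path

CACHE_DIR = Path(".mypy_cache")     # mypy's default cache directory
CACHE_ZIP = Path("mypy_cache.zip")  # hypothetical archive name


def restore_cache(archive: Path = CACHE_ZIP, cache_dir: Path = CACHE_DIR) -> bool:
    """Unzip the archive into cache_dir; return True if a cache was restored."""
    if not archive.is_file():
        return False
    cache_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(cache_dir)
    return True


def plan_commands(restored: bool, targets: list[str]) -> list[list[str]]:
    """Return the commands to run, in order, for steps 1 and 2."""
    commands: list[list[str]] = []
    if not restored:
        # Cold start: build a fine-grained cache from scratch.
        commands.append(["mypy", "--cache-fine-grained", *targets])
    # Start the daemon so that it loads the fine-grained cache (assumed form).
    commands.append(["dmypy", "start", "--", "--use-fine-grained-cache"])
    # Ask the running daemon to type check the targets.
    commands.append(["dmypy", "check", *targets])
    return commands
```

Each planned command could then be passed to `subprocess.run(cmd, check=False)`; `check=False` matters because mypy exits with a non-zero status whenever it reports type errors, which should not abort the rest of the workflow.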

Additional Step (Only on the Local Branch):

  1. Pushing to Remote Branch:
    • When a user pushes changes to the remote version of the branch, zip the cached directory and push it.
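
For the zipping step, a minimal sketch could look like this. The function name and the temporary-file naming are hypothetical; writing to a temporary file and renaming it is just one way to avoid leaving a half-written archive behind, and it does not by itself resolve the concurrency concerns below.

```python
import zipfile
from pathlib import Path


def zip_cache(cache_dir: Path, archive: Path) -> int:
    """Write every file under cache_dir into archive; return the file count."""
    count = 0
    # Build the archive under a temporary name first (hypothetical convention).
    tmp = archive.with_suffix(archive.suffix + ".tmp")
    with zipfile.ZipFile(tmp, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(cache_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the cache root so the archive
                # can be extracted straight into a fresh cache directory.
                zf.write(path, path.relative_to(cache_dir).as_posix())
                count += 1
    # Rename into place once the archive is complete.
    tmp.replace(archive)
    return count
```

The archive should live outside the cache directory (for example `zip_cache(Path(".mypy_cache"), Path("mypy_cache.zip"))`), otherwise the temporary file would be picked up while the directory is being walked.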

Things I'm worried about:

  • Cache Conflicts: Given the shared nature of the mypy cache, I'm concerned about possible conflicts. If multiple developers merge their branches into the main branch concurrently, changes from different branches could conflict within the mypy cache. I'm uncertain about how the mypy cache handles such conflicts.

  • Race Conditions: I'm worried about potential race conditions during cache updates. If developers are pushing changes simultaneously, there might be situations where the mypy cache is being updated by multiple processes concurrently. Understanding how the cache handles such race conditions is crucial to ensure consistency.

  • Main Branch Activity: Considering the dynamic nature of a main branch with continuous merges throughout the day, I'm unsure about how well the mypy cache will cope. The frequent changes to the codebase might pose challenges in maintaining an accurate and up-to-date cache for type checking.

  • Stacked PRs: Stacked pull requests could introduce complexities, and I'm uncertain about the impact on the mypy cache. The overlapping changes in stacked PRs might lead to unexpected behavior or difficulties in managing the cache effectively.

These considerations arise from a lack of confidence in understanding the internal mechanisms of the mypy cache.
I'm seeking further clarification on how the mypy cache handles concurrent updates, conflicts, and rapid changes in a collaborative development environment.

Should I even be using a shared cache in CI, or should I run a fresh mypy cycle each time?

A bunch of these concerns stem from this comment in the docs:

If you use the mypy daemon, you may want to restart the daemon each time after the merge base or local branch has changed to avoid processing a potentially large number of changes in an incremental build, as this can be much slower than downloading cache data and restarting the daemon.
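
One way to read that advice is as a check on the merge base before each run. The sketch below is only my interpretation: the state file, the `origin/main` ref, and the helper names are hypothetical, it covers only the merge-base case (not a local branch switch), and it assumes a daemon is already running whenever the merge base is unchanged.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def current_merge_base(main_ref: str = "origin/main") -> str:
    """Ask git for the merge base of HEAD and main_ref (assumed ref name)."""
    result = subprocess.run(
        ["git", "merge-base", "HEAD", main_ref],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def read_recorded(state_file: Path) -> Optional[str]:
    """Return the merge base recorded on the previous run, if any."""
    if state_file.is_file():
        return state_file.read_text().strip() or None
    return None


def daemon_commands(
    recorded: Optional[str], current: str, targets: List[str]
) -> List[List[str]]:
    """Restart the daemon only when the merge base moved, then check."""
    commands: List[List[str]] = []
    if recorded != current:
        # Merge base changed (or was never recorded): restart the daemon so
        # it reloads the cache instead of processing a large incremental diff.
        commands.append(["dmypy", "restart", "--", "--use-fine-grained-cache"])
    commands.append(["dmypy", "check", *targets])
    return commands
```

After a successful run, the current merge base would be written back to the state file so the next run can compare against it.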

I think as a starter I'd really just love some documentation on the basics of the cache: what to expect, and what to do and not do.

I'm unsure how mypy will handle things if the cache is stale, for example.
