| 1 | +# Using MyPy with rules_python |
| 2 | + |
| 3 | +This is a walkthrough of using [rules_mypy](https://github.com/theoremlp/rules_mypy) together with `rules_python` to apply typechecks as part of "building" a Python application. |
| 4 | + |
| 5 | +## How MyPy will work |
| 6 | + |
| 7 | +Bazel's [aspects](https://bazel.build/extending/aspects) allow extensions to traverse the build graph during the analysis phase and attach extra actions and providers to existing targets before anything is executed. |
| 8 | +A common application of aspects is to "bolt on" behavior to existing rules without having to modify them. |
| 9 | +Which is exactly what we're going to do here. |
| 10 | + |
| 11 | +One way that an aspect can extend an existing rule is by adding an [`OutputGroupInfo`](https://bazel.build/versions/7.4.0/rules/lib/providers/OutputGroupInfo) provider to the rule. |
| 12 | +Output groups are a slightly unusual feature which lets a single target name stand for several named sets of outputs, so that one rule can provide multiple kinds of outputs. |
| 13 | +To take a slightly familiar example, the `py_binary` rule normally outputs a launcher script and a `.runfiles` tree, but it also provides a zipapp output which can be selectively enabled. |
| 14 | +Outputs may be selected during a build using the [`--output_groups`](https://bazel.build/reference/command-line-reference#flag--output_groups) flag, or by specifying the `output_group` attribute on a `filegroup` rule consuming targets. |
| 15 | + |
| 16 | +Notionally, this will all work as follows: |
| 17 | + |
| 18 | +- We need to create an aspect configured to use whatever `mypy` tool we may want. |
| 19 | +- That aspect will extend the `py_*` rules in the build graph to add an output group capturing the MyPy typecheck cache. |
| 20 | + These typecheck cache outputs will depend on the typecheck cache outputs of all dependencies. |
| 21 | + |
| 22 | +This all has the effect of creating a build sub-graph parallel to our normal build graph which, instead of producing and consuming Python files as dependencies, produces and consumes the MyPy analysis caches. |
| 23 | + |
| 24 | +To take a simple example, let's say that we have a small build graph: |
| 25 | + |
| 26 | +```mermaid |
| 27 | +graph TD |
| 28 | + A[data_models] --> B[data_persistence]; |
| 29 | + A --> C[inventory_management]; |
| 30 | + A --> D[order_processing]; |
| 31 | + B --> C; |
| 32 | + B --> D; |
| 33 | + C --> E[cli]; |
| 34 | + D --> E; |
| 35 | + F["click (3rdparty)"] --> E; |
| 36 | + G["pydantic (3rdparty)"] --> A; |
| 37 | +``` |
| 38 | + |
| 39 | +Ordinarily the dependencies between these rules take the form of the Python source files underlying the rules. |
| 40 | +But when we activate our `mypy` aspect and select the `mypy` output group, the cache files from typechecking each of these targets also become part of that dependency chain (with a caveat for 3rdparty code, covered at the end). |
| 41 | +This allows Bazel to drive typechecking these libraries in depgraph order while caching intermediate results. |
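
As a rough illustration of what "depgraph order" means here, the sketch below feeds the first-party part of the example graph to Python's standard `graphlib` module. It is only a model: the `DEPS` dictionary is transcribed by hand from the diagram above, and `typecheck_order` is a hypothetical helper. Bazel's real scheduling is parallel and considerably more involved.

```python
from graphlib import TopologicalSorter

# First-party edges from the example graph, written as
# target -> the targets it depends on.
# (Hand-written model for illustration, not Bazel output.)
DEPS = {
    "data_models": set(),
    "data_persistence": {"data_models"},
    "inventory_management": {"data_models", "data_persistence"},
    "order_processing": {"data_models", "data_persistence"},
    "cli": {"inventory_management", "order_processing"},
}


def typecheck_order(deps):
    """Return one valid order in which the targets could be typechecked.

    Every target appears after all of its dependencies, which mirrors the
    requirement that a target's cache inputs exist before it is checked.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for target in typecheck_order(DEPS):
        print(target)
```

Any order this produces starts with `data_models` and ends with `cli`; the relative order of the two middle libraries is not constrained by the graph.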
| 42 | + |
| 43 | +We can demonstrate this by looking at the results of `bazel aquery`, which will show the MyPy invocations and that the cache tree produced by each invocation is an input to the invocations which depend on it. |
| 44 | +But more on that in a minute. |
| 45 | + |
| 46 | +## Setup |
| 47 | + |
| 48 | +In this example we've set up `rules_python` in combination with `rules_uv`, which provides lockfile compilation. |
| 49 | +These two give us a Python dependency solution (including the MyPy we want to use), which we'll feed into `rules_mypy`. |
| 50 | + |
| 51 | +The main trick is in `//tools/mypy:BUILD.bazel`, where we provide a definition of the MyPy CLI binary which we can feed into the checking aspect. |
| 52 | +This is important because it allows us to use our locked requirement for MyPy, and to provide MyPy plugins. |
| 53 | +If we didn't do this, `rules_mypy` would "helpfully" provide an embedded default version and configuration of MyPy which may or may not be what we want. |
| 54 | + |
| 55 | +We've configured our `.bazelrc` to apply the aspect so that users don't have to think about separately enabling it. |
| 56 | +Since there's other Python code in this monorepo which doesn't typecheck, and we don't want to have to fix all of it just to adopt typing, we're going to use the `opt_in_tags` parameter on the aspect configuration. |
| 57 | +This allows us to specify `tags=["mypy"]` on relevant Python targets to selectively apply typechecking, rather than getting MyPy checks applied to everything. |
| 58 | +We could also use the `opt_out_tags` parameter on the aspect and annotate the targets we don't want to typecheck, but that is more disruptive for initial adoption, since every target which doesn't yet typecheck would need a tag. |
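
The difference between the two tagging strategies can be modelled in a few lines of Python. This is a simplified sketch of the selection logic as described above, not the `rules_mypy` implementation; the function name `is_typechecked` and the assumption that an opt-out tag wins over an opt-in tag are both made up for illustration.

```python
def is_typechecked(target_tags, opt_in_tags=(), opt_out_tags=()):
    """Model whether a target would be picked up for typechecking.

    Assumptions (hypothetical, for illustration only):
    - a target carrying any opt-out tag is never checked;
    - if opt-in tags are configured, only targets carrying one are checked;
    - with no opt-in tags configured, everything else is checked.
    """
    tags = set(target_tags)
    if tags & set(opt_out_tags):
        return False
    if opt_in_tags:
        return bool(tags & set(opt_in_tags))
    return True


if __name__ == "__main__":
    # Opt-in: only the tagged target is checked.
    print(is_typechecked(["mypy"], opt_in_tags=["mypy"]))
    print(is_typechecked([], opt_in_tags=["mypy"]))
    # Opt-out: untagged targets are checked, so legacy code needs a tag.
    print(is_typechecked([], opt_out_tags=["no-mypy"]))
```

Under the opt-in model an untagged legacy target is simply skipped, which is why it is the gentler choice for a monorepo that doesn't typecheck yet.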
| 59 | + |
| 60 | +Without that `.bazelrc` entry, users would explicitly have to list `--aspects=...` whenever they're interested in leveraging typechecks. |
| 61 | + |
| 62 | +For the same reason we've also configured our `.bazelrc` to enable the `mypy` output group by default. |
| 63 | +This may or may not be desired behavior, since enabling the `mypy` output makes passing typechecks a blocker for build and test operations. |
| 64 | + |
| 65 | +## Demo |
| 66 | + |
| 67 | +If we use `bazel aquery //py_mypy/cli`, we will see, among much other output: |
| 68 | + |
| 69 | +``` |
| 70 | +action 'mypy //py_mypy/cli:cli' |
| 71 | + Mnemonic: mypy |
| 72 | + Target: //py_mypy/cli:cli |
| 73 | + Configuration: darwin_arm64-fastbuild |
| 74 | + Execution platform: @@platforms//host:host |
| 75 | + AspectDescriptors: [ |
| 76 | + //tools/mypy:defs.bzl%mypy_aspect(cache='true', color='true') |
| 77 | +] |
| 78 | + ActionKey: ... |
| 79 | + Inputs: [ |
| 80 | + bazel-out/.../bin/py_mypy/inventory_management/inventory_management.mypy_cache, |
| 81 | + bazel-out/.../bin/py_mypy/order_processing/order_processing.mypy_cache, |
| 82 | + bazel-out/.../bin/tools/mypy/mypy, |
| 83 | + ... |
| 84 | + ] |
| 85 | +``` |
| 86 | + |
| 87 | +This is the actual typecheck action of the `//py_mypy/cli:cli` target, showing that as inputs it takes (among many other things) the `.mypy_cache` tree results from typechecking the two sub-libraries `inventory_management` and `order_processing`. |
| 88 | + |
| 89 | +If we dig around in the action plan a bit more, we'll also find the typecheck actions for those libraries. |
| 90 | +For instance, if we inspect the `inventory_management` build, we'll find the action which produces those cache files. |
| 91 | + |
| 92 | +``` |
| 93 | +action 'mypy //py_mypy/inventory_management:inventory_management' |
| 94 | + Mnemonic: mypy |
| 95 | + Target: //py_mypy/inventory_management:inventory_management |
| 96 | + Configuration: darwin_arm64-fastbuild |
| 97 | + Execution platform: @@platforms//host:host |
| 98 | + AspectDescriptors: [ |
| 99 | + //tools/mypy:defs.bzl%mypy_aspect(cache='true', color='true') |
| 100 | + ] |
| 101 | + ActionKey: ... |
| 102 | + Inputs: [ |
| 103 | + bazel-out/.../bin/py_mypy/data_models/data_models.mypy_cache, |
| 104 | + bazel-out/.../bin/tools/mypy/mypy, |
| 105 | + ... |
| 106 | + ] |
| 107 | +``` |
| 108 | + |
| 109 | +This demonstrates that the `rules_mypy` configuration will perform incremental typechecking (only targets which changed will be re-checked, except in the case of a cascading failure), up to the boundary of 1stparty code. |
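
To make the incremental behavior concrete, the sketch below computes which typecheck actions could be invalidated by a change to one target: the target itself plus everything that transitively depends on it. It reuses a hand-written model of the first-party example graph, and `targets_to_recheck` is a hypothetical helper. Treat the result as an upper bound under that model, since Bazel can skip downstream actions whose inputs turn out to be unchanged.

```python
from collections import deque

# target -> the targets it depends on (hand-written model of the example graph).
DEPS = {
    "data_models": set(),
    "data_persistence": {"data_models"},
    "inventory_management": {"data_models", "data_persistence"},
    "order_processing": {"data_models", "data_persistence"},
    "cli": {"inventory_management", "order_processing"},
}


def targets_to_recheck(deps, changed):
    """Return the changed target plus all of its transitive dependents."""
    # Invert the edges so we can walk from a target to whatever consumes it.
    dependents = {target: set() for target in deps}
    for target, requirements in deps.items():
        for requirement in requirements:
            dependents[requirement].add(target)

    # Breadth-first walk over the inverted graph.
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


if __name__ == "__main__":
    print(sorted(targets_to_recheck(DEPS, "order_processing")))
```

In this model, editing `order_processing` touches only that library and `cli`, while editing `data_models` can invalidate every first-party target.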
| 110 | + |
| 111 | +Per [rules_mypy#23](https://github.com/theoremlp/rules_mypy/issues/23), the aspect which creates typecheck rules short-circuits and stops creating annotations when it encounters 3rdparty code. |
| 112 | +This bypasses the problem of attempting to apply 1stparty typecheck rules to code which may not conform to them. However, because there is no shared cache output for a 3rdparty library (for example, no `pyspark.mypy_cache`), 3rdparty libraries may be typechecked (or at least analyzed as part of typechecking) more than once. |