Overview
Currently, if you import anything from a third-party package that has inline types, mypy will process all the transitive dependencies of the imported module as well. This can be pretty slow, since some packages have hundreds or thousands of dependencies (e.g. torch, but there are probably many others).
We could speed this up by only processing those dependencies that are needed to type check code that uses the third-party package. This is possible, since we won't generally report errors from installed packages. This wouldn't be possible for normal code, since we could have false negatives in code that we don't process. We'd process (some) imported definitions lazily.
Example
Assume we have a third-party package acme that has thousands of recursive module dependencies. Now we have user code that only uses one function from the top-level module:
from acme import do_stuff
do_stuff()
We might only need to process acme/__init__.py to type check this code. Most of the 1000 dependencies can be ignored, and they don't even need to be parsed. However, if do_stuff or other functions in acme/__init__.py use a type in an annotation that is defined in a submodule of acme, we might need to process the modules that define those types as well, and any dependencies they might have. (This assumes module-level granularity of laziness. It's easy to also imagine definition-level laziness, so that only the do_stuff function would have to be processed.)
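To make the difference concrete, here is a toy model of module-level laziness. It is only a sketch: the import graph, the module names under acme, and the `ANNOTATION_DEPS` mapping are all made up for illustration, and none of this reflects how mypy's build graph is actually represented.

```python
from collections import deque

# Hypothetical import graph: module -> modules it imports.
IMPORTS = {
    "acme": ["acme.core", "acme.plugins", "acme.types"],
    "acme.core": ["acme.backend"],
    "acme.plugins": ["acme.backend", "acme.extras"],
    "acme.types": [],
    "acme.backend": [],
    "acme.extras": [],
}

# Hypothetical: modules whose definitions are referenced from annotations
# in each module. Only these are needed to type check callers.
ANNOTATION_DEPS = {
    "acme": ["acme.types"],
    "acme.types": [],
}


def reachable(start, graph):
    """Return every module reachable from `start` by following `graph` edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        module = queue.popleft()
        for dep in graph.get(module, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Eager (current behaviour): follow every import transitively.
eager = reachable("acme", IMPORTS)
# Lazy (proposed): only follow what annotations actually need.
lazy = reachable("acme", ANNOTATION_DEPS)

print(len(eager), "modules processed eagerly")
print(sorted(lazy), "processed lazily")
```

In this toy graph the eager traversal visits all six modules, while the lazy one only visits acme and the one submodule that defines annotation types.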
Implementation sketch
Here's a sketch of a potential implementation:
- Add a flag for packages where we don't report errors in any recursive dependencies. This should be enabled for installed packages and stubs.
- Make sure installed packages and stubs can't import code where we do report errors. They should only import stubs and other installed packages. Otherwise there would be false negatives.
- When processing an import targeting a "recursive no-error" package, initially don't consider any module dependencies. Add import placeholders to the symbol table for imported symbols (unless they are already available).
- If any code uses an import placeholder, defer the current node and keep track of the placeholder's target name. If any deferrals are due to import placeholders, process the import placeholders first before reprocessing the deferred nodes.
- A name that is available only as a placeholder would be "unresolved".
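The placeholder and deferral steps above could be simulated roughly as follows. This is a minimal sketch, not mypy code: `Placeholder`, `LazyAnalyzer`, the `definitions` table, and the string "types" are all hypothetical stand-ins, and a "node" is reduced to a label plus the names it uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Placeholder:
    """Stands in for an imported symbol that has not been resolved yet."""
    target: str  # fully qualified name, e.g. "acme.do_stuff"


@dataclass
class LazyAnalyzer:
    # Hypothetical stand-in for "processing the target module":
    # fully qualified name -> resolved definition.
    definitions: dict[str, str]
    symbols: dict[str, object] = field(default_factory=dict)
    resolved_log: list[str] = field(default_factory=list)

    def add_import(self, name: str, target: str) -> None:
        # Add a placeholder unless the symbol is already available.
        self.symbols.setdefault(name, Placeholder(target))

    def analyze(self, nodes: list[tuple[str, list[str]]]) -> list[str]:
        """Analyze (label, used_names) nodes; return labels in completion order."""
        done: list[str] = []
        pending = list(nodes)
        while pending:
            deferred = []
            needed: list[str] = []
            for label, names in pending:
                blockers = [
                    sym.target
                    for sym in (self.symbols.get(n) for n in names)
                    if isinstance(sym, Placeholder)
                ]
                if blockers:
                    # Defer the node and remember which targets it needs.
                    deferred.append((label, names))
                    needed.extend(blockers)
                else:
                    done.append(label)
            # Resolve the needed placeholders before reprocessing deferred nodes.
            for target in dict.fromkeys(needed):
                self.resolved_log.append(target)
                for name, sym in list(self.symbols.items()):
                    if isinstance(sym, Placeholder) and sym.target == target:
                        self.symbols[name] = self.definitions[target]
            pending = deferred
        return done


analyzer = LazyAnalyzer(
    definitions={"acme.do_stuff": "def () -> None", "acme.unused": "def () -> int"}
)
analyzer.add_import("do_stuff", "acme.do_stuff")
analyzer.add_import("unused", "acme.unused")
order = analyzer.analyze([("stmt1", []), ("stmt2", ["do_stuff"])])
print(order, analyzer.resolved_log)
```

Note that `unused` stays a placeholder in this run, which is the point: a name that nothing touches never forces its target to be processed.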
Discussion
- Imports that are only used within function bodies in installed packages don't need to be resolved. This could be a big win.
- Imports of other modules/functions/classes within an installed package that are not used don't need to be resolved. This could help with packages that have massive public APIs.
- It would be easier if we can resolve all placeholders during semantic analysis. I'm not sure if this is possible, at least due to modules being able to implement protocols.
- If we analyze a class, we should maybe resolve all import placeholders in the class and any base classes to avoid having to resolve them during type checking.
- Also if we analyze a type annotation, we should maybe resolve all placeholders related to the type so that we can perform type checking without dealing with unresolved references.
- As an optimization, we could do a quick AST pass to determine any imported names that are definitely needed based on a shallow syntactic analysis (e.g. look for from acme import submodule). When processing a module, we'd resolve these first to avoid numerous deferrals. I'm not sure if this would be a big perf win or not.
- Since big SCCs are common, we need to be able to do this within SCCs, not just between SCCs.
- Circular dependencies need some care. We already support them, and probably the existing approach could be generalized.
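The shallow syntactic pass mentioned in the list above could look something like the sketch below. It uses the standard library `ast` module rather than mypy's own parser, and the helper name `eagerly_needed_imports` and the choice to look only at top-level statements are assumptions made for illustration.

```python
import ast


def eagerly_needed_imports(source: str, package: str) -> set[str]:
    """Collect fully qualified names imported from `package` at module top level.

    Imports nested inside functions are skipped on purpose, since those
    may never need to be resolved.
    """
    prefix = package + "."
    needed: set[str] = set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if node.module == package or node.module.startswith(prefix):
                for alias in node.names:
                    if alias.name != "*":  # star imports need real resolution
                        needed.add(f"{node.module}.{alias.name}")
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == package or alias.name.startswith(prefix):
                    needed.add(alias.name)
    return needed


SOURCE = '''\
from acme import submodule
import acme.util
import os

def f():
    from acme import lazy_thing
    return lazy_thing
'''

needed = eagerly_needed_imports(SOURCE, "acme")
print(sorted(needed))
```

Whether a name collected this way is really a submodule or just a function can't be known from syntax alone, so a real implementation would still have to treat the result as a hint about what to resolve first.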
Modules used as protocol-typed values could be an issue, since this could require arbitrary attributes (including nested imported modules) to be available, and we'd only know about this during type checking. So we might need to defer during type checking, and within various type operations such as subtype checks. This is probably pretty rare but still currently supported. Since this is expected to be rare, this doesn't need to be super efficient.
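For reference, this is the kind of code where a module is used as a protocol-typed value. The example is a sketch with made-up names (`Serializer`, `save`); it uses the standard library json module only so that it runs on its own, whereas the problematic case would be a module from a lazily processed installed package. Deciding whether the module satisfies the protocol requires its attributes (here `dumps`) to be available at type-checking time, which is exactly when a placeholder might still be unresolved.

```python
import json
from typing import Any, Protocol


class Serializer(Protocol):
    """Anything with a compatible `dumps` attribute, including a module."""

    def dumps(self, obj: Any) -> str: ...


def save(serializer: Serializer, data: Any) -> str:
    return serializer.dumps(data)


# The module object itself is passed where a Serializer is expected.
# Checking this call needs the signature of json.dumps, which is only
# discovered to be necessary during the subtype check.
text = save(json, {"a": 1})
print(text)
```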