⚡ perf(cli): faster CLI startup via lazy imports#3535
Pull request overview
This PR refactors Anomalib’s CLI and public package __init__.py re-exports to be lazily imported, significantly reducing startup time for lightweight CLI calls (notably anomalib --help) by avoiding eager imports of heavy dependencies (Torch/Lightning, models, datasets, loggers, engine) unless a subcommand actually needs them.
Changes:
- Implement lazy-loading via module-level `__getattr__` in `anomalib.models`, `anomalib.data`, `anomalib.engine`, and `anomalib.loggers`.
- Speed up CLI initialization by deferring heavy imports and by only fully building the selected subcommand’s parser arguments.
- Make CLI help/docstring behavior lazy by deferring `Engine` method references used for help text.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `src/anomalib/cli/cli.py` | Adds subcommand “sniffing” + moves heavy imports into subcommand-specific paths to reduce CLI startup overhead. |
| `src/anomalib/cli/pipelines.py` | Lazily initializes the pipeline registry and avoids importing pipelines during basic CLI/help flows. |
| `src/anomalib/cli/utils/help_formatter.py` | Lazily builds the Engine-method mapping used for docstring-derived usage/help panels. |
| `src/anomalib/loggers/__init__.py` | Switches optional logger integrations to lazy attribute loading. |
| `src/anomalib/models/__init__.py` | Replaces eager model imports with a class→module map and lazy `__getattr__` loading. |
| `src/anomalib/data/__init__.py` | Keeps lightweight dataclasses eager but makes datasets/datamodules/enums lazy via `__getattr__`. |
| `src/anomalib/engine/__init__.py` | Lazily re-exports `Engine`, `XPUAccelerator`, and `SingleXPUStrategy`. |
Comments suppressed due to low confidence (1)
src/anomalib/data/__init__.py:144
`get_datamodule` no longer has the `DictConfig | ListConfig | dict` type annotation even though the docstring still claims it supports `ListConfig`. With the current implementation, passing a `ListConfig` will fail at `config_.class_path` because a `ListConfig` is neither converted nor handled specially. Either add explicit `ListConfig` handling (and keep the typed signature via `TYPE_CHECKING` imports) or remove `ListConfig` from the documented/typed contract.
def get_datamodule(config) -> AnomalibDataModule:
"""Get Anomaly Datamodule from config.
Args:
config: Configuration for the anomaly model. Can be either:
| for token in tokens: | ||
| if not token.startswith("-"): | ||
| return token |
`_sniff_subcommand` treats the value passed to global options (e.g. `-c`/`--config`) as the subcommand because it returns the first token that doesn't start with `-`. This breaks valid invocations like `anomalib --config config.yaml train` and `anomalib -c config.yaml train`: it will think `config.yaml` is the subcommand and skip building the real subcommand parser, likely causing config-file parsing/validation failures. Update the sniffing logic to skip option values for known global flags (at least `-c`/`--config` and `--config=...`), or fall back to building the full parser when a config file is provided.
Suggested change:
-   for token in tokens:
-       if not token.startswith("-"):
-           return token
+   index = 0
+   while index < len(tokens):
+       token = tokens[index]
+       if token in {"-c", "--config"}:
+           index += 2
+           continue
+       if token.startswith("--config="):
+           index += 1
+           continue
+       if not token.startswith("-"):
+           return token
+       index += 1
|
|
||
| from anomalib import __version__ | ||
| from anomalib.cli.pipelines import PIPELINE_REGISTRY, pipeline_subcommands, run_pipeline | ||
| from anomalib.cli.utils.help_formatter import CustomHelpFormatter, get_short_docstring |
get_short_docstring is imported but never used in this module. Please remove the unused import to avoid unnecessary coupling and keep the lazy-import goal consistent.
Suggested change:
-   from anomalib.cli.utils.help_formatter import CustomHelpFormatter, get_short_docstring
+   from anomalib.cli.utils.help_formatter import CustomHelpFormatter
| __all__ = ["configure_logger"] | ||
|
|
||
| try: | ||
| from .comet import AnomalibCometLogger # noqa: F401 | ||
| from .mlflow import AnomalibMLFlowLogger # noqa: F401 | ||
| from .tensorboard import AnomalibTensorBoardLogger # noqa: F401 | ||
| from .wandb import AnomalibWandbLogger # noqa: F401 | ||
|
|
||
| __all__.extend( | ||
| [ | ||
| "AnomalibCometLogger", | ||
| "AnomalibTensorBoardLogger", | ||
| "AnomalibWandbLogger", | ||
| "AnomalibMLFlowLogger", | ||
| ], | ||
| ) | ||
| except ImportError: | ||
| print("To use any logger install it using `anomalib install -v`") | ||
| _LOGGER_NAMES = { |
The module docstring advertises importing Anomalib*Logger classes from anomalib.loggers, but __all__ only exposes configure_logger. This changes star-import/introspection behavior compared to the previous eager-import implementation and is inconsistent with other lazy __init__.py modules here (e.g., anomalib.models, anomalib.data, anomalib.engine). Consider adding the lazy-exported logger class names to __all__ (even if they remain lazily loaded via __getattr__).
| import importlib | ||
|
|
||
| module = importlib.import_module(_LOGGER_NAMES[name], __name__) | ||
| return getattr(module, name) |
__getattr__ returns the requested logger class but does not cache it in globals(). This causes repeated attribute access (or repeated from anomalib.loggers import ... in different modules) to re-import the same submodule. Consider caching the resolved object in globals()[name] (as done in anomalib.engine.__getattr__ and anomalib.models.__getattr__).
Suggested change:
-   return getattr(module, name)
+   logger_class = getattr(module, name)
+   globals()[name] = logger_class
+   return logger_class
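The caching behavior requested in this comment can be illustrated with a minimal, self-contained sketch. The module name `lazy_demo`, the map contents, and the `load_counts` counter are hypothetical stand-ins (standard-library classes replace the logger backends so the example runs anywhere); only the module-level `__getattr__` hook itself (PEP 562) is standard Python.

```python
import importlib
import types

# Hypothetical name -> module map; stdlib classes stand in for logger backends.
_LAZY_NAMES = {"Fraction": "fractions", "OrderedDict": "collections"}

# Build a throwaway module object so the sketch runs without a package on disk.
lazy_demo = types.ModuleType("lazy_demo")
load_counts = {}


def _lazy_getattr(name: str) -> object:
    """Resolve a lazily exported name on first access, then cache it."""
    if name not in _LAZY_NAMES:
        raise AttributeError(f"module 'lazy_demo' has no attribute {name!r}")
    load_counts[name] = load_counts.get(name, 0) + 1
    module = importlib.import_module(_LAZY_NAMES[name])
    resolved = getattr(module, name)
    # Caching in the module namespace means later lookups never reach this hook.
    lazy_demo.__dict__[name] = resolved
    return resolved


# Module-level __getattr__ is looked up in the module's own namespace (PEP 562).
lazy_demo.__getattr__ = _lazy_getattr

first = lazy_demo.Fraction   # triggers the hook
second = lazy_demo.Fraction  # served from the cached attribute
```

In a real `__init__.py` the hook would simply be a function named `__getattr__` at module scope and the cache write would be `globals()[name] = resolved`, as in the suggestion above.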
| """ | ||
| from anomalib.utils.path import convert_to_snake_case, convert_to_title_case | ||
|
|
||
| _import_all_models() | ||
|
|
list_models docstring describes supported case values as snake_case/original with default snake_case, but the implementation accepts only {snake, pascal, title} with default snake. Please update the docstring so the documented API matches the runtime behavior.
| def get_model(model, *args, **kwdargs) -> AnomalibModule: | ||
| """Get an anomaly detection model instance. | ||
|
|
||
| This function instantiates an anomaly detection model based on the provided |
get_model lost its input type annotation (was DictConfig | str | dict | Namespace). Since this is a public API, dropping the signature typing reduces IDE/type-checker usefulness. Consider keeping the annotation using from __future__ import annotations and/or TYPE_CHECKING imports so you can stay lazy at runtime while preserving type information.
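One way to keep the annotation without paying the import cost is sketched below. The function `get_model_sketch` is a hypothetical stand-in for `get_model`, and importing `Namespace` from `argparse` is an assumption (the real code may use a different `Namespace` class). The imports under `TYPE_CHECKING` are never executed at runtime, so the block runs even where `omegaconf` is not installed.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers; skipped at runtime,
    # so these imports add nothing to startup time.
    from argparse import Namespace  # assumption: real code may use another Namespace

    from omegaconf import DictConfig


def get_model_sketch(model: "DictConfig | str | dict | Namespace") -> str:
    """Hypothetical stand-in for get_model; only reports the input type."""
    # The quoted annotation is stored as a plain string and never evaluated here.
    return type(model).__name__
```

Quoting the annotation has the same runtime effect as `from __future__ import annotations`: the type expression stays a string, so the names inside it do not need to exist when the module is imported.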
| _PIPELINE_REGISTRY: dict[str, type[Pipeline]] | None | str = "uninitialized" | ||
|
|
||
| _PIPELINE_DESCRIPTIONS: dict[str, str] = { |
Using the string literal sentinel 'uninitialized' for _PIPELINE_REGISTRY forces the variable type to include str and is easy to accidentally collide with. A dedicated sentinel object (e.g., a private object() instance) would avoid widening the type and make the state machine clearer.
Replace eager imports of Comet, MLflow, TensorBoard, and WandB loggers with __getattr__-based lazy loading. These backends pull in heavy optional dependencies that are unnecessary for basic CLI operations.
Phase 1: Replace all top-level heavy imports (torch, lightning, Engine, AnomalibModule, AnomalibDataModule, pipeline registry) with deferred imports inside the functions that need them.
Phase 2: Add `_sniff_subcommand()` to detect which subcommand the user invoked and only build the full ArgumentParser for that single subcommand, skipping expensive parser construction for unused ones.
Also adds a compatibility shim for jsonargparse >=4.47 where `_ActionSubCommands` moved to a new module path.
Replace eager imports in package `__init__.py` files with `__getattr__`-based lazy loading:
- models: 23 model classes loaded on-demand via `_MODEL_CLASS_MAP`
- data: datamodules/datasets/enums loaded lazily; dataclasses kept eager to avoid circular imports (`video.py` imports `VideoItem` at module level)
- engine: `Engine`, `XPUAccelerator`, `SingleXPUStrategy` loaded on-demand
This eliminates the cascade where importing any anomalib subpackage would trigger torch + lightning + all models + all datasets.
…parse 1.7 compat
`module_available()` from `lightning_utilities` actually imports the target module, adding ~3s for `lightning.pytorch` and ~3.7s for `anomalib.pipelines`. Replace with `importlib.util.find_spec()`, which only checks if a module is findable without importing it.
Also defer the `PIPELINE_REGISTRY` import to only trigger when the selected subcommand is a pipeline command, preventing eager Benchmark import on every CLI invocation.
Add a `_format_usage()` shim in `CustomHelpFormatter` to normalize tuples to lists, fixing a crash between rich-argparse >= 1.7 (passes actions as tuple) and jsonargparse (only handles list/dict). Tested with jsonargparse 4.48.0 + rich-argparse 1.7.2.
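The `find_spec()` replacement described in this commit message can be sketched as a small helper. The name `is_importable` is hypothetical and the error handling is an assumption about what a robust wrapper might want, not the PR's actual code.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Report whether a module can be found, without importing the module itself."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ModuleNotFoundError, ValueError):
        # A missing parent package raises ModuleNotFoundError;
        # ValueError is raised when a loaded module has no usable __spec__.
        return False


has_json = is_importable("json")
has_missing = is_importable("no_such_module_xyz_123")
has_missing_child = is_importable("no_such_module_xyz_123.sub")
```

One caveat worth keeping in mind: for a dotted name, `find_spec()` imports the parent package in order to locate the submodule, so checking something like `lightning.pytorch` still pays the import cost of the parent package's `__init__.py`.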
- Fix `_sniff_subcommand` to skip `-c`/`--config` option values
- Remove unused `get_short_docstring` import
- Add lazy logger class names to `__all__`
- Add `globals()` caching in loggers `__getattr__`
- Fix `list_models` docstring to match snake/pascal/title params
- Restore type annotation on `get_model` via `TYPE_CHECKING`
- Replace fragile string sentinel with `object()` in pipelines
f94c5f5 to 8575531
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
src/anomalib/data/__init__.py:151
`get_datamodule` lost its type annotation (`config: DictConfig | ListConfig | dict`) even though the docstring still documents those accepted types. To preserve typing/IDE support without reintroducing eager imports, consider restoring the annotated signature using `from typing import TYPE_CHECKING` + `if TYPE_CHECKING: from omegaconf import DictConfig, ListConfig` (and/or `typing.Any` as needed).
def get_datamodule(config) -> AnomalibDataModule:
"""Get Anomaly Datamodule from config.
Args:
config: Configuration for the anomaly model. Can be either:
| __all__ = [ | ||
| "configure_logger", | ||
| "AnomalibCometLogger", | ||
| "AnomalibMLFlowLogger", | ||
| "AnomalibTensorBoardLogger", | ||
| "AnomalibWandbLogger", | ||
| ] |
__all__ now always includes the optional logger classes. This makes from anomalib.loggers import * (and some doc tools that iterate __all__) eagerly access/import optional logger modules and can raise ImportError if optional deps like matplotlib/vendor SDKs are missing. Consider keeping __all__ limited to always-available symbols (e.g. configure_logger) and leaving optional logger names out of __all__ (or only adding them conditionally / under TYPE_CHECKING).
Suggested change:
-   __all__ = [
-       "configure_logger",
-       "AnomalibCometLogger",
-       "AnomalibMLFlowLogger",
-       "AnomalibTensorBoardLogger",
-       "AnomalibWandbLogger",
-   ]
+   __all__ = ["configure_logger"]
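The interaction this comment describes can be reproduced in isolation. In the sketch below the module `lazy_star_demo`, the name `OptionalLogger`, and the simulated `ImportError` are all hypothetical; it shows only that `from module import *` resolves every name listed in `__all__`, which routes lazily exported names through `__getattr__`.

```python
import sys
import types

demo = types.ModuleType("lazy_star_demo")
accessed = []


def _lazy_getattr(name: str) -> object:
    """Record each lazy lookup and simulate a missing optional dependency."""
    accessed.append(name)
    if name == "OptionalLogger":
        raise ImportError("optional dependency is not installed (simulated)")
    raise AttributeError(name)


demo.__getattr__ = _lazy_getattr
demo.configure_logger = lambda: "configured"
# Listing the lazy name in __all__ makes star-imports resolve it eagerly.
demo.__all__ = ["configure_logger", "OptionalLogger"]
sys.modules["lazy_star_demo"] = demo

namespace = {}
star_import_error = None
try:
    exec("from lazy_star_demo import *", namespace)
except ImportError as err:
    star_import_error = err
finally:
    # Remove the throwaway module so the sketch leaves no trace.
    sys.modules.pop("lazy_star_demo", None)
```

With `__all__` reduced to `["configure_logger"]`, the same star-import would succeed and the optional name would stay reachable through explicit attribute access.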
| "ImageDataFormat": ".datamodules.image", | ||
| "Kaputt": ".datamodules.image", | ||
| "Kolektor": ".datamodules.image", | ||
| "MVTec": ".datamodules.image", |
_LAZY_IMPORTS maps "MVTec" to .datamodules.image, but anomalib.data.datamodules.image does not define a MVTec symbol (it defines MVTecAD, MVTecAD2, etc.). As a result, from anomalib.data import MVTec (and __all__ exporting it) will raise AttributeError. Please remove this export or add a real alias (e.g. MVTec = MVTecAD) in the target module and keep __all__ consistent.
Suggested change:
-   "MVTec": ".datamodules.image",
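A mismatch like this one is easy to catch with a consistency check over the lazy map. The helper `find_broken_exports` and the demo map below are hypothetical; standard-library modules stand in for the datamodule packages, and the misspelled `"Fractions"` entry plays the role of `"MVTec"`.

```python
import importlib


def find_broken_exports(lazy_map: dict) -> list:
    """Return the exported names that their target module does not define."""
    broken = []
    for name, module_path in sorted(lazy_map.items()):
        module = importlib.import_module(module_path)
        if not hasattr(module, name):
            broken.append(name)
    return broken


# Hypothetical map; "Fractions" is a deliberate typo mirroring "MVTec" vs "MVTecAD".
demo_map = {
    "Fraction": "fractions",
    "Fractions": "fractions",
    "OrderedDict": "collections",
}
broken = find_broken_exports(demo_map)
```

Because the check imports every target module, it belongs in the test suite rather than in the package's import path. For relative entries such as `.datamodules.image`, the call would presumably need the `package` argument of `importlib.import_module`.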
| from anomalib.utils.path import convert_to_snake_case, convert_to_title_case | ||
|
|
||
| _import_all_models() | ||
|
|
||
| if case not in {"snake", "pascal", "title"}: |
list_models() currently calls _import_all_models(), which forces importing every model module (and their heavy / optional dependencies) just to produce a name list. This undermines the lazy-import goal and can also break in minimal installs if any model has extra optional deps. Since you already have _MODEL_CLASS_MAP, list_models() can derive the snake/title/pascal names from the map keys without importing the model implementations.
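Deriving the names from the map keys might look like the sketch below. The map entries and module paths are illustrative placeholders, the regex-based `_pascal_to_snake` helper is an assumption standing in for anomalib's own case-conversion utilities, and the return type is chosen for the example only.

```python
import re

# Hypothetical subset of a class-name -> module map; paths are placeholders.
_MODEL_CLASS_MAP = {
    "Padim": ".padim",
    "Patchcore": ".patchcore",
    "EfficientAd": ".efficient_ad",
}


def _pascal_to_snake(name: str) -> str:
    """Insert an underscore before each interior capital, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def list_models_sketch(case: str = "snake") -> list:
    """List model names from the map keys alone; no model module is imported."""
    if case not in {"snake", "pascal", "title"}:
        raise ValueError(f"Unsupported case: {case!r}")
    if case == "pascal":
        return sorted(_MODEL_CLASS_MAP)
    snake = sorted(_pascal_to_snake(name) for name in _MODEL_CLASS_MAP)
    if case == "snake":
        return snake
    return [name.replace("_", " ").title() for name in snake]
```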
| from anomalib.utils.path import convert_snake_to_pascal_case | ||
|
|
||
| _import_all_models() | ||
|
|
||
| logger.info("Loading the model.") |
_get_model_class_by_name() calls _import_all_models() and then scans AnomalibModule.__subclasses__(), which eagerly imports every model even when the user requested a single one. This can be very expensive and may fail if unrelated models have optional dependencies. Consider resolving the normalized name against _MODEL_CLASS_MAP (case-insensitive) and importing only that one module/class, falling back to the current error if no match exists.
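The single-import lookup suggested here can be sketched as follows. The map, the `_normalize` rule, and the `ValueError` are assumptions made for illustration (standard-library classes stand in for model classes); the real function may normalize names and report errors differently.

```python
import importlib

# Hypothetical map; stdlib classes stand in for model classes.
_CLASS_MAP = {"OrderedDict": "collections", "Fraction": "fractions"}


def _normalize(name: str) -> str:
    """Compare names case-insensitively and ignore separators."""
    return name.replace("_", "").replace("-", "").lower()


def get_class_by_name(name: str) -> type:
    """Import only the module that defines the requested class."""
    wanted = _normalize(name)
    for class_name, module_path in _CLASS_MAP.items():
        if _normalize(class_name) == wanted:
            module = importlib.import_module(module_path)
            return getattr(module, class_name)
    raise ValueError(f"Model {name!r} not found. Available: {sorted(_CLASS_MAP)}")


ordered_dict_cls = get_class_by_name("ordered_dict")
fraction_cls = get_class_by_name("FRACTION")
```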
| # Flags whose next token is a value, not a subcommand. | ||
| _OPTIONS_WITH_VALUE = frozenset({"-c", "--config"}) | ||
|
|
||
| @staticmethod | ||
| def _sniff_subcommand(args: Sequence[str] | None) -> str | None: | ||
| """Peek at args to identify the subcommand without full parsing.""" |
New _sniff_subcommand() + conditional argument registration changes how parsers are constructed and is now critical to CLI correctness (e.g. -c/--config before/after the subcommand, --help behavior, and ensuring only the selected subcommand gets heavy argument construction). There are CLI integration tests, but there doesn’t appear to be focused coverage for these parsing edge cases; adding a small unit test matrix around _sniff_subcommand and subcommand parser construction would help prevent regressions.
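A small test matrix of the kind requested could start from the sketch below. `sniff_subcommand` is a standalone approximation written for this example, not the PR's method, and it assumes `-c`/`--config` are the only global flags that take a separate value. In a pytest suite the `CASES` list would typically feed `pytest.mark.parametrize`.

```python
from typing import Optional, Sequence

# Assumption: only -c/--config take a separate value among the global flags.
_OPTIONS_WITH_VALUE = frozenset({"-c", "--config"})


def sniff_subcommand(args: Sequence[str]) -> Optional[str]:
    """Return the first positional token, skipping values of known flags."""
    index = 0
    while index < len(args):
        token = args[index]
        if token in _OPTIONS_WITH_VALUE:
            index += 2  # skip the flag and its value
            continue
        if not token.startswith("-"):
            return token
        index += 1  # covers --config=path and any other flag
    return None


# (argv, expected subcommand) pairs covering the edge cases named in the review.
CASES = [
    (["train"], "train"),
    (["--config", "cfg.yaml", "train"], "train"),
    (["-c", "cfg.yaml", "train"], "train"),
    (["--config=cfg.yaml", "train"], "train"),
    (["train", "-c", "cfg.yaml"], "train"),
    (["-c", "cfg.yaml", "predict", "--help"], "predict"),
    (["--help"], None),
    (["-c"], None),
    ([], None),
]

failures = [
    (argv, expected, sniff_subcommand(argv))
    for argv, expected in CASES
    if sniff_subcommand(argv) != expected
]
```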
Summary
Reduces `anomalib --help` startup from ~4.0s → ~0.15s (26× faster) by eliminating unnecessary eager imports of torch, lightning, all 23 models, all 15+ datasets, and optional logger backends during CLI initialization.

Problem
Every CLI invocation, even `anomalib --help`, eagerly imported the entire anomalib stack:
- `torch` + `lightning` (~2.8s alone)
- `Engine` class with all its dependencies

This made even trivial CLI operations painfully slow.

Solution
Three-phase lazy loading strategy:

Phase 1: Lazy CLI imports
- `cli/cli.py`: Removed all top-level heavy imports (Trainer, torch, Engine, AnomalibModule, AnomalibDataModule). Imports are deferred to the functions that actually need them.
- `cli/pipelines.py`: Pipeline registry loaded lazily via `__getattr__`.
- `cli/utils/help_formatter.py`: Engine import deferred with cached `_get_docstring_usage()`.

Phase 2: Deferred subcommand parser construction
- `cli/cli.py`: Added `_sniff_subcommand()` that detects which subcommand the user invoked from `sys.argv`, then only builds the full ArgumentParser for that single subcommand, skipping expensive parser construction for all unused subcommands.

Phase 3: Lazy `__init__.py` re-exports
- `models/__init__.py`: 23 model classes → `_MODEL_CLASS_MAP` + `__getattr__` on-demand loading.
- `data/__init__.py`: Datamodules, datasets, and data-format enums loaded lazily. Dataclasses (ImageItem, VideoItem, etc.) kept as eager imports to avoid circular import issues.
- `engine/__init__.py`: Engine, XPUAccelerator, SingleXPUStrategy → lazy `__getattr__`.
- `loggers/__init__.py`: Comet, MLflow, TensorBoard, WandB loggers → lazy `__getattr__`.

Compatibility
- `_ActionSubCommands` import shim for `jsonargparse >=4.47` (class moved to `jsonargparse._subcommands`).

Benchmarks
- `anomalib --help`
- `anomalib install --help`
- `anomalib train --help`

Example