⚡ perf(cli): faster CLI startup via lazy imports #3535

Draft
samet-akcay wants to merge 5 commits into open-edge-platform:main from samet-akcay:perf/lazy-cli-imports

samet-akcay commented Apr 14, 2026

Summary

Reduces anomalib --help startup from ~4.0s → ~0.15s (26× faster) by eliminating unnecessary eager imports of torch, lightning, all 23 models, all 15+ datasets, and optional logger backends during CLI initialization.

Problem

Every CLI invocation — even anomalib --help — eagerly imported the entire anomalib stack:

  • torch + lightning (~2.8s alone)
  • All 23 model classes (each importing torch.nn, etc.)
  • All 15+ dataset/datamodule classes
  • Optional logger backends (Comet, MLflow, TensorBoard, WandB)
  • Full Engine class with all its dependencies

This made even trivial CLI operations painfully slow.

Solution

Three-phase lazy loading strategy:

Phase 1: Lazy CLI imports

  • cli/cli.py: Removed all top-level heavy imports (Trainer, torch, Engine, AnomalibModule, AnomalibDataModule). Imports are deferred to the functions that actually need them.
  • cli/pipelines.py: Pipeline registry loaded lazily via __getattr__.
  • cli/utils/help_formatter.py: Engine import deferred with cached _get_docstring_usage().

Phase 2: Deferred subcommand parser construction

  • cli/cli.py: Added _sniff_subcommand() that detects which subcommand the user invoked from sys.argv, then only builds the full ArgumentParser for that single subcommand — skipping expensive parser construction for all unused subcommands.
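
The Phase 2 idea can be sketched with the standard-library argparse instead of jsonargparse. The names `SUBCOMMAND_BUILDERS`, `sniff_subcommand`, `build_parser`, `BUILT`, and the demo flags are hypothetical and not taken from the PR; the sketch only assumes that `-c/--config` is the one global flag that consumes a value.

```python
from __future__ import annotations

import argparse
from collections.abc import Callable, Sequence

# Assumed global flags whose next token is a value, not a subcommand.
OPTIONS_WITH_VALUE = frozenset({"-c", "--config"})

# Records which builders ran, so the deferral is observable.
BUILT: list[str] = []


def _add_train_arguments(parser: argparse.ArgumentParser) -> None:
    BUILT.append("train")  # stands in for expensive, torch-dependent setup
    parser.add_argument("--max-epochs", type=int, default=1)


def _add_install_arguments(parser: argparse.ArgumentParser) -> None:
    BUILT.append("install")
    parser.add_argument("--option", default="full")


SUBCOMMAND_BUILDERS: dict[str, Callable[[argparse.ArgumentParser], None]] = {
    "train": _add_train_arguments,
    "install": _add_install_arguments,
}


def sniff_subcommand(args: Sequence[str]) -> str | None:
    """Return the first positional token, skipping flags and their values."""
    index = 0
    while index < len(args):
        token = args[index]
        if token in OPTIONS_WITH_VALUE:
            index += 2  # skip the flag and the value that follows it
            continue
        if not token.startswith("-"):
            return token
        index += 1
    return None


def build_parser(args: Sequence[str]) -> argparse.ArgumentParser:
    """Register every subcommand name, but populate only the selected one."""
    selected = sniff_subcommand(args)
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("-c", "--config")
    subparsers = parser.add_subparsers(dest="subcommand")
    for name, add_arguments in SUBCOMMAND_BUILDERS.items():
        subparser = subparsers.add_parser(name)
        if name == selected:
            add_arguments(subparser)
    return parser
```

Calling `build_parser(argv).parse_args(argv)` with `argv = ["-c", "config.yaml", "train", "--max-epochs", "3"]` runs only the `train` builder; the `install` subparser exists by name but carries no arguments.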

Phase 3: Lazy __init__.py re-exports

  • models/__init__.py: 23 model classes → _MODEL_CLASS_MAP + __getattr__ on-demand loading.
  • data/__init__.py: Datamodules, datasets, and data-format enums loaded lazily. Dataclasses (ImageItem, VideoItem, etc.) kept as eager imports to avoid circular import issues.
  • engine/__init__.py: Engine, XPUAccelerator, SingleXPUStrategy → lazy __getattr__.
  • loggers/__init__.py: Comet, MLflow, TensorBoard, WandB loggers → lazy __getattr__.
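
A minimal sketch of the module-level `__getattr__` pattern (PEP 562) that Phase 3 relies on. In a real package the function is simply defined at the top level of `__init__.py`; here a stand-in module is built with `types.ModuleType` so the example runs as a single script. `_LAZY_IMPORTS`, `make_lazy_package`, and `lazy_pkg` are hypothetical names, and standard-library classes stand in for heavy model or logger classes.

```python
import importlib
import types

# Hypothetical map from exported name to the module that defines it.
# Standard-library targets stand in for heavy submodules.
_LAZY_IMPORTS = {
    "OrderedDict": "collections",
    "JSONDecoder": "json",
}


def make_lazy_package(name: str) -> types.ModuleType:
    """Build a stand-in package whose exports are resolved on first access."""
    package = types.ModuleType(name)

    def __getattr__(attr: str):
        # Called only when normal attribute lookup on the module fails.
        if attr not in _LAZY_IMPORTS:
            msg = f"module {name!r} has no attribute {attr!r}"
            raise AttributeError(msg)
        module = importlib.import_module(_LAZY_IMPORTS[attr])
        value = getattr(module, attr)
        # Cache the result so later lookups bypass __getattr__ entirely.
        package.__dict__[attr] = value
        return value

    package.__getattr__ = __getattr__
    package.__all__ = sorted(_LAZY_IMPORTS)
    return package


lazy_pkg = make_lazy_package("lazy_pkg")
```

The first access to `lazy_pkg.OrderedDict` triggers the import and stores the class in the module namespace; unknown names still raise `AttributeError`, which keeps `hasattr()` and tab completion behaving as expected.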

Compatibility

  • Added _ActionSubCommands import shim for jsonargparse >=4.47 (class moved to jsonargparse._subcommands).

Benchmarks

| Command | Before | After | Speedup |
| --- | --- | --- | --- |
| anomalib --help | 4.0s | 0.15s | 26× |
| anomalib install --help | 4.0s | 0.15s | 26× |
| anomalib train --help | 5.2s | 3.9s | 25% faster |

The remaining 3.9s for train --help is dominated by the torch + lightning import cost (~2.8s floor), which cannot be avoided: both must be loaded to construct the training argument parser.

Example

# Before: every command paid the full import tax
$ time anomalib --help    # 4.0s
$ time anomalib install   # 4.0s

# After: only pay for what you use
$ time anomalib --help    # 0.15s
$ time anomalib install   # 0.15s
$ time anomalib train --help  # 3.9s (torch+lightning required here)

Copilot AI review requested due to automatic review settings April 14, 2026 12:08
samet-akcay changed the title from "⚡ perf(cli): dramatically faster CLI startup via lazy imports" to "⚡ perf(cli): faster CLI startup via lazy imports" Apr 14, 2026
Copilot AI left a comment

Pull request overview

This PR refactors Anomalib’s CLI and public package __init__.py re-exports to be lazily imported, significantly reducing startup time for lightweight CLI calls (notably anomalib --help) by avoiding eager imports of heavy dependencies (Torch/Lightning, models, datasets, loggers, engine) unless a subcommand actually needs them.

Changes:

  • Implement lazy-loading via module-level __getattr__ in anomalib.models, anomalib.data, anomalib.engine, and anomalib.loggers.
  • Speed up CLI initialization by deferring heavy imports and by only fully building the selected subcommand’s parser arguments.
  • Make CLI help/docstring behavior lazy by deferring Engine method references used for help text.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| src/anomalib/cli/cli.py | Adds subcommand “sniffing” + moves heavy imports into subcommand-specific paths to reduce CLI startup overhead. |
| src/anomalib/cli/pipelines.py | Lazily initializes the pipeline registry and avoids importing pipelines during basic CLI/help flows. |
| src/anomalib/cli/utils/help_formatter.py | Lazily builds the Engine-method mapping used for docstring-derived usage/help panels. |
| src/anomalib/loggers/__init__.py | Switches optional logger integrations to lazy attribute loading. |
| src/anomalib/models/__init__.py | Replaces eager model imports with a class→module map and lazy __getattr__ loading. |
| src/anomalib/data/__init__.py | Keeps lightweight dataclasses eager but makes datasets/datamodules/enums lazy via __getattr__. |
| src/anomalib/engine/__init__.py | Lazily re-exports Engine, XPUAccelerator, and SingleXPUStrategy. |
Comments suppressed due to low confidence (1)

src/anomalib/data/__init__.py:144

  • get_datamodule no longer has the DictConfig | ListConfig | dict type annotation even though the docstring still claims it supports ListConfig. With the current implementation, passing a ListConfig will fail at config_.class_path because a ListConfig is neither converted nor handled specially. Either add explicit ListConfig handling (and keep the typed signature via TYPE_CHECKING imports) or remove ListConfig from the documented/typed contract.
def get_datamodule(config) -> AnomalibDataModule:
    """Get Anomaly Datamodule from config.

    Args:
        config: Configuration for the anomaly model. Can be either:

Comment thread src/anomalib/cli/cli.py
Comment on lines +93 to +95
for token in tokens:
    if not token.startswith("-"):
        return token
Copilot AI Apr 14, 2026

_sniff_subcommand treats the value passed to global options (e.g. -c/--config) as the subcommand because it returns the first token that doesn't start with -. This breaks valid invocations like anomalib --config config.yaml train or anomalib -c config.yaml train: it will think config.yaml is the subcommand and skip building the real subcommand parser, likely causing config-file parsing/validation failures. Update the sniffing logic to skip option values for known global flags (at least -c/--config and --config=...), or fall back to building the full parser when a config file is provided.

Suggested change
- for token in tokens:
-     if not token.startswith("-"):
-         return token
+ index = 0
+ while index < len(tokens):
+     token = tokens[index]
+     if token in {"-c", "--config"}:
+         index += 2
+         continue
+     if token.startswith("--config="):
+         index += 1
+         continue
+     if not token.startswith("-"):
+         return token
+     index += 1

Comment thread src/anomalib/cli/cli.py Outdated

from anomalib import __version__
from anomalib.cli.pipelines import PIPELINE_REGISTRY, pipeline_subcommands, run_pipeline
from anomalib.cli.utils.help_formatter import CustomHelpFormatter, get_short_docstring
Copilot AI Apr 14, 2026

get_short_docstring is imported but never used in this module. Please remove the unused import to avoid unnecessary coupling and keep the lazy-import goal consistent.

Suggested change
- from anomalib.cli.utils.help_formatter import CustomHelpFormatter, get_short_docstring
+ from anomalib.cli.utils.help_formatter import CustomHelpFormatter

Comment thread src/anomalib/loggers/__init__.py Outdated
Comment on lines +32 to +34
__all__ = ["configure_logger"]

try:
    from .comet import AnomalibCometLogger  # noqa: F401
    from .mlflow import AnomalibMLFlowLogger  # noqa: F401
    from .tensorboard import AnomalibTensorBoardLogger  # noqa: F401
    from .wandb import AnomalibWandbLogger  # noqa: F401

    __all__.extend(
        [
            "AnomalibCometLogger",
            "AnomalibTensorBoardLogger",
            "AnomalibWandbLogger",
            "AnomalibMLFlowLogger",
        ],
    )
except ImportError:
    print("To use any logger install it using `anomalib install -v`")
_LOGGER_NAMES = {
Copilot AI Apr 14, 2026

The module docstring advertises importing Anomalib*Logger classes from anomalib.loggers, but __all__ only exposes configure_logger. This changes star-import/introspection behavior compared to the previous eager-import implementation and is inconsistent with other lazy __init__.py modules here (e.g., anomalib.models, anomalib.data, anomalib.engine). Consider adding the lazy-exported logger class names to __all__ (even if they remain lazily loaded via __getattr__).

Comment thread src/anomalib/loggers/__init__.py Outdated
import importlib

module = importlib.import_module(_LOGGER_NAMES[name], __name__)
return getattr(module, name)
Copilot AI Apr 14, 2026

__getattr__ returns the requested logger class but does not cache it in globals(). This causes repeated attribute access (or repeated from anomalib.loggers import ... in different modules) to re-import the same submodule. Consider caching the resolved object in globals()[name] (as done in anomalib.engine.__getattr__ and anomalib.models.__getattr__).

Suggested change
- return getattr(module, name)
+ logger_class = getattr(module, name)
+ globals()[name] = logger_class
+ return logger_class

Comment on lines 187 to +191
"""
from anomalib.utils.path import convert_to_snake_case, convert_to_title_case

_import_all_models()

Copilot AI Apr 14, 2026

list_models docstring describes supported case values as snake_case/original with default snake_case, but the implementation accepts only {snake, pascal, title} with default snake. Please update the docstring so the documented API matches the runtime behavior.

Comment thread src/anomalib/models/__init__.py Outdated
Comment on lines 253 to 256
def get_model(model, *args, **kwdargs) -> AnomalibModule:
"""Get an anomaly detection model instance.

This function instantiates an anomaly detection model based on the provided
Copilot AI Apr 14, 2026

get_model lost its input type annotation (was DictConfig | str | dict | Namespace). Since this is a public API, dropping the signature typing reduces IDE/type-checker usefulness. Consider keeping the annotation using from __future__ import annotations and/or TYPE_CHECKING imports so you can stay lazy at runtime while preserving type information.
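
A minimal sketch of the approach suggested here, assuming postponed annotation evaluation plus a `TYPE_CHECKING` guard. `parse_amount` is a hypothetical function, and `decimal.Decimal` stands in for a heavy type such as `DictConfig`; nothing in it is taken from the PR's code.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers and IDEs only; skipped at runtime.
    from decimal import Decimal


def parse_amount(value: Decimal | str | int) -> Decimal:
    """Convert value to Decimal, importing the dependency on first call."""
    from decimal import Decimal  # deferred runtime import

    return Decimal(value)
```

Because annotations are stored as strings under `from __future__ import annotations`, the signature stays fully typed for tooling while the module itself imports nothing heavy at load time.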

Comment thread src/anomalib/cli/pipelines.py Outdated
Comment on lines +24 to +26
_PIPELINE_REGISTRY: dict[str, type[Pipeline]] | None | str = "uninitialized"

_PIPELINE_DESCRIPTIONS: dict[str, str] = {
Copilot AI Apr 14, 2026

Using the string literal sentinel 'uninitialized' for _PIPELINE_REGISTRY forces the variable type to include str and is easy to accidentally collide with. A dedicated sentinel object (e.g., a private object() instance) would avoid widening the type and make the state machine clearer.
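
A minimal sketch of the sentinel idea, assuming the registry can legitimately resolve to `None` (for example when an optional extra is missing). `UNINITIALIZED`, `LOAD_CALLS`, `_load_registry`, and `get_registry` are hypothetical names for illustration.

```python
from __future__ import annotations

# A private object() is only ever identical to itself, so it cannot collide
# with any value the registry might legitimately hold (including None).
UNINITIALIZED = object()

LOAD_CALLS: list[str] = []  # records how often the expensive load ran
_registry: object = UNINITIALIZED


def _load_registry() -> dict[str, str] | None:
    """Stand-in for the expensive import; None models a missing optional extra."""
    LOAD_CALLS.append("load")
    return None


def get_registry() -> dict[str, str] | None:
    """Load the registry on first use and reuse the result afterwards."""
    global _registry
    if _registry is UNINITIALIZED:
        _registry = _load_registry()
    return _registry
```

The identity check (`is`) separates "never loaded" from "loaded and empty", so a `None` result is cached rather than retried on every call.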

Replace eager imports of Comet, MLflow, TensorBoard, and WandB
loggers with __getattr__-based lazy loading. These backends pull in
heavy optional dependencies that are unnecessary for basic CLI
operations.

Phase 1: Replace all top-level heavy imports (torch, lightning, Engine,
AnomalibModule, AnomalibDataModule, pipeline registry) with deferred
imports inside the functions that need them.

Phase 2: Add _sniff_subcommand() to detect which subcommand the user
invoked and only build the full ArgumentParser for that single
subcommand, skipping expensive parser construction for unused ones.

Also adds a compatibility shim for jsonargparse >=4.47 where
_ActionSubCommands moved to a new module path.

Replace eager imports in package __init__.py files with
__getattr__-based lazy loading:

- models: 23 model classes loaded on-demand via _MODEL_CLASS_MAP
- data: datamodules/datasets/enums loaded lazily; dataclasses kept
  eager to avoid circular imports (video.py imports VideoItem at
  module level)
- engine: Engine, XPUAccelerator, SingleXPUStrategy loaded on-demand

This eliminates the cascade where importing any anomalib subpackage
would trigger torch + lightning + all models + all datasets.

…parse 1.7 compat

module_available() from lightning_utilities actually imports the target
module, adding ~3s for lightning.pytorch and ~3.7s for anomalib.pipelines.
Replace with importlib.util.find_spec() which only checks if a module
is findable without importing it.

Also defer PIPELINE_REGISTRY import to only trigger when the selected
subcommand is a pipeline command, preventing eager Benchmark import on
every CLI invocation.

Add _format_usage() shim in CustomHelpFormatter to normalize tuples to
lists, fixing a crash between rich-argparse >= 1.7 (passes actions as
tuple) and jsonargparse (only handles list/dict).

Tested with jsonargparse 4.48.0 + rich-argparse 1.7.2.
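
The availability check described in this commit can be sketched as follows. `is_installed` is a hypothetical helper name, not the function used in the PR. One caveat worth keeping in mind: for a dotted name, `importlib.util.find_spec()` imports the parent package in order to locate the submodule, so only top-level lookups are completely import-free.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the module can be located, without importing it.

    For dotted names the parent package is imported as a side effect.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # ImportError: the parent package of a dotted name is missing.
        # ValueError: the module is in sys.modules with __spec__ set to None.
        return False
```

For example, `is_installed("json")` is true on any standard Python installation, while a name that does not exist returns `False` instead of raising.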
- Fix _sniff_subcommand to skip -c/--config option values
- Remove unused get_short_docstring import
- Add lazy logger class names to __all__
- Add globals() caching in loggers __getattr__
- Fix list_models docstring to match snake/pascal/title params
- Restore type annotation on get_model via TYPE_CHECKING
- Replace fragile string sentinel with object() in pipelines
Copilot AI review requested due to automatic review settings April 14, 2026 13:29
samet-akcay force-pushed the perf/lazy-cli-imports branch from f94c5f5 to 8575531 April 14, 2026 13:29
Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

src/anomalib/data/__init__.py:151

  • get_datamodule lost its type annotation (config: DictConfig | ListConfig | dict) even though the docstring still documents those accepted types. To preserve typing/IDE support without reintroducing eager imports, consider restoring the annotated signature using from typing import TYPE_CHECKING + if TYPE_CHECKING: from omegaconf import DictConfig, ListConfig (and/or typing.Any as needed).
def get_datamodule(config) -> AnomalibDataModule:
    """Get Anomaly Datamodule from config.

    Args:
        config: Configuration for the anomaly model. Can be either:

Comment on lines +32 to +38
__all__ = [
    "configure_logger",
    "AnomalibCometLogger",
    "AnomalibMLFlowLogger",
    "AnomalibTensorBoardLogger",
    "AnomalibWandbLogger",
]
Copilot AI Apr 14, 2026

__all__ now always includes the optional logger classes. This makes from anomalib.loggers import * (and some doc tools that iterate __all__) eagerly access/import optional logger modules and can raise ImportError if optional deps like matplotlib/vendor SDKs are missing. Consider keeping __all__ limited to always-available symbols (e.g. configure_logger) and leaving optional logger names out of __all__ (or only adding them conditionally / under TYPE_CHECKING).

Suggested change
- __all__ = [
-     "configure_logger",
-     "AnomalibCometLogger",
-     "AnomalibMLFlowLogger",
-     "AnomalibTensorBoardLogger",
-     "AnomalibWandbLogger",
- ]
+ __all__ = ["configure_logger"]

"ImageDataFormat": ".datamodules.image",
"Kaputt": ".datamodules.image",
"Kolektor": ".datamodules.image",
"MVTec": ".datamodules.image",
Copilot AI Apr 14, 2026

_LAZY_IMPORTS maps "MVTec" to .datamodules.image, but anomalib.data.datamodules.image does not define an MVTec symbol (it defines MVTecAD, MVTecAD2, etc.). As a result, from anomalib.data import MVTec (and __all__ exporting it) will raise AttributeError. Please remove this export or add a real alias (e.g. MVTec = MVTecAD) in the target module and keep __all__ consistent.

Suggested change
- "MVTec": ".datamodules.image",

Comment on lines +201 to 205
from anomalib.utils.path import convert_to_snake_case, convert_to_title_case

_import_all_models()

if case not in {"snake", "pascal", "title"}:
Copilot AI Apr 14, 2026

list_models() currently calls _import_all_models(), which forces importing every model module (and their heavy / optional dependencies) just to produce a name list. This undermines the lazy-import goal and can also break in minimal installs if any model has extra optional deps. Since you already have _MODEL_CLASS_MAP, list_models() can derive the snake/title/pascal names from the map keys without importing the model implementations.
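
A minimal sketch of deriving the names from the map keys alone, assuming the keys are PascalCase class names. The `MODEL_CLASS_MAP` entries, their module paths, and the helper names are hypothetical, and the word splitting uses a plain regular expression rather than anomalib's own `convert_to_snake_case`, whose exact behavior may differ.

```python
from __future__ import annotations

import re

# Hypothetical excerpt of a class-name -> module map; the paths are illustrative.
MODEL_CLASS_MAP = {
    "Padim": ".image.padim",
    "Patchcore": ".image.patchcore",
    "EfficientAd": ".image.efficient_ad",
}


def _split_words(pascal_name: str) -> list[str]:
    """Split a PascalCase name at each uppercase letter after the first."""
    return re.sub(r"(?<!^)(?=[A-Z])", " ", pascal_name).split()


def list_model_names(case: str = "snake") -> list[str]:
    """List model names from the map keys without importing any model."""
    if case not in {"snake", "pascal", "title"}:
        msg = f"Unsupported case: {case!r}"
        raise ValueError(msg)
    if case == "pascal":
        return sorted(MODEL_CLASS_MAP)
    separator = "_" if case == "snake" else " "
    names = []
    for class_name in MODEL_CLASS_MAP:
        words = _split_words(class_name)
        if case == "snake":
            words = [word.lower() for word in words]
        names.append(separator.join(words))
    return sorted(names)
```

With these three entries, `list_model_names("snake")` returns `["efficient_ad", "padim", "patchcore"]` and no model module is ever imported.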

Comment on lines +248 to 252
from anomalib.utils.path import convert_snake_to_pascal_case

_import_all_models()

logger.info("Loading the model.")
Copilot AI Apr 14, 2026

_get_model_class_by_name() calls _import_all_models() and then scans AnomalibModule.__subclasses__(), which eagerly imports every model even when the user requested a single one. This can be very expensive and may fail if unrelated models have optional dependencies. Consider resolving the normalized name against _MODEL_CLASS_MAP (case-insensitive) and importing only that one module/class, falling back to the current error if no match exists.
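
A minimal sketch of the single-class lookup proposed here, assuming names are matched case-insensitively after removing underscores and hyphens. `CLASS_MAP` and `get_class_by_name` are hypothetical, and standard-library classes stand in for model classes so the example runs anywhere.

```python
from __future__ import annotations

import importlib

# Hypothetical class-name -> module map; standard-library classes stand in
# for model classes.
CLASS_MAP = {
    "OrderedDict": "collections",
    "JSONDecoder": "json",
}

# Lowercased index of the keys for case-insensitive lookup.
_NORMALIZED = {key.lower(): key for key in CLASS_MAP}


def get_class_by_name(name: str) -> type:
    """Import and return only the class that was asked for."""
    key = _NORMALIZED.get(name.replace("_", "").replace("-", "").lower())
    if key is None:
        msg = f"Unknown class {name!r}; expected one of {sorted(CLASS_MAP)}"
        raise ValueError(msg)
    module = importlib.import_module(CLASS_MAP[key])
    return getattr(module, key)
```

Only the module that defines the requested class is imported, so an unrelated entry with a missing optional dependency cannot break the lookup.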

Comment thread src/anomalib/cli/cli.py
Comment on lines +87 to +92
# Flags whose next token is a value, not a subcommand.
_OPTIONS_WITH_VALUE = frozenset({"-c", "--config"})

@staticmethod
def _sniff_subcommand(args: Sequence[str] | None) -> str | None:
    """Peek at args to identify the subcommand without full parsing."""
Copilot AI Apr 14, 2026

New _sniff_subcommand() + conditional argument registration changes how parsers are constructed and is now critical to CLI correctness (e.g. -c/--config before/after the subcommand, --help behavior, and ensuring only the selected subcommand gets heavy argument construction). There are CLI integration tests, but there doesn’t appear to be focused coverage for these parsing edge cases; adding a small unit test matrix around _sniff_subcommand and subcommand parser construction would help prevent regressions.
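
A minimal sketch of such a test matrix. The cases and the `run_matrix` helper are hypothetical; `first_positional` reproduces the first-non-dash-token logic quoted earlier in this review (with an explicit `return None` added as an assumption), to show that the table catches the `-c/--config` regression.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence

# (argv, expected subcommand) pairs covering the edge cases named above.
CASES: list[tuple[list[str], str | None]] = [
    ([], None),
    (["--help"], None),
    (["train", "--help"], "train"),
    (["train", "-c", "config.yaml"], "train"),
    (["-c", "config.yaml", "train"], "train"),
    (["--config", "config.yaml", "train"], "train"),
    (["--config=config.yaml", "train"], "train"),
]


def first_positional(args: Sequence[str]) -> str | None:
    """Pre-fix logic: return the first token that does not start with a dash."""
    for token in args:
        if not token.startswith("-"):
            return token
    return None


def run_matrix(sniff: Callable[[Sequence[str]], str | None]) -> list[list[str]]:
    """Return every argv for which sniff() disagrees with the expected value."""
    return [argv for argv, expected in CASES if sniff(argv) != expected]


failures = run_matrix(first_positional)
```

Here `failures` holds the two invocations that place `-c` or `--config` and its value before the subcommand. In a pytest suite the same table could feed `pytest.mark.parametrize` against the real `_sniff_subcommand`.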

Copilot generated this review using guidance from repository custom instructions.