Slow startup time due to optional imports #589

@crest42

Is your feature request related to a problem? Please describe.
I have an (admittedly odd) use case where I repeatedly run a single script that executes a few queries and dumps some data. The overall runtime of the script is high compared to the time actually spent executing the queries and dumping the data (1.2s overall runtime, ~0.2s spent on actual work).

The reason for this is the imports in options.py. When running the following python snippet:

✗ time python -c "import clickhouse_connect"
python -c "import clickhouse_connect"  1.04s user 0.85s system 239% cpu 0.789 total

We see it takes about 0.8s wall-clock time to run. If we remove the imports from options.py and replace them with, e.g., np = None:

✗ time python -c "import clickhouse_connect"
python -c "import clickhouse_connect"  0.17s user 0.02s system 99% cpu 0.201 total

We see startup improve from ~0.8s to ~0.2s wall-clock time (about 4x faster). In CPU time (user + system), the improvement is even larger: from ~1.9s down to ~0.2s.
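To see which imports account for the time, CPython's `-X importtime` option (available since Python 3.7) prints a per-module breakdown to stderr. The following is a rough sketch that runs it in a subprocess and sorts the result; the helper name `import_times` is made up for illustration and is not part of clickhouse_connect.

```python
import subprocess
import sys


def import_times(module_name):
    """Return (module, self_us, cumulative_us) rows, slowest cumulative first.

    Hypothetical helper: runs `python -X importtime -c "import <module>"`
    in a fresh interpreter and parses the report written to stderr.
    """
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
    )
    prefix = "import time:"
    rows = []
    for line in result.stderr.splitlines():
        if not line.startswith(prefix):
            continue
        parts = line[len(prefix):].split("|")
        if len(parts) != 3:
            continue
        try:
            self_us = int(parts[0])
            cumulative_us = int(parts[1])
        except ValueError:
            # The header line has column titles instead of numbers; skip it.
            continue
        rows.append((parts[2].strip(), self_us, cumulative_us))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


if __name__ == "__main__":
    # Show the ten most expensive imports for the module given on the command line.
    target = sys.argv[1] if len(sys.argv) > 1 else "clickhouse_connect"
    for name, self_us, cumulative_us in import_times(target)[:10]:
        print(f"{cumulative_us:>10} us  {name}")
```

If the optional dependencies are the cause, they should show up near the top of this list.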

Those modules get imported regardless of whether they are ever used. Making those imports lazy could significantly reduce the overhead for use cases like mine. Obviously, I could get the same result by not installing the modules in the Python environment in the first place, but I need them in other places, so this isn't a viable option for me.

I will happily provide a PR with the solution described below if you consider this an issue as well, but I wanted to check in before spending the time doing actual development.

Describe the solution you'd like
One way to achieve lazy loading would be to introduce a "LazyLoader" class in options.py. Instead of importing the modules directly with an import statement, we could do something like:

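import importlib

class LazyLoader:
  def __init__(self, module_name):
    self.module_name = module_name
    self.module = None  # set on first attribute access

  def __getattr__(self, name):
    # Only called for attributes not found on the loader itself
    if self.module is None:
      self.load()
    return getattr(self.module, name)

  def load(self):
    try:
      self.module = importlib.import_module(self.module_name)
    except ImportError:
      ...  # handling of a missing optional module is left open

np = LazyLoader('numpy')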

The obvious benefit of such an implementation is that it would act as a drop-in replacement, with no other code changes required. The downside is the additional complexity and the associated risks.

Describe alternatives you've considered
An alternative, often used in other libraries (and described here: https://peps.python.org/pep-0810/), is to wrap the import in a function call, such as:

def np():
  try:
    import numpy as np
    return np
  except ImportError:
    ...

The benefit of this solution is that it adds only a little complexity, but all usages of np in clickhouse_connect would need to change from np.something to np().something.

Describe alternatives you've considered
There is also https://docs.python.org/3/library/importlib.html#importlib.util.LazyLoader, which could do the trick, although further investigation would be needed to confirm whether it is really applicable. If it is, I think this could be the preferred solution, providing a drop-in replacement using only Python stdlib functionality.
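For reference, a minimal sketch of that stdlib route, adapted from the lazy-import recipe in the importlib documentation. The `lazy_import` helper name and returning `None` for a module that is not installed are assumptions for illustration (the latter mirrors the `np = None` fallback mentioned above); this is not existing clickhouse_connect code.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose code only runs on first attribute access.

    Returns None if the module cannot be found (assumed fallback for
    optional dependencies).
    """
    if name in sys.modules:
        # Already imported elsewhere; nothing left to defer.
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Wrap the real loader so execution is postponed until first use.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # does not run the module body yet
    return module


# In options.py this might then read:
# np = lazy_import('numpy')
```

With this, `np.something` keeps working unchanged at the call sites, and the actual import cost is only paid the first time an attribute of `np` is accessed.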

Additional context
None
