import performance: large codebases that use a lot of attrs classes start taking a long time to load #575

Open
@glyph

Description

I need to do some more rigorous analysis of this, so my level of confidence here is "strong hunch", but: using attrs extensively, across a large codebase, can contribute non-trivially to Python's scourge of slow load times.

Many things in Python are slow: calling functions, importing modules, and so on. But one of the slowest things is reading and parsing Python code. This is why Python compiles and caches .pyc files - it's a substantial optimization to startup time.

Using attrs, unfortunately, partially undoes this optimization, because many of the methods in attr._make:

  1. generate a bunch of source code
  2. compile() it, then
  3. eval() the bytecode they just compiled.
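The three steps above can be sketched as follows. This is an illustrative reconstruction of the technique, not attrs' actual internals — `make_init` and the generated source layout are hypothetical:

```python
def make_init(fields):
    """Generate, compile, and eval an __init__ for the given field names.

    This mirrors the codegen pattern described above: build source as a
    string, compile() it, then eval() the resulting code object so the
    method is a plain, debuggable Python function.
    """
    args = ", ".join(fields)
    body = "\n".join(f"    self.{f} = {f}" for f in fields)
    src = f"def __init__(self, {args}):\n{body}\n"
    code = compile(src, "<attrs generated init>", "exec")
    namespace = {}
    eval(code, namespace)  # eval() accepts a code object compiled in "exec" mode
    return namespace["__init__"]


class Point:
    __init__ = make_init(["x", "y"])
```

Because the result is an ordinary function with real source attached to a synthetic filename, debuggers and tracebacks can "see through" it — but every class definition pays the compile() cost again at import time, which is the overhead this issue is about.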

Now, they do this for a good reason. Transparency and debuggability are great, and more than once being able to "see through" the attrs validator stack while debugging has been legitimately useful. So I wouldn't want to propose a technique that makes a substantially different tradeoff. Not to mention that friendliness to PyPy's JIT is achieved via this mechanism, and I definitely wouldn't want to trade that off either. But could we possibly piggyback on Python's native caching technique here, and cache both the compiled output and the source code, in the __pycache__ directory or thereabouts?
