Skip to content

Mypy runs really slow on a complicated, generator-heavy function #14013

Open
@nickdrozd

Description

@nickdrozd

I have a function yield_programs, and it calls a helper function yield_actions. Obviously they are a bit complicated; they were written like this for fun. They live together in the following file:

import re
from itertools import product
from collections.abc import Iterator

SHIFTS = 'R', 'L'
HALT = '_'


def yield_actions(
    states: int,
    colors: int,
    halt: bool = False,
) -> Iterator[str]:
    yield from filter(
        (
            lambda action: action[2] != HALT or action == '1R_'
            if halt
            else lambda action: action
        ),
        (
            ''.join(prod)
            for prod in product(
                tuple(map(str, range(colors))),
                SHIFTS,
                tuple(map(chr, range(65, 65 + states))) + ((HALT,) if halt else ()),
            )
        ),
    )


def yield_programs(
    states: int,
    colors: int,
    halt: bool,
    rejects: list[str] | None = None,
) -> Iterator[str]:
    yield from filter(
        lambda prog: not any(re.compile(regex).match(prog) for regex in rejects or []),
        (
            prog
            for prog in (
                '  '.join(state)
                for state in product(
                    (
                        ' '.join(state)
                        for state in product(
                            yield_actions(states, colors, halt), repeat=colors  # <-- call to yield_actions
                        )
                    ),
                    repeat=states,
                )
            )
            if (prog[:3] == '1RB' and (not halt or prog.count(HALT) == 1))
        ),
    )

Mypy runs instantly and finds no issues.

Now I attempt to inline the call to yield_actions:

def yield_programs(
    states: int,
    colors: int,
    halt: bool,
    rejects: list[str] | None = None,
) -> Iterator[str]:
    yield from filter(
        lambda prog: not any(re.compile(regex).match(prog) for regex in rejects or []),
        (
            prog
            for prog in (
                '  '.join(state)
                for state in product(
                    (
                        ' '.join(state)
                        for state in product(
                            filter(  # <-- inlined call
                                (
                                    lambda action: action[2] != HALT or action == '1R_'
                                    if halt
                                    else lambda action: action
                                ),
                                (
                                    ''.join(prod)
                                    for prod in product(
                                        tuple(map(str, range(colors))),
                                        SHIFTS,
                                        tuple(map(chr, range(65, 65 + states)))
                                        + ((HALT,) if halt else ()),
                                    )
                                ),
                            ),
                            repeat=colors,
                        )
                    ),
                    repeat=states,
                )
            )
            if (prog[:3] == '1RB' and (not halt or prog.count(HALT) == 1))
        ),
    )

Mypy still finds no issues. However, it now takes around 50 seconds to run. Granted this function is bizarrely complicated (please don't ask why I would want to do such a thing), but that still seems like an awfully long time.

My feeling (which could be totally wrong) is that there is something that could be cached somewhere and that would fix this immediately. Or maybe it's because of the generators.

I did some profiling. Here is the callgraph colored by time spent in each function:

color-by-self-time

And the callgraph colored by time spent in each function along with subcalls:

color-by-total-time

mypy 0.982 (compiled: no)
Python 3.11.0

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions