Running mypy on script with pd.pivot_table gets exponentially slower with more columns #16749

Open
@blackary

Bug Report

When running mypy on a script containing pd.pivot_table, the amount of time for mypy to finish grows exponentially with the number of columns selected and aggregated in the table.

To Reproduce

from string import ascii_letters

import numpy as np
import pandas as pd

df = pd.DataFrame({letter: [1, 2, 3] for letter in ascii_letters[:14]})

df2 = pd.pivot_table(
    df,
    values=["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n"],
    columns=["a"],
    aggfunc={
        "a": np.sum,
        "b": np.sum,
        "c": np.sum,
        "d": np.sum,
        "e": np.sum,
        "f": np.sum,
        "g": np.sum,
        "h": np.sum,
        "i": np.sum,
        "j": np.sum,
        "k": np.sum,
        "l": np.sum,
        "m": np.sum,
        "n": np.sum,
    },
)
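
To check how the run time scales with the number of columns, something like the following could be used. It is only a rough sketch: `make_script` and `time_mypy` are hypothetical helper names, and it assumes mypy, numpy, and pandas are installed in the same environment as the interpreter running it. It regenerates the script above for a given column count and times a single mypy run on it.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path
from string import ascii_letters


def make_script(n_columns: int) -> str:
    """Build the reproduction script with n_columns entries in values/aggfunc."""
    letters = ascii_letters[:n_columns]
    values = ", ".join(f'"{c}"' for c in letters)
    aggfunc = "\n".join(f'        "{c}": np.sum,' for c in letters)
    return (
        "from string import ascii_letters\n\n"
        "import numpy as np\n"
        "import pandas as pd\n\n"
        f"df = pd.DataFrame({{letter: [1, 2, 3] for letter in ascii_letters[:{n_columns}]}})\n\n"
        "df2 = pd.pivot_table(\n"
        "    df,\n"
        f"    values=[{values}],\n"
        '    columns=["a"],\n'
        "    aggfunc={\n"
        f"{aggfunc}\n"
        "    },\n"
        ")\n"
    )


def time_mypy(n_columns: int) -> float:
    """Return wall-clock seconds for one mypy run on the generated script."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "repro.py"
        path.write_text(make_script(n_columns))
        start = time.perf_counter()
        # Assumes mypy is importable by the current interpreter.
        subprocess.run(
            [sys.executable, "-m", "mypy", str(path)],
            capture_output=True,
            check=False,
        )
        return time.perf_counter() - start


# Example usage (can be slow for larger column counts):
# for n in range(8, 15):
#     print(f"{n} columns: {time_mypy(n):.1f}s")
```

If the growth really is exponential, each additional column should multiply the reported time by a roughly constant factor.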

Expected Behavior

mypy should finish in under 1 second, as it does when fewer columns are selected.

Actual Behavior

Running mypy on this script takes 200 seconds.

Your Environment

  • Mypy version used: 1.7.0
  • Mypy command-line flags: None
  • Mypy configuration options from mypy.ini (and other config files): None
  • Python version used: 3.8.15
