Speedup CI tests by bypassing CPython's difflib for large string comparisons? #1254

Open
@pfackeldey

We may want to think about changing some tests from:

assert large_str1 == large_str2

to

equal = large_str1 == large_str2
assert equal

A few tests, e.g. this one: https://github.com/scikit-hep/coffea/blob/master/tests/test_dataset_tools.py#L379-L380, have very large strings to compare.

When two values are compared directly inside an assert, pytest uses CPython's difflib to build a diff of the left and right values (this is super helpful for small strings). Unfortunately, difflib is pretty slow for large strings, especially ones with small differences; see pytest-dev/pytest#8998.

I'm not sure if this is always triggered or only for failing tests. I had to tweak this to make a failing test run locally, otherwise I was stuck in a not-in-my-lifetime-ending for-loop in difflib.

(This comes at the cost of not having a diff, but a diff is not useful to read for very large strings anyway.)
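
If losing the diff entirely is too much, a small helper could keep the comparison out of the assert expression and still report where the strings diverge. This is only a rough sketch, not something that exists in coffea or pytest: the names `first_mismatch` and `assert_large_str_equal` and the `context` parameter are made up for illustration. It relies on the fact that pytest only introspects `assert` statements, so an explicitly raised `AssertionError` gets no difflib treatment.

```python
from typing import Optional


def first_mismatch(left: str, right: str) -> Optional[int]:
    """Return the index of the first differing character, or None if equal."""
    if left == right:
        return None
    for i, (a, b) in enumerate(zip(left, right)):
        if a != b:
            return i
    # No differing character in the common part: one string is a prefix
    # of the other, so they diverge where the shorter one ends.
    return min(len(left), len(right))


def assert_large_str_equal(left: str, right: str, context: int = 20) -> None:
    """Hypothetical helper: fail with a short context window, not a full diff."""
    idx = first_mismatch(left, right)
    if idx is None:
        return
    lo = max(idx - context, 0)
    hi = idx + context
    # Raising explicitly (instead of `assert left == right`) means pytest
    # never sees the two large operands and never builds a diff.
    raise AssertionError(
        f"strings differ at index {idx}: {left[lo:hi]!r} != {right[lo:hi]!r}"
    )
```

A test would then call `assert_large_str_equal(large_str1, large_str2)` in place of `assert large_str1 == large_str2`. The scan in `first_mismatch` is linear in the string length and only runs when the strings actually differ.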
