gh-118761: Improve import time for `pstats` and `zipfile` by removing imports to `typing` #128981

picnixz · 2025-01-18T10:24:20Z

Those two imports are not really needed and we can reduce the import time of zipfile, or anything import it. Roughly, importing zipfile takes 8ms with this PR while it takes 10ms on main. Also, zipfile is not typed by default, so adding -> Self is a bit inconsistent (though it helps type checkers). I think such typing should be left to typeshed.

cc @jaraco as the one who added the type hints in zipfile.

Issue: Improve import time of various stdlib modules #118761

picnixz · 2025-01-20T17:42:33Z

@vstinner I plan to merge this one with the following commit title:

Improve import time for `pstats` and `zipfile`

And the following commit body:

Importing `pstats` or `zipfile` is now roughly 20% faster.

This is achieved by removing type annotations depending on `typing`.

I don't think I'll mention the fact that typing is no more implicitly exposed as the zipfile modification was quite recent and pstats is for debugging.

vstinner

LGTM

jaraco · 2025-01-26T19:16:51Z

Also, zipfile is not typed by default, so adding -> Self is a bit inconsistent (though it helps type checkers). I think such typing should be left to typeshed.

It is inconsistent, but I've been taking to adding annotations when I write code, for clarity. If it's discouraged to add type annotations to Python code and just leave it to typeshed to keep up, I guess that's okay, though it sounds slightly terrible. If that's the case, though, and it's not possible to incrementally improve the native typing, I think we should document that somewhere in the dev guide (if it's not already). WDYT?

Could this same performance improvement have been gained by putting the import in an if TYPE_CHECKING: block?

vstinner · 2025-01-26T19:30:02Z

The status quo is to use typeshed, correct. Do we have other stdlib modules which use type annotations? I'm mostly aware of test.libregrtest which is more an application than a stdlib module.

AA-Turner · 2025-01-26T19:38:08Z

@jaraco I think this thread represents the latest position. Typeshed overrides the stdlib, so there's no real benefit (and real costs) to having type annotations in the stdlib. Using TYPE_CHECKING would still need the import -- sadly there's no consensus to codify using TYPE_CHECKING = False, or similar.

A

gpshead · 2025-01-27T01:56:13Z

I would not have done this PR without evidence that there was an actual performance problem.

In realistic applications, typing will already be imported so this code was a no-op. This PR removing a natural import feels like chasing a goal of preventing an unrealistic microbenchmark regression rather than looking at things holistically. Perhaps focus on improving the import time of typing in real world scenarios rather than trying to prevent its use?

If the import time of zipfile is seriously a target, it also does other slow things like always importing compression modules, when 0-1 of them are the most likely to be used.

(it was mentioned in https://discuss.python.org/t/static-type-annotations-in-cpython/65068/9)

eli-schwartz · 2025-01-27T02:15:57Z

In realistic applications, typing will already be imported so this code was a no-op. This PR removing a natural import feels like chasing a goal of preventing an unrealistic microbenchmark regression rather than looking at things holistically. Perhaps focus on improving the import time of typing in real world scenarios rather than trying to prevent its use?

I'm not sure how it follows that "realistic applications" already import typing? I'm quite certain you will find lots of realistic applications written by people who never really bought into "that whole type annotations thing".

If the import time of zipfile is seriously a target, it also does other slow things like always importing compression modules, when 0-1 of them are the most likely to be used.

That's an extremely reasonable point and can surely be done as well, but I don't see why it should constitute a valid criticism of making a different, unrelated speed improvement?

gpshead · 2025-01-27T02:35:05Z

That's an extremely reasonable point and can surely be done as well, but I don't see why it should constitute a valid criticism of making a different, unrelated speed improvement?

Because it illustrates my point that this wasn't a "speed improvement" but more of a reactionary minor performance regression prevention without considering what could be done to actually optimize import zipfile overall.

eli-schwartz · 2025-01-27T04:27:19Z

Because it illustrates my point that this wasn't a "speed improvement" but more of a reactionary minor performance regression prevention without considering what could be done to actually optimize import zipfile overall.

Okay, so I actually went in and took some timings.

Here is my patch:

diff --git a/Lib/zipfile/__init__.py b/Lib/zipfile/__init__.py
index b8b496ad947..5f479965ba3 100644
--- a/Lib/zipfile/__init__.py
+++ b/Lib/zipfile/__init__.py
@@ -21,16 +21,6 @@
     zlib = None
     crc32 = binascii.crc32
 
-try:
-    import bz2 # We may need its compression method
-except ImportError:
-    bz2 = None
-
-try:
-    import lzma # We may need its compression method
-except ImportError:
-    lzma = None
-
 __all__ = ["BadZipFile", "BadZipfile", "error",
            "ZIP_STORED", "ZIP_DEFLATED", "ZIP_BZIP2", "ZIP_LZMA",
            "is_zipfile", "ZipInfo", "ZipFile", "PyZipFile", "LargeZipFile",
@@ -705,6 +695,7 @@ def __init__(self):
         self._comp = None
 
     def _init(self):
+        import lzma
         props = lzma._encode_filter_properties({'id': lzma.FILTER_LZMA1})
         self._comp = lzma.LZMACompressor(lzma.FORMAT_RAW, filters=[
             lzma._decode_filter_properties(lzma.FILTER_LZMA1, props)
@@ -731,6 +722,7 @@ def __init__(self):
 
     def decompress(self, data):
         if self._decomp is None:
+            import lzma
             self._unconsumed += data
             if len(self._unconsumed) <= 4:
                 return b''
@@ -778,11 +770,15 @@ def _check_compression(compression):
             raise RuntimeError(
                 "Compression requires the (missing) zlib module")
     elif compression == ZIP_BZIP2:
-        if not bz2:
+        try:
+            import bz2
+        except ImportError:
             raise RuntimeError(
                 "Compression requires the (missing) bz2 module")
     elif compression == ZIP_LZMA:
-        if not lzma:
+        try:
+            import lzma
+        except ImportError:
             raise RuntimeError(
                 "Compression requires the (missing) lzma module")
     else:
@@ -795,6 +791,7 @@ def _get_compressor(compress_type, compresslevel=None):
             return zlib.compressobj(compresslevel, zlib.DEFLATED, -15)
         return zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
     elif compress_type == ZIP_BZIP2:
+        import bz2
         if compresslevel is not None:
             return bz2.BZ2Compressor(compresslevel)
         return bz2.BZ2Compressor()
@@ -812,6 +809,7 @@ def _get_decompressor(compress_type):
     elif compress_type == ZIP_DEFLATED:
         return zlib.decompressobj(-15)
     elif compress_type == ZIP_BZIP2:
+        import bz2
         return bz2.BZ2Decompressor()
     elif compress_type == ZIP_LZMA:
         return LZMADecompressor()

Note that I don't delay the zipfile import; it's needed to handle crc32 consistently and it wasn't obviously something that would be significant to optimize.

I have 3 timings:

The first is for zipfile without this typing change.
The second is for zipfile with this typing change.
The third is for zipfile with my patch applied.

$ LD_LIBRARY_PATH=$PWD hyperfine --warmup 8 './python -c "import zipfile"'
Benchmark 1: ./python -c "import zipfile"
  Time (mean ± σ):      35.6 ms ±   5.7 ms    [User: 30.1 ms, System: 4.9 ms]
  Range (min … max):    30.8 ms …  50.3 ms    65 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

LD_LIBRARY_PATH=$PWD hyperfine --warmup 8 './python -c "import zipfile"'
Benchmark 1: ./python -c "import zipfile"
  Time (mean ± σ):      26.9 ms ±   5.1 ms    [User: 22.1 ms, System: 4.5 ms]
  Range (min … max):    24.0 ms …  53.6 ms    115 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

$ LD_LIBRARY_PATH=$PWD hyperfine --warmup 8 './python -c "import zipfile"'
Benchmark 1: ./python -c "import zipfile"
  Time (mean ± σ):      25.4 ms ±   2.6 ms    [User: 20.7 ms, System: 4.5 ms]
  Range (min … max):    24.2 ms …  50.1 ms    120 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

I cannot remotely guarantee I know what I'm doing with a benchmarking tool, I've said so in other PRs too. ;) Still, the experimental results I got say that there are HUGE gains to be gotten from removing typing, and effectively no gains whatsoever to gain from avoiding bz2 / lzma when they aren't used.

The reason I did these timings was because I thought it sounded like a great idea to solve this as a followup:

If the import time of zipfile is seriously a target, it also does other slow things like always importing compression modules, when 0-1 of them are the most likely to be used.

But based on my timings I've changed my mind and don't intend to submit this patch as it feels useless to waste time caring about this insignificant and not at all slow import.

picnixz added 2 commits January 16, 2025 13:11

remove redundant imports to typing

e9d72b7

blurb

6d0b6e0

bedevere-app bot mentioned this pull request Jan 18, 2025

Improve import time of various stdlib modules #118761

Open

bedevere-app bot added the awaiting core review label Jan 18, 2025

Merge branch 'main' into perf/import/remove-typing-118761

66e9735

picnixz requested review from AA-Turner, hugovk and jaraco January 20, 2025 12:41

hugovk approved these changes Jan 20, 2025

View reviewed changes

bedevere-app bot added awaiting merge and removed awaiting core review labels Jan 20, 2025

AA-Turner approved these changes Jan 20, 2025

View reviewed changes

Merge branch 'main' into perf/import/remove-typing-118761

7242a99

picnixz enabled auto-merge (squash) January 23, 2025 14:17

vstinner approved these changes Jan 23, 2025

View reviewed changes

picnixz merged commit a95dca7 into python:main Jan 23, 2025
38 checks passed

bedevere-app bot removed the awaiting merge label Jan 23, 2025

picnixz deleted the perf/import/remove-typing-118761 branch January 23, 2025 15:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

gh-118761: Improve import time for `pstats` and `zipfile` by removing imports to `typing` #128981

gh-118761: Improve import time for `pstats` and `zipfile` by removing imports to `typing` #128981

Uh oh!

picnixz commented Jan 18, 2025 •

edited

Loading

Uh oh!

picnixz commented Jan 20, 2025 •

edited

Loading

Uh oh!

vstinner left a comment

Uh oh!

Uh oh!

jaraco commented Jan 26, 2025

Uh oh!

vstinner commented Jan 26, 2025

Uh oh!

AA-Turner commented Jan 26, 2025

Uh oh!

gpshead commented Jan 27, 2025

Uh oh!

eli-schwartz commented Jan 27, 2025

Uh oh!

gpshead commented Jan 27, 2025

Uh oh!

eli-schwartz commented Jan 27, 2025

Uh oh!

Uh oh!

Uh oh!

gh-118761: Improve import time for pstats and zipfile by removing imports to typing #128981

gh-118761: Improve import time for pstats and zipfile by removing imports to typing #128981

Uh oh!

Conversation

picnixz commented Jan 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

picnixz commented Jan 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vstinner left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jaraco commented Jan 26, 2025

Uh oh!

vstinner commented Jan 26, 2025

Uh oh!

AA-Turner commented Jan 26, 2025

Uh oh!

gpshead commented Jan 27, 2025

Uh oh!

eli-schwartz commented Jan 27, 2025

Uh oh!

gpshead commented Jan 27, 2025

Uh oh!

eli-schwartz commented Jan 27, 2025

Uh oh!

Uh oh!

gh-118761: Improve import time for `pstats` and `zipfile` by removing imports to `typing` #128981

gh-118761: Improve import time for `pstats` and `zipfile` by removing imports to `typing` #128981

picnixz commented Jan 18, 2025 •

edited

Loading

picnixz commented Jan 20, 2025 •

edited

Loading