Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-118761: Improve import time for pstats and zipfile by removing imports to typing #128981

Merged
merged 4 commits into from
Jan 23, 2025

Conversation

picnixz
Copy link
Member

@picnixz picnixz commented Jan 18, 2025

Those two imports are not really needed and we can reduce the import time of zipfile, or anything import it. Roughly, importing zipfile takes 8ms with this PR while it takes 10ms on main. Also, zipfile is not typed by default, so adding -> Self is a bit inconsistent (though it helps type checkers). I think such typing should be left to typeshed.

cc @jaraco as the one who added the type hints in zipfile.

@picnixz
Copy link
Member Author

picnixz commented Jan 20, 2025

@vstinner I plan to merge this one with the following commit title:

Improve import time for `pstats` and `zipfile`

And the following commit body:

Importing `pstats` or `zipfile` is now roughly 20% faster.

This is achieved by removing type annotations depending on `typing`.

I don't think I'll mention the fact that typing is no more implicitly exposed as the zipfile modification was quite recent and pstats is for debugging.

@picnixz picnixz enabled auto-merge (squash) January 23, 2025 14:17
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@picnixz picnixz merged commit a95dca7 into python:main Jan 23, 2025
38 checks passed
@picnixz picnixz deleted the perf/import/remove-typing-118761 branch January 23, 2025 15:10
@jaraco
Copy link
Member

jaraco commented Jan 26, 2025

Also, zipfile is not typed by default, so adding -> Self is a bit inconsistent (though it helps type checkers). I think such typing should be left to typeshed.

It is inconsistent, but I've been taking to adding annotations when I write code, for clarity. If it's discouraged to add type annotations to Python code and just leave it to typeshed to keep up, I guess that's okay, though it sounds slightly terrible. If that's the case, though, and it's not possible to incrementally improve the native typing, I think we should document that somewhere in the dev guide (if it's not already). WDYT?

Could this same performance improvement have been gained by putting the import in an if TYPE_CHECKING: block?

@vstinner
Copy link
Member

The status quo is to use typeshed, correct. Do we have other stdlib modules which use type annotations? I'm mostly aware of test.libregrtest which is more an application than a stdlib module.

@AA-Turner
Copy link
Member

@jaraco I think this thread represents the latest position. Typeshed overrides the stdlib, so there's no real benefit (and real costs) to having type annotations in the stdlib. Using TYPE_CHECKING would still need the import -- sadly there's no consensus to codify using TYPE_CHECKING = False, or similar.

A

@gpshead
Copy link
Member

gpshead commented Jan 27, 2025

I would not have done this PR without evidence that there was an actual performance problem.

In realistic applications, typing will already be imported so this code was a no-op. This PR removing a natural import feels like chasing a goal of preventing an unrealistic microbenchmark regression rather than looking at things holistically. Perhaps focus on improving the import time of typing in real world scenarios rather than trying to prevent its use?

If the import time of zipfile is seriously a target, it also does other slow things like always importing compression modules, when 0-1 of them are the most likely to be used.

(it was mentioned in https://discuss.python.org/t/static-type-annotations-in-cpython/65068/9)

@eli-schwartz
Copy link
Contributor

In realistic applications, typing will already be imported so this code was a no-op. This PR removing a natural import feels like chasing a goal of preventing an unrealistic microbenchmark regression rather than looking at things holistically. Perhaps focus on improving the import time of typing in real world scenarios rather than trying to prevent its use?

I'm not sure how it follows that "realistic applications" already import typing? I'm quite certain you will find lots of realistic applications written by people who never really bought into "that whole type annotations thing".

If the import time of zipfile is seriously a target, it also does other slow things like always importing compression modules, when 0-1 of them are the most likely to be used.

That's an extremely reasonable point and can surely be done as well, but I don't see why it should constitute a valid criticism of making a different, unrelated speed improvement?

@gpshead
Copy link
Member

gpshead commented Jan 27, 2025

That's an extremely reasonable point and can surely be done as well, but I don't see why it should constitute a valid criticism of making a different, unrelated speed improvement?

Because it illustrates my point that this wasn't a "speed improvement" but more of a reactionary minor performance regression prevention without considering what could be done to actually optimize import zipfile overall.

@eli-schwartz
Copy link
Contributor

Because it illustrates my point that this wasn't a "speed improvement" but more of a reactionary minor performance regression prevention without considering what could be done to actually optimize import zipfile overall.

Okay, so I actually went in and took some timings.

Here is my patch:

diff --git a/Lib/zipfile/__init__.py b/Lib/zipfile/__init__.py
index b8b496ad947..5f479965ba3 100644
--- a/Lib/zipfile/__init__.py
+++ b/Lib/zipfile/__init__.py
@@ -21,16 +21,6 @@
     zlib = None
     crc32 = binascii.crc32
 
-try:
-    import bz2 # We may need its compression method
-except ImportError:
-    bz2 = None
-
-try:
-    import lzma # We may need its compression method
-except ImportError:
-    lzma = None
-
 __all__ = ["BadZipFile", "BadZipfile", "error",
            "ZIP_STORED", "ZIP_DEFLATED", "ZIP_BZIP2", "ZIP_LZMA",
            "is_zipfile", "ZipInfo", "ZipFile", "PyZipFile", "LargeZipFile",
@@ -705,6 +695,7 @@ def __init__(self):
         self._comp = None
 
     def _init(self):
+        import lzma
         props = lzma._encode_filter_properties({'id': lzma.FILTER_LZMA1})
         self._comp = lzma.LZMACompressor(lzma.FORMAT_RAW, filters=[
             lzma._decode_filter_properties(lzma.FILTER_LZMA1, props)
@@ -731,6 +722,7 @@ def __init__(self):
 
     def decompress(self, data):
         if self._decomp is None:
+            import lzma
             self._unconsumed += data
             if len(self._unconsumed) <= 4:
                 return b''
@@ -778,11 +770,15 @@ def _check_compression(compression):
             raise RuntimeError(
                 "Compression requires the (missing) zlib module")
     elif compression == ZIP_BZIP2:
-        if not bz2:
+        try:
+            import bz2
+        except ImportError:
             raise RuntimeError(
                 "Compression requires the (missing) bz2 module")
     elif compression == ZIP_LZMA:
-        if not lzma:
+        try:
+            import lzma
+        except ImportError:
             raise RuntimeError(
                 "Compression requires the (missing) lzma module")
     else:
@@ -795,6 +791,7 @@ def _get_compressor(compress_type, compresslevel=None):
             return zlib.compressobj(compresslevel, zlib.DEFLATED, -15)
         return zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
     elif compress_type == ZIP_BZIP2:
+        import bz2
         if compresslevel is not None:
             return bz2.BZ2Compressor(compresslevel)
         return bz2.BZ2Compressor()
@@ -812,6 +809,7 @@ def _get_decompressor(compress_type):
     elif compress_type == ZIP_DEFLATED:
         return zlib.decompressobj(-15)
     elif compress_type == ZIP_BZIP2:
+        import bz2
         return bz2.BZ2Decompressor()
     elif compress_type == ZIP_LZMA:
         return LZMADecompressor()

Note that I don't delay the zipfile import; it's needed to handle crc32 consistently and it wasn't obviously something that would be significant to optimize.

I have 3 timings:

  • The first is for zipfile without this typing change.
  • The second is for zipfile with this typing change.
  • The third is for zipfile with my patch applied.
$ LD_LIBRARY_PATH=$PWD hyperfine --warmup 8 './python -c "import zipfile"'
Benchmark 1: ./python -c "import zipfile"
  Time (mean ± σ):      35.6 ms ±   5.7 ms    [User: 30.1 ms, System: 4.9 ms]
  Range (min … max):    30.8 ms …  50.3 ms    65 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
LD_LIBRARY_PATH=$PWD hyperfine --warmup 8 './python -c "import zipfile"'
Benchmark 1: ./python -c "import zipfile"
  Time (mean ± σ):      26.9 ms ±   5.1 ms    [User: 22.1 ms, System: 4.5 ms]
  Range (min … max):    24.0 ms …  53.6 ms    115 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
$ LD_LIBRARY_PATH=$PWD hyperfine --warmup 8 './python -c "import zipfile"'
Benchmark 1: ./python -c "import zipfile"
  Time (mean ± σ):      25.4 ms ±   2.6 ms    [User: 20.7 ms, System: 4.5 ms]
  Range (min … max):    24.2 ms …  50.1 ms    120 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

I cannot remotely guarantee I know what I'm doing with a benchmarking tool, I've said so in other PRs too. ;) Still, the experimental results I got say that there are HUGE gains to be gotten from removing typing, and effectively no gains whatsoever to gain from avoiding bz2 / lzma when they aren't used.

The reason I did these timings was because I thought it sounded like a great idea to solve this as a followup:

If the import time of zipfile is seriously a target, it also does other slow things like always importing compression modules, when 0-1 of them are the most likely to be used.

But based on my timings I've changed my mind and don't intend to submit this patch as it feels useless to waste time caring about this insignificant and not at all slow import.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants