gh-118761: Improve import time for `pstats` and `zipfile` by removing imports to typing #128981
Conversation
@vstinner I plan to merge this one with the following commit title:

Improve import time for `pstats` and `zipfile`

And the following commit body:

Importing `pstats` or `zipfile` is now roughly 20% faster. This is achieved by removing type annotations depending on `typing`.

I don't think I'll mention the fact that
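For anyone wanting to sanity-check an import-time claim like this, CPython ships a `-X importtime` flag that prints a per-module breakdown; the sketch below instead times a cold import end to end by starting fresh interpreters. It is a generic measurement sketch, not the benchmark used for this PR, and the module name and repeat count are arbitrary choices.

```python
# Generic sketch for timing a cold import by starting fresh interpreters.
# Not the benchmark used in this PR; absolute numbers depend on machine and build.
import subprocess
import sys
import time


def time_cold_import(module: str, repeat: int = 20) -> float:
    """Best wall-clock time (in seconds) to start Python and import `module`."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    baseline = time_cold_import("sys")      # roughly just interpreter startup
    with_zip = time_cold_import("zipfile")  # startup plus the zipfile import
    print(f"approx. zipfile import cost: {(with_zip - baseline) * 1000:.1f} ms")
```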
LGTM
It is inconsistent, but I've been taking to adding annotations when I write code, for clarity. If it's discouraged to add type annotations to Python code and just leave it to typeshed to keep up, I guess that's okay, though it sounds slightly terrible. If that's the case, though, and it's not possible to incrementally improve the native typing, I think we should document that somewhere in the dev guide (if it's not already). WDYT? Could this same performance improvement have been gained by putting the import in an
The status quo is to use typeshed, correct. Do we have other stdlib modules which use type annotations? I'm mostly aware of test.libregrtest, which is more an application than a stdlib module.
@jaraco I think this thread represents the latest position. Typeshed overrides the stdlib, so there's no real benefit (and real costs) to having type annotations in the stdlib. Using a
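For context on the trade-off discussed above: the usual way to keep annotations in the source while avoiding their runtime evaluation is a `TYPE_CHECKING` guard combined with `from __future__ import annotations`. Note that the guard itself still imports `typing` at runtime, which is presumably why this PR drops the annotations outright rather than guarding them. A minimal sketch with a made-up class (not code from `zipfile`):

```python
from __future__ import annotations  # annotations are stored as strings, never evaluated at runtime

from typing import TYPE_CHECKING    # note: this line alone still pays the full "import typing" cost

if TYPE_CHECKING:
    # Only static type checkers look inside this block; it is skipped at runtime.
    from typing import Self


class Archive:  # hypothetical example class, not part of zipfile
    def __enter__(self) -> Self:
        return self

    def __exit__(self, *exc_info) -> None:
        pass
```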
I would not have done this PR without evidence that there was an actual performance problem. In realistic applications, typing is already imported. If the import time of (it was mentioned in https://discuss.python.org/t/static-type-annotations-in-cpython/65068/9)
I'm not sure how it follows that "realistic applications" already import typing.
That's an extremely reasonable point and can surely be done as well, but I don't see why it should constitute a valid criticism of making a different, unrelated speed improvement?
Because it illustrates my point that this wasn't a "speed improvement" but more of a reactionary prevention of a minor performance regression, without considering what could be done to actually optimize the import of zipfile.
Okay, so I actually went in and took some timings. Here is my patch:

diff --git a/Lib/zipfile/__init__.py b/Lib/zipfile/__init__.py
index b8b496ad947..5f479965ba3 100644
--- a/Lib/zipfile/__init__.py
+++ b/Lib/zipfile/__init__.py
@@ -21,16 +21,6 @@
zlib = None
crc32 = binascii.crc32
-try:
- import bz2 # We may need its compression method
-except ImportError:
- bz2 = None
-
-try:
- import lzma # We may need its compression method
-except ImportError:
- lzma = None
-
__all__ = ["BadZipFile", "BadZipfile", "error",
"ZIP_STORED", "ZIP_DEFLATED", "ZIP_BZIP2", "ZIP_LZMA",
"is_zipfile", "ZipInfo", "ZipFile", "PyZipFile", "LargeZipFile",
@@ -705,6 +695,7 @@ def __init__(self):
self._comp = None
def _init(self):
+ import lzma
props = lzma._encode_filter_properties({'id': lzma.FILTER_LZMA1})
self._comp = lzma.LZMACompressor(lzma.FORMAT_RAW, filters=[
lzma._decode_filter_properties(lzma.FILTER_LZMA1, props)
@@ -731,6 +722,7 @@ def __init__(self):
def decompress(self, data):
if self._decomp is None:
+ import lzma
self._unconsumed += data
if len(self._unconsumed) <= 4:
return b''
@@ -778,11 +770,15 @@ def _check_compression(compression):
raise RuntimeError(
"Compression requires the (missing) zlib module")
elif compression == ZIP_BZIP2:
- if not bz2:
+ try:
+ import bz2
+ except ImportError:
raise RuntimeError(
"Compression requires the (missing) bz2 module")
elif compression == ZIP_LZMA:
- if not lzma:
+ try:
+ import lzma
+ except ImportError:
raise RuntimeError(
"Compression requires the (missing) lzma module")
else:
@@ -795,6 +791,7 @@ def _get_compressor(compress_type, compresslevel=None):
return zlib.compressobj(compresslevel, zlib.DEFLATED, -15)
return zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
elif compress_type == ZIP_BZIP2:
+ import bz2
if compresslevel is not None:
return bz2.BZ2Compressor(compresslevel)
return bz2.BZ2Compressor()
@@ -812,6 +809,7 @@ def _get_decompressor(compress_type):
elif compress_type == ZIP_DEFLATED:
return zlib.decompressobj(-15)
elif compress_type == ZIP_BZIP2:
+ import bz2
return bz2.BZ2Decompressor()
elif compress_type == ZIP_LZMA:
            return LZMADecompressor()

Note that I don't delay the zlib import; it's needed to handle crc32 consistently, and it wasn't obviously something that would be significant to optimize. I have 3 timings:
I cannot remotely guarantee I know what I'm doing with a benchmarking tool; I've said so in other PRs too. ;) Still, the experimental results I got say that there are HUGE gains to be gotten from removing

The reason I did these timings was because I thought it sounded like a great idea to solve this as a followup:
But based on my timings I've changed my mind and don't intend to submit this patch, as it feels useless to waste time caring about this insignificant and not at all slow import.
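For readers trying to reproduce this kind of comparison, one low-effort way to see which imports dominate `zipfile`'s startup cost is CPython's `-X importtime` flag, which writes a per-module timing breakdown to stderr. This is a generic sketch, not the exact method used for the timings above:

```python
# Sketch: print the per-module import-time breakdown for "import zipfile".
# The -X importtime report goes to stderr, one line per imported module.
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-X", "importtime", "-c", "import zipfile"],
    capture_output=True,
    text=True,
)
print(result.stderr)
```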
Those two imports are not really needed and we can reduce the import time of `zipfile`, or anything importing it. Roughly, importing `zipfile` takes 8 ms with this PR while it takes 10 ms on main. Also, `zipfile` is not typed by default, so adding `-> Self` is a bit inconsistent (though it helps type checkers). I think such typing should be left to typeshed.

cc @jaraco as the one who added the type hints in `zipfile`.
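To make the description above concrete, the kind of change it describes looks roughly like the before/after sketch below: remove the runtime `typing` import and the `-> Self` annotation from the module, and leave the precise return type to typeshed's stubs. The class here is a stand-in for illustration, not the literal diff from this PR:

```python
# Before (sketch): the module imports typing at runtime just to annotate a return type.
#
#     from typing import Self
#
#     class Archive:
#         def __enter__(self) -> Self:
#             return self
#
# After (sketch): no runtime typing import; type checkers still see "-> Self"
# because typeshed's stubs carry the annotation instead.


class Archive:
    def __enter__(self):
        # Returns the instance itself; the precise Self return type lives in the stubs.
        return self

    def __exit__(self, *exc_info):
        return None
```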