Commit c17ae23

Merge branch 'develop' into master
* develop:
  Update CHANGELOG.md
  Add .xz and increase performance of compression module (#875)
  Bump pypa/gh-action-pypi-publish in /.github/workflows (#878)
  Bump actions/checkout from 4 to 5 in the github-actions group (#877)
  Fix release.sh for the final merge back into develop (#872)
2 parents baed654 + b54438f commit c17ae23

10 files changed, +95 -58 lines changed


.github/workflows/python-package.yml

Lines changed: 5 additions & 5 deletions
@@ -11,7 +11,7 @@ jobs:
   linters:
     runs-on: ubuntu-24.04
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v5
         with:
           fetch-depth: 0 # fetch git tags for setuptools_scm (smart_open.__version__)

@@ -52,7 +52,7 @@ jobs:
           - {python-version: '3.12', os: windows-2025}
           - {python-version: '3.13', os: windows-2025}
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v5
         with:
           fetch-depth: 0 # fetch git tags for setuptools_scm (smart_open.__version__)

@@ -97,7 +97,7 @@ jobs:
           # - {python-version: '3.13', os: windows-2025}

     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v5
         with:
           fetch-depth: 0 # fetch git tags for setuptools_scm (smart_open.__version__)

@@ -137,7 +137,7 @@ jobs:
           # - {python-version: '3.13', os: windows-2025}

     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v5
         with:
           fetch-depth: 0 # fetch git tags for setuptools_scm (smart_open.__version__)

@@ -189,7 +189,7 @@ jobs:
           # - {python-version: '3.13', os: windows-2025}

     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v5
         with:
           fetch-depth: 0 # fetch git tags for setuptools_scm (smart_open.__version__)

.github/workflows/release.yml

Lines changed: 2 additions & 2 deletions
@@ -14,7 +14,7 @@ jobs:
       pull-requests: write # apexskier/github-release-commenter

     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v5
         with:
           fetch-depth: 0 # fetch git tags for setuptools_scm (smart_open.__version__)

@@ -38,7 +38,7 @@ jobs:

       # https://github.com/pypa/gh-action-pypi-publish#trusted-publishing
       - name: Publish package distributions to PyPI
-        uses: pypa/gh-action-pypi-publish@v1.12.4
+        uses: pypa/gh-action-pypi-publish@v1.13.0

       - uses: apexskier/[email protected]
         with:

CHANGELOG.md

Lines changed: 7 additions & 0 deletions
@@ -1,3 +1,10 @@
+# 7.3.1, 2025-09-08
+
+- Fix release.sh for the final merge back into develop (PR [#872](https://github.com/piskvorky/smart_open/pull/872), [@ddelange](https://github.com/ddelange))
+- Bump actions/checkout from 4 to 5 in the github-actions group (PR [#877](https://github.com/piskvorky/smart_open/pull/877), [@dependabot[bot]](https://github.com/apps/dependabot))
+- Bump pypa/gh-action-pypi-publish from 1.12.4 to 1.13.0 in /.github/workflows (PR [#878](https://github.com/piskvorky/smart_open/pull/878), [@dependabot[bot]](https://github.com/apps/dependabot))
+- Add .xz and increase performance of compression module (PR [#875](https://github.com/piskvorky/smart_open/pull/875), [@ddelange](https://github.com/ddelange))
+
 # 7.3.0.post1, 2025-07-03

 - Fix release.sh merge message and final merge (PR [#868](https://github.com/piskvorky/smart_open/pull/868), [@ddelange](https://github.com/ddelange))
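
The changelog entry for #875 corresponds to the `smart_open/compression.py` changes in this commit. There, each handler opens the stream with the codec module's `open` function and, in binary mode, wraps the result in `io.BufferedWriter` or `io.BufferedReader`. Below is a minimal sketch of that pattern using only the standard-library codecs. The names `maybe_wrap_buffered`, `OPENERS`, and `round_trip` are illustrative and not part of smart_open's API, and `zstandard` is left out so the sketch has no third-party dependency.

```python
import bz2
import gzip
import io
import lzma


def maybe_wrap_buffered(file_obj, mode):
    """Add an io buffer around binary streams; leave text streams untouched."""
    result = file_obj
    if "b" in mode and "w" in mode:
        result = io.BufferedWriter(result)
    elif "b" in mode and "r" in mode:
        result = io.BufferedReader(result)
    return result


# Illustrative mapping from extension to the stdlib module-level opener.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def round_trip(ext, payload):
    """Compress payload into memory, then read it back through the same codec."""
    opener = OPENERS[ext]

    sink = io.BytesIO()
    # Closing the buffered writer flushes and closes the codec stream,
    # but the codec does not close a file object it did not open itself.
    with maybe_wrap_buffered(opener(filename=sink, mode="wb"), "wb") as fout:
        fout.write(payload)

    source = io.BytesIO(sink.getvalue())
    with maybe_wrap_buffered(opener(filename=source, mode="rb"), "rb") as fin:
        return fin.read()


if __name__ == "__main__":
    for ext in OPENERS:
        assert round_trip(ext, b"hello world\n") == b"hello world\n"
```

In text mode the stdlib openers already return an `io.TextIOWrapper`, which is presumably why the helper only acts on binary modes.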

README.rst

Lines changed: 5 additions & 8 deletions
@@ -245,7 +245,7 @@ By default, ``smart_open`` determines the compression algorithm to use based on

 .. code-block:: python

-    >>> from smart_open import open, register_compressor
+    >>> from smart_open import open
     >>> with open('tests/test_data/1984.txt.gz') as fin:
     ...     print(fin.read(32))
     It was a bright cold day in Apri
@@ -255,7 +255,7 @@ To disable compression:

 .. code-block:: python

-    >>> from smart_open import open, register_compressor
+    >>> from smart_open import open
     >>> with open('tests/test_data/1984.txt.gz', 'rb', compression='disable') as fin:
     ...     print(fin.read(32))
     b'\x1f\x8b\x08\x08\x85F\x94\\\x00\x031984.txt\x005\x8f=r\xc3@\x08\x85{\x9d\xe2\x1d@'
@@ -265,7 +265,7 @@ To specify the algorithm explicitly (e.g. for non-standard file extensions):

 .. code-block:: python

-    >>> from smart_open import open, register_compressor
+    >>> from smart_open import open
     >>> with open('tests/test_data/1984.txt.gzip', compression='.gz') as fin:
     ...     print(fin.read(32))
     It was a bright cold day in Apri
@@ -279,18 +279,15 @@ For example, to open xz-compressed files:
     >>> from smart_open import open, register_compressor

     >>> def _handle_xz(file_obj, mode):
-    ...     return lzma.LZMAFile(filename=file_obj, mode=mode, format=lzma.FORMAT_XZ)
+    ...     return lzma.LZMAFile(filename=file_obj, mode=mode)

     >>> register_compressor('.xz', _handle_xz)

     >>> with open('tests/test_data/1984.txt.xz') as fin:
     ...     print(fin.read(32))
     It was a bright cold day in Apri

-``lzma`` is in the standard library in Python 3.3 and greater.
-For 2.7, use `backports.lzma`_.
-
-.. _backports.lzma: https://pypi.org/project/backports.lzma/
+This is just an example: ``lzma`` is in the standard library and is registered by default.

 Transport-specific Options
 --------------------------
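
Dropping `format=lzma.FORMAT_XZ` from the README example should not change what it produces. When writing, `lzma.LZMAFile` defaults to the .xz container, and when reading it auto-detects the container format. The following standard-library check illustrates that behavior independently of smart_open. `handle_xz` is a stand-alone copy of the example callback, and the payload is made up for the demonstration.

```python
import io
import lzma


def handle_xz(file_obj, mode):
    # Same call as the updated README example: no explicit `format` argument.
    return lzma.LZMAFile(filename=file_obj, mode=mode)


buffer = io.BytesIO()
with handle_xz(buffer, "wb") as fout:
    fout.write(b"It was a bright cold day in April")

compressed = buffer.getvalue()
# Every .xz stream starts with this six-byte magic number.
is_xz = compressed.startswith(b"\xfd7zXZ\x00")

with handle_xz(io.BytesIO(compressed), "rb") as fin:
    restored = fin.read()

if __name__ == "__main__":
    print(is_xz, restored)
```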

extending.md

Lines changed: 1 addition & 1 deletion
@@ -128,7 +128,7 @@ For example:
 ```python
 def _handle_xz(file_obj, mode):
     import lzma
-    return lzma.LZMAFile(filename=file_obj, mode=mode, format=lzma.FORMAT_XZ)
+    return lzma.LZMAFile(filename=file_obj, mode=mode)


 register_compressor('.xz', _handle_xz)

help.txt

Lines changed: 6 additions & 3 deletions
@@ -326,6 +326,7 @@ FUNCTIONS

         * .bz2
         * .gz
+        * .xz
         * .zst

         The function depends on the file extension to determine the appropriate codec.
@@ -405,10 +406,10 @@ FUNCTIONS
         Parameters
         ----------
         ext: str
-            The extension. Must include the leading period, e.g. ``.gz``.
+            The extension. Must include the leading period, e.g. `.gz`.
         callback: callable
             The callback. It must accept two position arguments, file_obj and mode.
-            This function will be called when ``smart_open`` is opening a file with
+            This function will be called when `smart_open` is opening a file with
             the specified extension.

         Examples
@@ -419,10 +420,12 @@ FUNCTIONS

         >>> def _handle_xz(file_obj, mode):
         ...     import lzma
-        ...     return lzma.LZMAFile(filename=file_obj, mode=mode, format=lzma.FORMAT_XZ)
+        ...     return lzma.LZMAFile(filename=file_obj, mode=mode)
         >>>
         >>> register_compressor('.xz', _handle_xz)

+        This is just an example: `lzma` is in the standard library and is registered by default.
+
     s3_iter_bucket(bucket_name, prefix='', accept_key=None, key_limit=None, workers=16, retries=3, **session_kwargs)
         Deprecated. Use smart_open.s3.iter_bucket instead.

release/release.sh

Lines changed: 3 additions & 0 deletions
@@ -20,6 +20,9 @@ git pull
 # Merge `develop` into `master` and push
 git merge develop --no-ff -m "Merge branch 'develop' into master"
 git push
+# Make sure you're on `develop` and you're up to date locally
+git checkout develop
+git pull
 # Merge `master` back into `develop` and push
 git merge master --no-ff -m "Merge branch 'master' into develop"
 git push

smart_open/compression.py

Lines changed: 31 additions & 23 deletions
@@ -5,7 +5,7 @@
 # This code is distributed under the terms and conditions
 # from the MIT License (MIT).
 #
-"""Implements the compression layer of the ``smart_open`` library."""
+"""Implements the compression layer of the `smart_open` library."""
 import io
 import logging
 import os.path
@@ -42,10 +42,10 @@ def register_compressor(ext, callback):
     Parameters
     ----------
     ext: str
-        The extension. Must include the leading period, e.g. ``.gz``.
+        The extension. Must include the leading period, e.g. `.gz`.
     callback: callable
         The callback. It must accept two position arguments, file_obj and mode.
-        This function will be called when ``smart_open`` is opening a file with
+        This function will be called when `smart_open` is opening a file with
         the specified extension.

     Examples
@@ -56,10 +56,12 @@ def register_compressor(ext, callback):

     >>> def _handle_xz(file_obj, mode):
     ...     import lzma
-    ...     return lzma.LZMAFile(filename=file_obj, mode=mode, format=lzma.FORMAT_XZ)
+    ...     return lzma.LZMAFile(filename=file_obj, mode=mode)
     >>>
     >>> register_compressor('.xz', _handle_xz)

+    This is just an example: `lzma` is in the standard library and is registered by default.
+
     """
     if not (ext and ext[0] == '.'):
         raise ValueError('ext must be a string starting with ., not %r' % ext)
@@ -72,7 +74,7 @@ def register_compressor(ext, callback):
 def tweak_close(outer, inner):
     """Ensure that closing the `outer` stream closes the `inner` stream as well.

-    Deprecated: smart_open.open().__exit__ now always calls __exit__ on the
+    Deprecated: `smart_open.open().__exit__` now always calls `__exit__` on the
     underlying filestream.

     Use this when your compression library's `close` method does not
@@ -94,33 +96,38 @@ def close_both(*args):
     outer.close = close_both


-def _handle_bz2(file_obj, mode):
-    from bz2 import BZ2File
-    result = BZ2File(file_obj, mode)
+def _maybe_wrap_buffered(file_obj, mode):
+    # https://github.com/piskvorky/smart_open/issues/760#issuecomment-1553971657
+    result = file_obj
+    if "b" in mode and "w" in mode:
+        result = io.BufferedWriter(result)
+    elif "b" in mode and "r" in mode:
+        result = io.BufferedReader(result)
     return result


+def _handle_bz2(file_obj, mode):
+    import bz2
+    result = bz2.open(filename=file_obj, mode=mode)
+    return _maybe_wrap_buffered(result, mode)
+
+
 def _handle_gzip(file_obj, mode):
     import gzip
-    result = gzip.GzipFile(fileobj=file_obj, mode=mode)
-    return result
+    result = gzip.open(filename=file_obj, mode=mode)
+    return _maybe_wrap_buffered(result, mode)


 def _handle_zstd(file_obj, mode):
-    import zstandard # type: ignore
+    import zstandard
     result = zstandard.open(filename=file_obj, mode=mode)
-    # zstandard.open returns an io.TextIOWrapper in text mode, but otherwise
-    # returns a raw stream reader/writer, and we need the `io` wrapper
-    # to make FileLikeProxy work correctly.
-    #
-    # See:
-    #
-    # https://github.com/indygreg/python-zstandard/blob/d7d81e79dbe74feb22fb73405ebfb3e20f4c4653/zstandard/__init__.py#L169-L174
-    if "b" in mode and "w" in mode:
-        result = io.BufferedWriter(result)
-    elif "b" in mode and "r" in mode:
-        result = io.BufferedReader(result)
-    return result
+    return _maybe_wrap_buffered(result, mode)
+
+
+def _handle_xz(file_obj, mode):
+    import lzma
+    result = lzma.open(filename=file_obj, mode=mode)
+    return _maybe_wrap_buffered(result, mode)


 def compression_wrapper(file_obj, mode, compression=INFER_FROM_EXTENSION, filename=None):
@@ -165,3 +172,4 @@ def compression_wrapper(file_obj, mode, compression=INFER_FROM_EXTENSION, filena
 register_compressor('.bz2', _handle_bz2)
 register_compressor('.gz', _handle_gzip)
 register_compressor('.zst', _handle_zstd)
+register_compressor('.xz', _handle_xz)

tests/test_compression.py

Lines changed: 10 additions & 0 deletions
@@ -5,8 +5,10 @@
 # This code is distributed under the terms and conditions
 # from the MIT License (MIT).
 #
+import bz2
 import gzip
 import io
+import lzma

 import pytest
 import zstandard as zstd
@@ -37,6 +39,14 @@ def label(thing, name):
         (io.BytesIO(zstd.ZstdCompressor().compress(plain)), 'infer_from_extension', 'file.ZST'),
         (label(io.BytesIO(zstd.ZstdCompressor().compress(plain)), 'file.zst'), 'infer_from_extension', ''),
         (io.BytesIO(zstd.ZstdCompressor().compress(plain)), '.zst', 'file.zst'),
+        (io.BytesIO(lzma.compress(plain)), 'infer_from_extension', 'file.xz'),
+        (io.BytesIO(lzma.compress(plain)), 'infer_from_extension', 'file.XZ'),
+        (label(io.BytesIO(lzma.compress(plain)), 'file.xz'), 'infer_from_extension', ''),
+        (io.BytesIO(lzma.compress(plain)), '.xz', 'file.xz'),
+        (io.BytesIO(bz2.compress(plain)), 'infer_from_extension', 'file.bz2'),
+        (io.BytesIO(bz2.compress(plain)), 'infer_from_extension', 'file.BZ2'),
+        (label(io.BytesIO(bz2.compress(plain)), 'file.bz2'), 'infer_from_extension', ''),
+        (io.BytesIO(bz2.compress(plain)), '.bz2', 'file.bz2'),
     ]
 )
 def test_compression_wrapper_read(fileobj, compression, filename):

tests/test_smart_open.py

Lines changed: 25 additions & 16 deletions
@@ -78,28 +78,37 @@ def named_temporary_file(mode='w+b', prefix=None, suffix=None, delete=True):
         logger.error(e)


-def test_zst_write():
-    with named_temporary_file(suffix=".zst") as tmp:
-        with smart_open.open(tmp.name, "wt") as fout:
-            print("hello world", file=fout)
-            print("this is a test", file=fout)
+def test_compression_extensions():
+    for extension in smart_open.compression.get_supported_extensions():
+        with named_temporary_file(suffix=extension) as tmp:
+            with smart_open.open(tmp.name, "wt") as fout:
+                print("hello world", file=fout)
+                print("this is a test", file=fout)

-        with smart_open.open(tmp.name, "rt") as fin:
-            got = list(fin)
+            with smart_open.open(tmp.name, "rt") as fin:
+                got = list(fin)

-    assert got == ["hello world\n", "this is a test\n"]
+        assert got == ["hello world\n", "this is a test\n"], f"Error for {extension=}, mode='wt'"

+        with named_temporary_file(suffix=extension) as tmp:
+            with smart_open.open(tmp.name, "w") as fout:
+                fout.write("hello world\n")
+                fout.write("this is a test\n")

-def test_zst_write_binary():
-    with named_temporary_file(suffix=".zst") as tmp:
-        with smart_open.open(tmp.name, "wb") as fout:
-            fout.write(b"hello world\n")
-            fout.write(b"this is a test\n")
+            with smart_open.open(tmp.name, "r") as fin:
+                got = list(fin)

-        with smart_open.open(tmp.name, "rb") as fin:
-            got = list(fin)
+        assert got == ["hello world\n", "this is a test\n"], f"Error for {extension=}, mode='w'"

-    assert got == [b"hello world\n", b"this is a test\n"]
+        with named_temporary_file(suffix=extension) as tmp:
+            with smart_open.open(tmp.name, "wb") as fout:
+                fout.write(b"hello world\n")
+                fout.write(b"this is a test\n")
+
+            with smart_open.open(tmp.name, "rb") as fin:
+                got = list(fin)
+
+        assert got == [b"hello world\n", b"this is a test\n"], f"Error for {extension=}, mode='wb'"


 class ParseUriTest(unittest.TestCase):
