Commit 2b110a5

ds-filipknefel and Filip Knefel authored
fix: Generalize tar decompression (#88)
Change the tarfile open mode from "r:gz", which assumes the file is gzip-compressed (not true for plain .tar files), to the more general "r:*".

Co-authored-by: Filip Knefel <[email protected]>
1 parent cfd4b52 commit 2b110a5

3 files changed, 13 additions and 3 deletions

CHANGELOG.md

Lines changed: 10 additions & 1 deletion
@@ -1,3 +1,12 @@
+## 0.0.10
+
+### Enhancements
+
+* "Fix tar extraction" - tar extraction function assumed archive was gzip compressed which isn't true for supported `.tar` archives. Updated to work for both compressed and uncompressed tar archives.
+
+### Fixes
+
+
 ## 0.0.9
 
 ### Enhancements
@@ -10,7 +19,7 @@
 ### Fixes
 
 **Fix uncompress logic** Use of the uncompress process wasn't being leveraged in the pipeline correctly. Updated to use the new loca download path for where the partitioned looks for the new file.
->>>>>>> d7a2cab (Add entry to changelog)
+
 
 ## 0.0.8
 

unstructured_ingest/__version__.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = "0.0.9" # pragma: no cover
+__version__ = "0.0.10" # pragma: no cover

unstructured_ingest/utils/compression.py

Lines changed: 2 additions & 1 deletion
@@ -63,7 +63,8 @@ def uncompress_tar_file(tar_filename: str, path: Optional[str] = None) -> str:
 
     path = path if path else os.path.join(head, f"{tail}-tar-uncompressed")
     logger.info(f"extracting tar {tar_filename} -> {path}")
-    with tarfile.open(tar_filename, "r:gz") as tfile:
+    # NOTE: "r:*" mode opens both compressed (e.g ".tar.gz") and uncompressed ".tar" archives
+    with tarfile.open(tar_filename, "r:*") as tfile:
         # NOTE(robinson: Mitigate against malicious content being extracted from the tar file.
         # This was added in Python 3.12
         # Ref: https://docs.python.org/3/library/tarfile.html#extraction-filters
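
The behaviour this change relies on can be checked with a small standalone script. The following is only a sketch and not part of the commit: the helper names (`make_archive`, `list_members`, `demo`) and the file names are hypothetical, chosen for illustration. It writes one plain `.tar` and one `.tar.gz` archive to a temporary directory, opens both with `"r:*"`, and then shows that `"r:gz"` rejects the plain archive.

```python
import io
import os
import tarfile
import tempfile
from typing import Dict, List, Union


def make_archive(path: str, mode: str) -> None:
    """Write a one-file tar archive at `path` using the given write mode."""
    payload = b"hello"
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(payload)
    with tarfile.open(path, mode) as tfile:
        tfile.addfile(info, io.BytesIO(payload))


def list_members(path: str, mode: str) -> List[str]:
    """Open the archive with the given read mode and return its member names."""
    with tarfile.open(path, mode) as tfile:
        return tfile.getnames()


def demo() -> Dict[str, Union[List[str], str]]:
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "sample.tar")        # uncompressed
        gzipped = os.path.join(tmp, "sample.tar.gz")   # gzip-compressed
        make_archive(plain, "w")
        make_archive(gzipped, "w:gz")

        results: Dict[str, Union[List[str], str]] = {
            # "r:*" detects the compression transparently, so both open.
            "plain r:*": list_members(plain, "r:*"),
            "gzip r:*": list_members(gzipped, "r:*"),
        }
        try:
            # The old mode insists on gzip and fails on a plain tar file.
            list_members(plain, "r:gz")
            results["plain r:gz"] = "opened"
        except tarfile.ReadError:
            results["plain r:gz"] = "ReadError"
        return results


if __name__ == "__main__":
    for case, outcome in demo().items():
        print(f"{case}: {outcome}")
```

With `"r:*"`, `tarfile` picks the decompressor itself, so the caller does not need to inspect the file extension before opening the archive.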
