Commit 9b778e2

fix: pytesseract>=0.3.12 installation error while installing pdf extra (#3522)
Closes #3521. This PR resolves an installation error with `pytesseract>=0.3.12` that occurred during `pip install unstructured[pdf]==0.15.3`.

### Testing

**Run the following command on the `main` branch and on this PR**

```
pip uninstall -y pytesseract && pip install ".[pdf]"
```

**Results**

- `main` branch

```
INFO: pip is looking at multiple versions of unstructured[pdf] to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement pytesseract>=0.3.12; extra == "pdf" (from unstructured[pdf]) (from versions: 0.1, 0.1.3, 0.1.4, 0.1.5, 0.1.6, 0.1.7, 0.1.8, 0.1.9, 0.2.0, 0.2.2, 0.2.4, 0.2.5, 0.2.6, 0.2.7, 0.2.8, 0.2.9, 0.3.0, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.3.5, 0.3.6, 0.3.7, 0.3.8, 0.3.9, 0.3.10)
ERROR: No matching distribution found for pytesseract>=0.3.12; extra == "pdf"
```

- this `PR`

`pytesseract-0.3.13` should be installed successfully.
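To make the failure on `main` concrete, here is a minimal sketch of the check pip effectively performs: none of the versions listed in the error output meets the `>=0.3.12` floor. The helper names (`parse_version`, `versions_satisfying_minimum`) are hypothetical, and the tuple comparison is a simplification that only handles purely numeric dotted versions, not the full PEP 440 rules pip actually applies.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a purely numeric dotted version such as '0.3.10' into (0, 3, 10)."""
    return tuple(int(part) for part in text.split("."))


# Versions pip listed as available in the error output above.
AVAILABLE = [
    "0.1", "0.1.3", "0.1.4", "0.1.5", "0.1.6", "0.1.7", "0.1.8", "0.1.9",
    "0.2.0", "0.2.2", "0.2.4", "0.2.5", "0.2.6", "0.2.7", "0.2.8", "0.2.9",
    "0.3.0", "0.3.1", "0.3.2", "0.3.3", "0.3.4", "0.3.5", "0.3.6", "0.3.7",
    "0.3.8", "0.3.9", "0.3.10",
]


def versions_satisfying_minimum(available: list[str], minimum: str) -> list[str]:
    """Return the versions in `available` that are >= `minimum`."""
    floor = parse_version(minimum)
    return [v for v in available if parse_version(v) >= floor]


if __name__ == "__main__":
    # No listed version reaches 0.3.12, so the requirement cannot be met.
    print(versions_satisfying_minimum(AVAILABLE, "0.3.12"))
    # The newest version in the list.
    print(max(AVAILABLE, key=parse_version))
```

Comparing parsed tuples rather than raw strings matters here: as strings, `"0.3.9"` sorts after `"0.3.10"`, which would misidentify the newest listed version.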
1 parent d6a84bd commit 9b778e2

5 files changed, +16 -7 lines changed

CHANGELOG.md

+10 -0
@@ -1,3 +1,13 @@
+## 0.15.4
+
+### Enhancements
+
+### Features
+
+### Fixes
+
+* **Resolve an installation error with `pytesseract>=0.3.12` that occurred during `pip install unstructured[pdf]==0.15.3`.**
+
 ## 0.15.3
 
 ### Enhancements

requirements/deps/constraints.txt

+1 -2
@@ -22,8 +22,7 @@ Office365-REST-Python-Client<2.4.3
 # unstructured-inference to be upgraded when unstructured library is upgraded
 # https://github.com/Unstructured-IO/unstructured/issues/1458
 # unstructured-inference
-# use the known compatible version of weaviate and pytesseract
-pytesseract @ git+https://github.com/madmaze/[email protected]
+# use the known compatible version of weaviate
 weaviate-client>3.25.0
 # TODO: Pinned in transformers package, remove when that gets updated
 tokenizers>=0.19,<0.20

requirements/extra-pdf-image.in

+3 -1
@@ -12,4 +12,6 @@ effdet
 # Do not move to constraints.in, otherwise unstructured-inference will not be upgraded
 # when unstructured library is.
 unstructured-inference==0.7.36
-pytesseract>=0.3.12
+# NOTE(christine): Pinned to a specific version of pytesseract from the GitHub repository.
+# Remove this pin and switch to the latest version from PyPI once version 0.3.13 or newer is officially released.
+pytesseract @ git+https://github.com/madmaze/[email protected]
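After installing with the pin above, one way to confirm which `pytesseract` version actually landed in the environment is to query the installed distribution metadata. This is a small sketch using only the standard library; the helper name `installed_version` is hypothetical, and the version it reports depends on your environment.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of `distribution`, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pytesseract")
    if found is None:
        print("pytesseract is not installed")
    else:
        print(f"pytesseract {found} is installed")
```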

requirements/extra-pdf-image.txt

+1 -3
@@ -202,9 +202,7 @@ pypdf==4.3.1
 pypdfium2==4.30.0
     # via pdfplumber
 pytesseract @ git+https://github.com/madmaze/[email protected]
-    # via
-    #   -c ././deps/constraints.txt
-    #   -r ./extra-pdf-image.in
+    # via -r ./extra-pdf-image.in
 python-dateutil==2.9.0.post0
     # via
     #   -c ./base.txt

unstructured/__version__.py

+1 -1
@@ -1 +1 @@
-__version__ = "0.15.3" # pragma: no cover
+__version__ = "0.15.4" # pragma: no cover
