Commit 3fe7e1b

fix: pdf2image library is core requirement (#745)
1 parent 8258dbb commit 3fe7e1b

10 files changed: +24 -18 lines changed

Diff for: CHANGELOG.md

+2 -2
@@ -1,19 +1,19 @@
-## 0.7.5-dev2
+## 0.7.5
 
 ### Enhancements
 
 * Adds functionality to sort elements in `partition_pdf` for `fast` strategy
 * Adds ingest tests with `--fast` strategy on PDF documents
 * Adds --api-key to unstructured-ingest
 
-
 ### Features
 
 * Adds `partition_rst` for processed ReStructured Text documents.
 
 ### Fixes
 
 * Adds handling for emails that do not have a datetime to extract.
+* Adds pdf2image package as core requirement of unstructured (with no extras)
 
 ## 0.7.4
 

Diff for: requirements/base.in

+1
@@ -7,6 +7,7 @@ msg_parser
 nltk
 openpyxl
 pandas
+pdf2image
 pdfminer.six
 pillow
 pypandoc

Diff for: requirements/base.txt

+3
@@ -82,11 +82,14 @@ pandas==1.5.3
 # via
 # -r requirements/base.in
 # argilla
+pdf2image==1.16.3
+# via -r requirements/base.in
 pdfminer-six==20221105
 # via -r requirements/base.in
 pillow==9.5.0
 # via
 # -r requirements/base.in
+# pdf2image
 # python-pptx
 pycparser==2.21
 # via cffi

Diff for: requirements/dev.txt

+4 -4
@@ -63,7 +63,7 @@ executing==1.2.0
 # via stack-data
 fastjsonschema==2.17.1
 # via nbformat
-filelock==3.12.1
+filelock==3.12.2
 # via virtualenv
 fqdn==1.5.1
 # via jsonschema
@@ -82,7 +82,7 @@ importlib-metadata==6.6.0
 # nbconvert
 importlib-resources==5.12.0
 # via jsonschema
-ipykernel==6.23.1
+ipykernel==6.23.2
 # via
 # ipywidgets
 # jupyter
@@ -171,7 +171,7 @@ nbclassic==1.0.0
 # via notebook
 nbclient==0.8.0
 # via nbconvert
-nbconvert==7.4.0
+nbconvert==7.5.0
 # via
 # jupyter
 # jupyter-server
@@ -224,7 +224,7 @@ platformdirs==3.5.3
 # -c requirements/test.txt
 # jupyter-core
 # virtualenv
-pre-commit==3.3.2
+pre-commit==3.3.3
 # via -r requirements/dev.in
 prometheus-client==0.17.0
 # via

Diff for: requirements/huggingface.txt

+2 -2
@@ -17,7 +17,7 @@ click==8.1.3
 # via
 # -c requirements/base.txt
 # sacremoses
-filelock==3.12.1
+filelock==3.12.2
 # via
 # huggingface-hub
 # torch
@@ -90,7 +90,7 @@ tqdm==4.65.0
 # huggingface-hub
 # sacremoses
 # transformers
-transformers==4.30.1
+transformers==4.30.2
 # via -r requirements/huggingface.in
 typing-extensions==4.6.3
 # via

Diff for: requirements/ingest-azure.txt

+1 -1
@@ -14,7 +14,7 @@ async-timeout==4.0.2
 # via aiohttp
 attrs==23.1.0
 # via aiohttp
-azure-core==1.27.0
+azure-core==1.27.1
 # via
 # adlfs
 # azure-identity

Diff for: requirements/ingest-discord.txt

+1 -1
@@ -16,7 +16,7 @@ charset-normalizer==3.1.0
 # via
 # -c requirements/base.txt
 # aiohttp
-discord-py==2.2.3
+discord-py==2.3.0
 # via -r requirements/ingest-discord.in
 frozenlist==1.3.3
 # via

Diff for: requirements/ingest-google-drive.txt

+3 -3
@@ -17,16 +17,16 @@ charset-normalizer==3.1.0
 # requests
 google-api-core==2.11.0
 # via google-api-python-client
-google-api-python-client==2.88.0
+google-api-python-client==2.89.0
 # via -r requirements/ingest-google-drive.in
-google-auth==2.19.1
+google-auth==2.20.0
 # via
 # google-api-core
 # google-api-python-client
 # google-auth-httplib2
 google-auth-httplib2==0.1.0
 # via google-api-python-client
-googleapis-common-protos==1.59.0
+googleapis-common-protos==1.59.1
 # via google-api-core
 httplib2==0.22.0
 # via

Diff for: requirements/local-inference.txt

+6 -4
@@ -22,7 +22,7 @@ charset-normalizer==3.1.0
 # requests
 coloredlogs==15.0.1
 # via onnxruntime
-contourpy==1.0.7
+contourpy==1.1.0
 # via matplotlib
 cryptography==41.0.1
 # via
@@ -32,7 +32,7 @@ cycler==0.11.0
 # via matplotlib
 effdet==0.4.1
 # via layoutparser
-filelock==3.12.1
+filelock==3.12.2
 # via
 # huggingface-hub
 # torch
@@ -106,7 +106,9 @@ pandas==1.5.3
 # -c requirements/base.txt
 # layoutparser
 pdf2image==1.16.3
-# via layoutparser
+# via
+# -c requirements/base.txt
+# layoutparser
 pdfminer-six==20221105
 # via
 # -c requirements/base.txt
@@ -201,7 +203,7 @@ tqdm==4.65.0
 # huggingface-hub
 # iopath
 # transformers
-transformers==4.30.1
+transformers==4.30.2
 # via unstructured-inference
 typing-extensions==4.6.3
 # via

Diff for: unstructured/__version__.py

+1 -1
@@ -1 +1 @@
-__version__ = "0.7.5-dev2" # pragma: no cover
+__version__ = "0.7.5" # pragma: no cover
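
The CHANGELOG entry in this commit says pdf2image becomes a core requirement of unstructured, installed with no extras. One way to confirm that in a given environment is to look up the installed distributions with the standard-library `importlib.metadata` module. The sketch below is illustrative only and is not part of the commit: the helper names `installed_version` and `missing_requirements` are hypothetical, and the distribution names passed in are assumptions about what you want to check.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Iterable, List, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def missing_requirements(dist_names: Iterable[str]) -> List[str]:
    """Return the names from dist_names that are not installed."""
    return [name for name in dist_names if installed_version(name) is None]


if __name__ == "__main__":
    # Assumption: these are the distributions to verify after a plain
    # `pip install unstructured`; adjust the list for your environment.
    for name in ("unstructured", "pdf2image"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

If pdf2image is reported as not installed after installing a release that includes this commit, the environment was probably resolved against an older version of the package.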
