Commit 297609f

benjats07 and qued authored
feat: integration of YoloX for layout detection
* feat: added YoloX model for layout detection, including tests for images and PDF inference
* docs: updated README with force_ocr and deleted comment inside test

---------

Co-authored-by: Alan Bertl <[email protected]>
1 parent 5d84859 commit 297609f

20 files changed: +909 / -17 lines changed
.github/workflows/ci.yml

Lines changed: 9 additions & 1 deletion
@@ -24,13 +24,17 @@ jobs:
       uses: actions/setup-python@v4
       with:
         python-version: ${{ env.PYTHON_VERSION }}
+    - name: Install Poppler
+      run: |
+        sudo apt-get update
+        sudo apt-get -y install poppler-utils
     - name: Setup virtual environment (no cache hit)
       if: steps.virtualenv-cache.outputs.cache-hit != 'true'
       run: |
         python${{ env.PYTHON_VERSION }} -m venv .venv
         source .venv/bin/activate
         make install-ci
-
+
   lint:
     runs-on: ubuntu-latest
     needs: setup
@@ -80,6 +84,10 @@ jobs:
         python${{ env.PYTHON_VERSION }} -m venv .venv
         source .venv/bin/activate
         make install-ci
+    - name: Install Poppler
+      run: |
+        sudo apt-get update
+        sudo apt-get -y install poppler-utils tesseract-ocr
     - name: Test
       run: |
         source .venv/bin/activate
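
Note on the workflow change: both jobs now install `poppler-utils`, and the test job also installs `tesseract-ocr`. A plausible reading is that the new PDF tests depend on `pdf2image` (added to `requirements/test.in` in this commit), which calls Poppler's command-line tools, and that `force_ocr` needs the Tesseract binary. The sketch below is not part of the commit. It is a small preflight check one could run locally, and the binary names are assumptions based on what those apt packages normally ship.

```python
import shutil

# Assumed binary names: poppler-utils normally provides pdftoppm and pdfinfo,
# and tesseract-ocr provides tesseract. Adjust if your platform differs.
REQUIRED_BINARIES = ("pdftoppm", "pdfinfo", "tesseract")


def missing_binaries(names=REQUIRED_BINARIES, which=shutil.which):
    """Return the entries of `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Missing system dependencies:", ", ".join(missing))
    else:
        print("All system dependencies found.")
```

The `which` parameter exists only so the lookup can be replaced in tests; normal callers can ignore it.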

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+## 0.2.5
+
+* Add YoloX model for images and PDFs
+
 ## 0.2.5-dev0
 
 * Add generic model interface

Makefile

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ install-base-pip-packages:
 
 .PHONY: install-detectron2
 install-detectron2:
-	pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.6#egg=detectron2"
+	pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@78d5b4f335005091fe0364ce4775d711ec93566e"
 
 .PHONY: install-test
 install-test:
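
Note on the Makefile change: the detectron2 requirement moves from the tag `v0.6` to a full commit hash. A tag can be moved after the fact, whereas a commit hash always names the same tree, so the new form is the more reproducible pin. The helper below is a hypothetical illustration and not part of the commit. It splits a `name@git+URL@ref` requirement and reports whether the ref is a 40-character hash. It is deliberately simplified and assumes an `https` URL with no `@` in the host part.

```python
import re

_FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def split_git_requirement(requirement):
    """Split 'name@git+URL@ref[#fragment]' into (name, url, ref).

    `ref` is None when the requirement carries no '@ref' suffix.
    """
    name, _, rest = requirement.partition("@")
    if not rest.startswith("git+"):
        raise ValueError(f"not a git requirement: {requirement!r}")
    rest = rest[len("git+"):].split("#", 1)[0]  # drop any '#egg=...' fragment
    url, sep, ref = rest.rpartition("@")
    if not sep:  # no '@ref' present, so the whole remainder is the URL
        return name, rest, None
    return name, url, ref


def is_pinned_to_commit(requirement):
    """True when the requirement's ref is a full 40-character commit hash."""
    _, _, ref = split_git_requirement(requirement)
    return ref is not None and bool(_FULL_SHA.match(ref))
```

Applied to the two lines in the hunk above, `is_pinned_to_commit` is false for the `v0.6` form and true for the new one.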

README.md

Lines changed: 34 additions & 0 deletions
@@ -78,6 +78,40 @@ If you are using an Apple M1 chip, use `make run-app-dev` instead of `make start
 start the API with hot reloading. The API will run at `http:/localhost:8000`.
 
 View the swagger documentation at `http://localhost:5000/docs`.
+
+## YoloX model
+
+For using the YoloX model the endpoints are:
+```
+http://localhost:8000/layout_v1/pdf
+http://localhost:8000/layout_v1/image
+```
+For example:
+```
+curl -X 'POST' 'http://localhost:8000/layout/yolox/image' \
+-F 'file=@sample-docs/test-image.jpg' \
+| jq -C | less -R
+
+curl -X 'POST' 'http://localhost:8000/layout/yolox/pdf' \
+-F 'file=@sample-docs/loremipsum.pdf' \
+| jq -C | less -R
+```
+
+If your PDF file doesn't have text embedded you can force the use of OCR with
+the parameter force_ocr=True:
+```
+curl -X 'POST' 'http://localhost:8000/layout/yolox/pdf' \
+-F 'file=@sample-docs/loremipsum.pdf' \
+-F force_ocr=true
+| jq -C | less -R
+```
+
+or in local:
+
+```
+layout = yolox_local_inference(filename, type="pdf")
+```
+
 ## Security Policy
 
 See our [security policy](https://github.com/Unstructured-IO/unstructured-inference/security/policy) for
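
Note on the README addition: the endpoint list shows `/layout_v1/pdf` and `/layout_v1/image`, while every curl example posts to `/layout/yolox/...`. The third curl example is also missing the trailing backslash after `-F force_ocr=true`, so the pipe to `jq` would not be part of the command as written. The sketch below shows the same requests from Python. It follows the curl examples, which is an assumption about which path is correct. `yolox_request` and `post_document` are hypothetical helper names, and `post_document` needs the third-party `requests` package.

```python
BASE_URL = "http://localhost:8000"  # host and port taken from the README examples


def yolox_request(kind, force_ocr=False, base_url=BASE_URL):
    """Return (url, form_fields) for a YoloX layout request.

    `kind` is "pdf" or "image". The /layout/yolox/ prefix follows the curl
    examples; the README's endpoint list shows /layout_v1/ instead.
    """
    if kind not in ("pdf", "image"):
        raise ValueError(f"unsupported document kind: {kind!r}")
    url = f"{base_url.rstrip('/')}/layout/yolox/{kind}"
    # force_ocr travels as an ordinary form field, like `-F force_ocr=true`.
    fields = {"force_ocr": "true"} if force_ocr else {}
    return url, fields


def post_document(path, kind, force_ocr=False):
    """Upload `path` and return the decoded JSON response."""
    import requests  # third-party; imported lazily so yolox_request stays stdlib-only

    url, fields = yolox_request(kind, force_ocr)
    with open(path, "rb") as handle:
        response = requests.post(url, files={"file": handle}, data=fields)
    response.raise_for_status()
    return response.json()
```

With the API running locally, `post_document("sample-docs/loremipsum.pdf", "pdf", force_ocr=True)` would mirror the third curl example.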

requirements/base.txt

Lines changed: 20 additions & 0 deletions
@@ -18,6 +18,8 @@ charset-normalizer==3.0.1
     #   requests
 click==8.1.3
     # via uvicorn
+coloredlogs==15.0.1
+    # via onnxruntime
 contourpy==1.0.7
     # via matplotlib
 cryptography==39.0.0
@@ -30,6 +32,8 @@ fastapi==0.89.1
     # via unstructured-inference (setup.py)
 filelock==3.9.0
     # via huggingface-hub
+flatbuffers==23.1.21
+    # via onnxruntime
 fonttools==4.38.0
     # via matplotlib
 h11==0.14.0
@@ -38,30 +42,39 @@ huggingface-hub==0.12.0
     # via
     #   timm
     #   unstructured-inference (setup.py)
+humanfriendly==10.0
+    # via coloredlogs
 idna==3.4
     # via
     #   anyio
     #   requests
 iopath==0.1.10
     # via layoutparser
+jsons==1.6.3
+    # via unstructured-inference (setup.py)
 kiwisolver==1.4.4
     # via matplotlib
 layoutparser[layoutmodels,tesseract]==0.3.4
     # via unstructured-inference (setup.py)
 matplotlib==3.6.3
     # via pycocotools
+mpmath==1.2.1
+    # via sympy
 numpy==1.24.1
     # via
     #   contourpy
     #   layoutparser
     #   matplotlib
+    #   onnxruntime
     #   opencv-python
     #   pandas
     #   pycocotools
     #   scipy
     #   torchvision
 omegaconf==2.3.0
     # via effdet
+onnxruntime==1.13.1
+    # via unstructured-inference (setup.py)
 opencv-python==4.6.0.66
     # via
     #   layoutparser
@@ -70,6 +83,7 @@ packaging==23.0
     # via
     #   huggingface-hub
     #   matplotlib
+    #   onnxruntime
     #   pytesseract
 pandas==1.5.3
     # via layoutparser
@@ -89,6 +103,8 @@ pillow==9.4.0
     #   torchvision
 portalocker==2.7.0
     # via iopath
+protobuf==4.21.12
+    # via onnxruntime
 pycocotools==2.0.6
     # via effdet
 pycparser==2.21
@@ -127,6 +143,8 @@ sniffio==1.3.0
     # via anyio
 starlette==0.22.0
     # via fastapi
+sympy==1.11.1
+    # via onnxruntime
 timm==0.6.12
     # via effdet
 torch==1.13.1
@@ -152,6 +170,8 @@ typing-extensions==4.4.0
     #   starlette
     #   torch
     #   torchvision
+typish==1.9.3
+    # via jsons
 urllib3==1.26.14
     # via requests
 uvicorn==0.20.0
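
Note on the lockfile change: the two new direct dependencies are `onnxruntime` and `jsons` (both `via unstructured-inference (setup.py)`). The other additions are transitive: `coloredlogs`, `flatbuffers`, `protobuf` and `sympy` come in through `onnxruntime`, `humanfriendly` through `coloredlogs`, `mpmath` through `sympy`, and `typish` through `jsons`. The sketch below is a hypothetical helper, not part of the commit. It reads the `# via` annotations in this style of lockfile and answers "what does package X pull in?". The sample text is a short excerpt of the lines above.

```python
def parse_via(lines):
    """Map each pinned requirement to the list of things that require it."""
    via = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if not line.startswith("#"):
            # A pin such as 'numpy==1.24.1' starts a new entry.
            current = line.split("==")[0]
            via[current] = []
        elif current is not None:
            # Either '# via parent' on one line or '# via' followed by '#   parent' lines.
            text = line.lstrip("#").strip()
            if text.startswith("via"):
                text = text[len("via"):].strip()
            if text:
                via[current].append(text)
    return via


def required_by(via, parent):
    """Return the sorted names of pins that list `parent` as a requirer."""
    return sorted(name for name, parents in via.items() if parent in parents)


SAMPLE = """\
coloredlogs==15.0.1
    # via onnxruntime
humanfriendly==10.0
    # via coloredlogs
numpy==1.24.1
    # via
    #   contourpy
    #   onnxruntime
onnxruntime==1.13.1
    # via unstructured-inference (setup.py)
"""

if __name__ == "__main__":
    print(required_by(parse_via(SAMPLE.splitlines()), "onnxruntime"))
```

Comment lines that appear before the first pin (such as a generated-file header) are ignored because no entry is open yet.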

requirements/dev.txt

Lines changed: 4 additions & 4 deletions
@@ -25,7 +25,7 @@ attrs==22.2.0
     # via jsonschema
 backcall==0.2.0
     # via ipython
-beautifulsoup4==4.11.1
+beautifulsoup4==4.11.2
     # via nbconvert
 bleach==6.0.0
     # via nbconvert
@@ -59,7 +59,7 @@ importlib-metadata==6.0.0
     #   nbconvert
 importlib-resources==5.10.2
     # via jsonschema
-ipykernel==6.21.0
+ipykernel==6.20.2
     # via
     #   ipywidgets
     #   jupyter
@@ -111,7 +111,6 @@ jupyter-console==6.4.4
     # via jupyter
 jupyter-core==5.2.0
     # via
-    #   ipykernel
     #   jupyter-client
     #   jupyter-server
     #   nbclassic
@@ -161,6 +160,7 @@ nbformat==5.7.3
     #   notebook
 nest-asyncio==1.5.6
     # via
+    #   ipykernel
     #   nbclassic
     #   notebook
 notebook==6.5.2
@@ -182,7 +182,7 @@ pexpect==4.8.0
     # via ipython
 pickleshare==0.7.5
     # via ipython
-pip-tools==6.12.1
+pip-tools==6.12.2
     # via -r requirements/dev.in
 pkgutil-resolve-name==1.3.10
     # via jsonschema

requirements/test.in

Lines changed: 2 additions & 0 deletions
@@ -10,5 +10,7 @@ httpx
 flake8
 mypy
 pytest-cov
+pdf2image>=1.16.2
+huggingface_hub>=0.11.1
 label_studio_sdk
 vcrpy

requirements/test.txt

Lines changed: 20 additions & 3 deletions
@@ -27,6 +27,8 @@ coverage[toml]==7.1.0
     #   pytest-cov
 exceptiongroup==1.1.0
     # via pytest
+filelock==3.9.0
+    # via huggingface-hub
 flake8==6.0.0
     # via -r requirements/test.in
 h11==0.14.0
@@ -35,6 +37,8 @@ httpcore==0.16.3
     # via httpx
 httpx==0.23.3
     # via -r requirements/test.in
+huggingface-hub==0.12.0
+    # via -r requirements/test.in
 idna==3.4
     # via
     #   anyio
@@ -58,9 +62,15 @@ mypy-extensions==0.4.3
     #   black
     #   mypy
 packaging==23.0
-    # via pytest
+    # via
+    #   huggingface-hub
+    #   pytest
 pathspec==0.11.0
     # via black
+pdf2image==1.16.2
+    # via -r requirements/test.in
+pillow==9.4.0
+    # via pdf2image
 platformdirs==2.6.2
     # via black
 pluggy==1.0.0
@@ -76,9 +86,13 @@ pytest==7.2.1
 pytest-cov==4.0.0
     # via -r requirements/test.in
 pyyaml==6.0
-    # via vcrpy
+    # via
+    #   huggingface-hub
+    #   vcrpy
 requests==2.28.2
-    # via label-studio-sdk
+    # via
+    #   huggingface-hub
+    #   label-studio-sdk
 rfc3986[idna2008]==1.5.0
     # via httpx
 six==1.16.0
@@ -94,9 +108,12 @@ tomli==2.0.1
     #   coverage
     #   mypy
     #   pytest
+tqdm==4.64.1
+    # via huggingface-hub
 typing-extensions==4.4.0
     # via
     #   black
+    #   huggingface-hub
     #   mypy
     #   pydantic
 urllib3==1.26.14

sample-docs/empty-document.pdf

3.91 KB
Binary file not shown.

sample-docs/non-embedded.pdf

445 KB
Binary file not shown.