
Commit 8d41def

fix: fix occasional UnboundLocalError (#199)
To verify, run the API in parallel mode and send a PDF file with the correct MIME type (`application/pdf`) but the wrong file extension.

On `main`, start the server in parallel mode and confirm that the bug reproduces:

```
export UNSTRUCTURED_PARALLEL_MODE_ENABLED=true
export UNSTRUCTURED_PARALLEL_MODE_URL=http://localhost:8000/general/v0/general
make run-web-app
```

In a Python shell:

```
import requests

with open("sample-docs/layout-parser-paper.pdf", "rb") as f:
    res = requests.post(
        "http://localhost:8000/general/v0/general",
        files={"files": ("foo.txt", f, "application/pdf")},
    )

# Should be Internal Server Error
print(res.text)
```

Then repeat the request on this branch.

Also removed logging of the request object, which only displays an object reference next to the other params.
1 parent d471949 commit 8d41def


3 files changed (+4 -5 lines)


Diff for: CHANGELOG.md (+2 -1)
```diff
@@ -1,6 +1,7 @@
-## 0.0.38-dev0
+## 0.0.38-dev1
 
 * Fix page break has None page number bug
+* Fix a UnboundLocalError using pdfs in parallel mode
 
 ## 0.0.37
 
```

Diff for: pipeline-notebooks/pipeline-general.ipynb (+1 -2)
```diff
@@ -753,7 +753,6 @@
 "):\n",
 " logger.debug(\"pipeline_api input params: {}\".format(\n",
 " json.dumps({\n",
-" \"request\": request,\n",
 " \"filename\": filename,\n",
 " \"file_content_type\": file_content_type,\n",
 " \"response_type\": response_type,\n",
```
```diff
@@ -773,7 +772,7 @@
 " # since fast api might sent the wrong one.\n",
 " file_content_type = \"application/x-ole-storage\"\n",
 " \n",
-" if filename.endswith(\".pdf\"):\n",
+" if file_content_type == \"application/pdf\":\n",
 " try: \n",
 " pdf = PdfReader(file)\n",
 " except pypdf.errors.EmptyFileError:\n",
```

Diff for: prepline_general/api/general.py (+1 -2)
```diff
@@ -216,7 +216,6 @@ def pipeline_api(
 "pipeline_api input params: {}".format(
     json.dumps(
         {
-            "request": request,
             "filename": filename,
             "file_content_type": file_content_type,
             "response_type": response_type,
```
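This hunk removes `"request": request` from the logged parameters. As a rough illustration of why that entry was noise, here is a minimal sketch. It assumes the real logging call serializes non-JSON values with something like `default=str`, which is an assumption because that part of the call lies outside the hunk. `FakeRequest` is a hypothetical stand-in for the framework's request object.

```python
import json


class FakeRequest:
    """Hypothetical stand-in for the web framework's request object."""


params = {
    "request": FakeRequest(),
    "filename": "foo.txt",
    "file_content_type": "application/pdf",
}

# Assumption: non-JSON values are serialized with default=str, so the request
# object is logged only as its default repr ("<... FakeRequest object at 0x...>").
print(json.dumps(params, default=str))

# With the request entry removed, every value is plain JSON and worth logging.
params.pop("request")
print(json.dumps(params))
```

Without a `default=` hook, `json.dumps` would raise `TypeError` on the request object instead, so dropping the entry avoids relying on that hook at all.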
```diff
@@ -239,7 +238,7 @@ def pipeline_api(
 # since fast api might sent the wrong one.
 file_content_type = "application/x-ole-storage"
 
-if filename.endswith(".pdf"):
+if file_content_type == "application/pdf":
     try:
         pdf = PdfReader(file)
     except pypdf.errors.EmptyFileError:
```
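The fix makes the `PdfReader` branch key on `file_content_type` instead of the filename extension. The reproduction steps imply that the later parallel-mode code path is selected by content type. When the two conditions disagree (a PDF uploaded as `foo.txt`), `pdf` is referenced without ever being assigned. The sketch below is hypothetical and heavily simplified. `old_pipeline`, `new_pipeline`, and the `"parsed-pdf"` placeholder for `PdfReader(file)` are illustrative names, not code from `general.py`. It also assumes the later use of `pdf` is gated on `file_content_type`.

```python
# Hypothetical, simplified control flow; NOT the real pipeline_api.
# "parsed-pdf" stands in for PdfReader(file).


def old_pipeline(filename, file_content_type, parallel_mode=True):
    if filename.endswith(".pdf"):  # binds `pdf` based on the extension...
        pdf = "parsed-pdf"
    # ...but (assumed) the parallel-mode path is chosen by content type
    if file_content_type == "application/pdf" and parallel_mode:
        return pdf  # UnboundLocalError when the extension is not .pdf
    return "non-pdf path"


def new_pipeline(filename, file_content_type, parallel_mode=True):
    # filename is no longer consulted; both checks use the same condition
    if file_content_type == "application/pdf":
        pdf = "parsed-pdf"
    if file_content_type == "application/pdf" and parallel_mode:
        return pdf
    return "non-pdf path"


try:
    old_pipeline("foo.txt", "application/pdf")
except UnboundLocalError as exc:
    print("old:", type(exc).__name__)

print("new:", new_pipeline("foo.txt", "application/pdf"))
```

Keying the assignment and the use on the same condition guarantees that `pdf` is bound whenever it is read, which is why the error was only "occasional": it appeared only for uploads whose extension and MIME type disagreed.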
