
The file upload failed. #14

@zhouyujn


@aymenfurter
Uploading a file fails with an HTTP 500 error.
I suspect the file is not being passed through Document Intelligence: PDF files upload fine, but Word and other file formats fail.
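One plausible explanation, consistent with the traceback below, is that a Word `.docx` file is a ZIP archive, not a PDF. It has no `%PDF-` header and no `%%EOF` trailer for a PDF parser to find. The sketch below uses only the standard library. It builds a tiny ZIP in memory, whose entry names are illustrative and do not make a valid `.docx`, and reports which markers are present.

```python
import io
import zipfile

PDF_HEADER = b"%PDF-"
ZIP_HEADER = b"PK\x03\x04"  # ZIP local file header; .docx/.xlsx/.pptx are ZIP containers


def make_fake_docx() -> bytes:
    """Build a tiny ZIP archive shaped loosely like a .docx (illustrative only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<w:document/>")
    return buf.getvalue()


def describe(data: bytes) -> dict:
    """Report which container markers appear in a byte buffer."""
    return {
        "starts_with_pdf_header": data.startswith(PDF_HEADER),
        "starts_with_zip_header": data.startswith(ZIP_HEADER),
        "contains_eof_marker": b"%%EOF" in data,
    }


if __name__ == "__main__":
    print(describe(make_fake_docx()))
```

A buffer like this, handed to a PDF reader, would fail the `%%EOF` search, which matches the `EOF marker not found` error in the log.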


2024-10-21T02:43:41.099251851Z INFO:geventwebsocket.handler:100.100.0.115 - - [2024-10-21 02:43:41] "GET /indexes/espp/files?is_restricted=false HTTP/1.1" 200 171 0.007272
2024-10-21T02:43:41.376999492Z INFO:geventwebsocket.handler:100.100.0.115 - - [2024-10-21 02:43:41] "GET /indexes HTTP/1.1" 200 169 0.045113
2024-10-21T02:43:42.110639426Z INFO:azure.core.pipeline.policies.http_logging_policy:Request URL: 'https://strwxmbueydoikkg.queue.core.windows.net/indexing/messages?numofmessages=REDACTED&visibilitytimeout=REDACTED'
2024-10-21T02:43:42.110684370Z Request method: 'GET'
2024-10-21T02:43:42.110695080Z Request headers:
2024-10-21T02:43:42.110703155Z 'x-ms-version': 'REDACTED'
2024-10-21T02:43:42.110710829Z 'Accept': 'application/xml'
2024-10-21T02:43:42.110718764Z 'User-Agent': 'azsdk-python-storage-queue/12.11.0 Python/3.11.10 (Linux-5.15.164.1-1.cm2-x86_64-with-glibc2.36)'
2024-10-21T02:43:42.110726770Z 'x-ms-date': 'REDACTED'
2024-10-21T02:43:42.110734664Z 'x-ms-client-request-id': '48ae17ca-8f56-11ef-8667-3e4e57cc0722'
2024-10-21T02:43:42.110741838Z 'Authorization': 'REDACTED'
2024-10-21T02:43:42.110749071Z No body was attached to the request
2024-10-21T02:43:42.115860276Z INFO:azure.core.pipeline.policies.http_logging_policy:Response status: 200
2024-10-21T02:43:42.115884081Z Response headers:
2024-10-21T02:43:42.115894380Z 'Cache-Control': 'no-cache'
2024-10-21T02:43:42.115903427Z 'Transfer-Encoding': 'chunked'
2024-10-21T02:43:42.115911342Z 'Content-Type': 'application/xml'
2024-10-21T02:43:42.115918606Z 'Server': 'Windows-Azure-Queue/1.0 Microsoft-HTTPAPI/2.0'
2024-10-21T02:43:42.115925689Z 'x-ms-request-id': 'dec4a70e-2003-0054-0263-23f559000000'
2024-10-21T02:43:42.115933443Z 'x-ms-client-request-id': '48ae17ca-8f56-11ef-8667-3e4e57cc0722'
2024-10-21T02:43:42.115940537Z 'x-ms-version': 'REDACTED'
2024-10-21T02:43:42.115947640Z 'Date': 'Mon, 21 Oct 2024 02:43:41 GMT'
2024-10-21T02:43:45.406998738Z ERROR:root:Error getting PDF page count: EOF marker not found
2024-10-21T02:43:45.407628523Z ERROR:main:Exception on /indexes/espp/upload [POST]
2024-10-21T02:43:45.407666334Z Traceback (most recent call last):
2024-10-21T02:43:45.407677155Z File "/usr/local/lib/python3.11/site-packages/flask/app.py", line 1473, in wsgi_app
2024-10-21T02:43:45.407685611Z response = self.full_dispatch_request()
2024-10-21T02:43:45.407693956Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-10-21T02:43:45.407702672Z File "/usr/local/lib/python3.11/site-packages/flask/app.py", line 882, in full_dispatch_request
2024-10-21T02:43:45.407711248Z rv = self.handle_user_exception(e)
2024-10-21T02:43:45.407718973Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-10-21T02:43:45.407726527Z File "/usr/local/lib/python3.11/site-packages/flask_cors/extension.py", line 178, in wrapped_function
2024-10-21T02:43:45.407734732Z return cors_after_request(app.make_response(f(*args, **kwargs)))
2024-10-21T02:43:45.407741976Z ^^^^^^^^^^^^^^^^^^
2024-10-21T02:43:45.407749510Z File "/usr/local/lib/python3.11/site-packages/flask/app.py", line 880, in full_dispatch_request
2024-10-21T02:43:45.407757455Z rv = self.dispatch_request()
2024-10-21T02:43:45.407765060Z ^^^^^^^^^^^^^^^^^^^^^^^
2024-10-21T02:43:45.407773124Z File "/usr/local/lib/python3.11/site-packages/flask/app.py", line 865, in dispatch_request
2024-10-21T02:43:45.407780558Z return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
2024-10-21T02:43:45.407802430Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-10-21T02:43:45.407810214Z File "/app/app/api/routes.py", line 213, in _upload_file
2024-10-21T02:43:45.407817638Z num_pages = get_pdf_page_count(file_buffer)
2024-10-21T02:43:45.407825593Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-10-21T02:43:45.407833478Z File "/app/app/ingestion/pdf_processing.py", line 24, in get_pdf_page_count
2024-10-21T02:43:45.407841393Z reader = PdfReader(pdf_bytes)
2024-10-21T02:43:45.407849067Z ^^^^^^^^^^^^^^^^^^^^
2024-10-21T02:43:45.407857022Z File "/usr/local/lib/python3.11/site-packages/PyPDF2/_reader.py", line 319, in __init__
2024-10-21T02:43:45.407864606Z self.read(stream)
2024-10-21T02:43:45.407872281Z File "/usr/local/lib/python3.11/site-packages/PyPDF2/_reader.py", line 1415, in read
2024-10-21T02:43:45.407879564Z self._find_eof_marker(stream)
2024-10-21T02:43:45.407887118Z File "/usr/local/lib/python3.11/site-packages/PyPDF2/_reader.py", line 1471, in _find_eof_marker
2024-10-21T02:43:45.407894743Z raise PdfReadError("EOF marker not found")
2024-10-21T02:43:45.407902166Z PyPDF2.errors.PdfReadError: EOF marker not found
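For this upload, the traceback shows `_upload_file` in `app/api/routes.py` reaching `get_pdf_page_count(file_buffer)`. As a result, the buffer goes straight to PyPDF2's `PdfReader`, which raises `PdfReadError` when no `%%EOF` marker exists. That is what would happen if the file was a Word document, as the report suggests.

Below is a minimal sketch of a guard. It assumes the caller can accept an unknown page count (`None`) for non-PDF files. `looks_like_pdf` and `safe_pdf_page_count` are hypothetical names, not the project's code. The sketch also does not settle whether Word files should be routed to Document Intelligence instead.

```python
import io
import logging
from typing import Optional

logger = logging.getLogger(__name__)

PDF_HEADER = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Cheap magic-byte check for PDF input."""
    # Some PDFs carry a few junk bytes before the header, so scan the first 1 KiB.
    return PDF_HEADER in data[:1024]


def safe_pdf_page_count(data: bytes) -> Optional[int]:
    """Return the page count for PDF bytes, or None for non-PDF or unreadable input.

    PyPDF2 is imported only when the buffer looks like a PDF, so Word and
    other formats never reach PdfReader.
    """
    if not looks_like_pdf(data):
        return None
    from PyPDF2 import PdfReader
    from PyPDF2.errors import PdfReadError

    try:
        return len(PdfReader(io.BytesIO(data)).pages)
    except PdfReadError as exc:
        # Damaged or truncated PDFs also land here instead of raising a 500.
        logger.warning("Could not read PDF page count: %s", exc)
        return None
```

The upload handler could then branch on the result. This issue does not show how the project actually dispatches non-PDF files.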
