Description
Describe the bug
When attempting to use the `UnstructuredClient` to parse a PDF document, a `ValueError` is thrown due to an incompatibility with `uvloop`. This occurs when initializing the `SplitPdfHook` in the `UnstructuredClient`. The error suggests that `nest_asyncio` is unable to patch the `uvloop.Loop`.
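The failing check can be sketched without either library installed. This is a minimal illustration, assuming that `nest_asyncio` only patches loops that subclass `asyncio.BaseEventLoop` and that `uvloop.Loop` does not. `FakeUvLoop` and `check_patchable` are hypothetical names used only for this sketch.

```python
import asyncio


class FakeUvLoop(asyncio.AbstractEventLoop):
    """Hypothetical stand-in for uvloop.Loop: an event loop type that
    does not subclass asyncio.BaseEventLoop."""


def check_patchable(loop):
    # Assumption: this mirrors the guard that raises in nest_asyncio's
    # _patch_loop, judging by the error message in the traceback.
    if not isinstance(loop, asyncio.BaseEventLoop):
        raise ValueError("Can't patch loop of type %s" % type(loop))


# A standard-library loop passes the check.
std_loop = asyncio.SelectorEventLoop()
try:
    std_result = check_patchable(std_loop)
finally:
    std_loop.close()

# A loop outside the BaseEventLoop hierarchy is rejected.
try:
    check_patchable(FakeUvLoop())
    message = None
except ValueError as exc:
    message = str(exc)
```

If that assumption holds, any process whose event loop policy produces `uvloop.Loop` instances would hit the same error as soon as `nest_asyncio.apply()` runs.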
Versions in use:
unstructured==0.15.1
unstructured-client==0.23.9
To Reproduce
from unstructured_client import UnstructuredClient
from langchain_community.document_loaders import UnstructuredAPIFileLoader

client = UnstructuredClient()

loader = UnstructuredAPIFileLoader(
    file_path="path/to/your/document.pdf",
    api_key="your-api-key",
    api_url="your-api-url",
)

# This line triggers the error
documents = loader.load_and_split()
Expected behavior
The `UnstructuredClient` should initialize successfully and be able to parse the PDF document without throwing a `ValueError` related to `uvloop`.
Environment Info
The output of `python scripts/collect_env.py` was not collected.
Additional context
- Python version: 3.11
- Using uvloop: Yes
- The error occurs in a FastAPI application served by uvicorn, inside a background task that runs in a worker thread
- The full traceback comes from a larger application (`pylon`)
- The error specifically mentions:
  File "/usr/local/lib/python3.11/site-packages/unstructured_client/_hooks/custom/split_pdf_hook.py", line 73, in __init__ nest_asyncio.apply()
  This indicates that the `SplitPdfHook` is trying to apply `nest_asyncio`, which is incompatible with `uvloop`.
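One possible workaround, not verified against unstructured-client 0.23.9, is to give the worker thread a standard-library event loop before the client is constructed, so that `nest_asyncio.apply()` finds a loop it can patch instead of asking the uvloop policy for a new one. The sketch below shows only the wrapping technique. `call_with_stdlib_loop` is a hypothetical helper, and `current_loop_type` is a stand-in for the real `loader.load_and_split()` call.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def call_with_stdlib_loop(func, *args, **kwargs):
    """Call a blocking function with a plain asyncio loop set as the
    current loop of the calling thread.

    Intended for worker threads that have no running loop; it replaces
    and then clears the thread's current loop.
    """
    loop = asyncio.SelectorEventLoop()
    asyncio.set_event_loop(loop)
    try:
        return func(*args, **kwargs)
    finally:
        asyncio.set_event_loop(None)
        loop.close()


def current_loop_type():
    # Stand-in for the code that would build UnstructuredClient:
    # report which loop type the thread sees.
    return type(asyncio.get_event_loop())


# Run the wrapped call in a worker thread, as the background task does.
with ThreadPoolExecutor(max_workers=1) as pool:
    loop_type = pool.submit(call_with_stdlib_loop, current_loop_type).result()
```

In the application, the wrapped callable would be the function that calls `load_and_split()`. Whether the SDK then runs correctly on that loop is an assumption that still needs testing.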
Traceback
Traceback (most recent call last):
File "/app/pylon/core/document/unstructured.py", line 30, in parse_document_with_unstructuredio
).load_and_split()
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/langchain_core/document_loaders/base.py", line 64, in load_and_split
docs = self.load()
^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/langchain_core/document_loaders/base.py", line 30, in load
return list(self.lazy_load())
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/unstructured.py", line 107, in lazy_load
elements = self._get_elements()
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/unstructured.py", line 333, in _get_elements
return get_elements_from_api(
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/unstructured.py", line 261, in get_elements_from_api
return partition_via_api(
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/unstructured/partition/api.py", line 69, in partition_via_api
sdk = UnstructuredClient(api_key_auth=api_key, server_url=base_url)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/unstructured_client/sdk.py", line 54, in __init__
self.sdk_configuration = SDKConfiguration(
^^^^^^^^^^^^^^^^^
File "<string>", line 13, in __init__
File "/usr/local/lib/python3.11/site-packages/unstructured_client/sdkconfiguration.py", line 38, in __post_init__
self._hooks = SDKHooks()
^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/unstructured_client/_hooks/sdkhooks.py", line 15, in __init__
init_hooks(self)
File "/usr/local/lib/python3.11/site-packages/unstructured_client/_hooks/registration.py", line 28, in init_hooks
split_pdf_hook = SplitPdfHook()
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/unstructured_client/_hooks/custom/split_pdf_hook.py", line 73, in __init__
nest_asyncio.apply()
File "/usr/local/lib/python3.11/site-packages/nest_asyncio.py", line 18, in apply
loop = loop or asyncio.get_event_loop()
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/nest_asyncio.py", line 40, in _get_event_loop
loop = events.get_event_loop_policy().get_event_loop()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/nest_asyncio.py", line 67, in get_event_loop
_patch_loop(loop)
File "/usr/local/lib/python3.11/site-packages/nest_asyncio.py", line 193, in _patch_loop
raise ValueError('Can\'t patch loop of type %s' % type(loop))
ValueError: Can't patch loop of type <class 'uvloop.Loop'>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/pylon/routers/document.py", line 43, in create_documents_process
document_info: DocumentsInfo = save_document(document, request.agent, request.organize)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/pylon/services/knowledge.py", line 67, in save_document
parsed_document = parse_document(
^^^^^^^^^^^^^^^
File "/app/pylon/core/document/parser.py", line 126, in parse_document
raise e
File "/app/pylon/core/document/parser.py", line 119, in parse_document
documents: list[LCDocument] = Parallel(n_jobs=-1, prefer='processes')(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/joblib/parallel.py", line 1918, in __call__
return output if self.return_generator else list(output)
^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/joblib/parallel.py", line 1847, in _get_sequential_output
res = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/app/pylon/core/document/unstructured.py", line 38, in parse_document_with_unstructuredio
raise FileParserAPIError(f'Failed to parse document from unstructured-io: {filename}. Error: {e!s}') from e
pylon.exceptions.custom_exceptions.FileParserAPIError: Failed to connect or communicate with the file parser server. details: Failed to parse document from unstructured-io. Error: Can't patch loop of type <class 'uvloop.Loop'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 406, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 75, in app
await response(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/responses.py", line 162, in __call__
await self.background()
File "/usr/local/lib/python3.11/site-packages/starlette/background.py", line 45, in __call__
await task()
File "/usr/local/lib/python3.11/site-packages/starlette/background.py", line 30, in __call__
await run_in_threadpool(self.func, *self.args, **self.kwargs)
File "/usr/local/lib/python3.11/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/pylon/routers/document.py", line 46, in create_documents_process
handle_exception(
File "/app/pylon/exceptions/handlers.py", line 46, in handle_exception
send_callback(callback_url, error_response)
File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 336, in wrapped_f
return copy(f, *args, **kw)
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 475, in __call__
do = self.iter(retry_state=retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 376, in iter
result = action(retry_state)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 418, in exc_check
raise retry_exc.reraise()
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 185, in reraise
raise self.last_attempt.result()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 478, in __call__
result = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/app/pylon/utils/callbacks.py", line 11, in send_callback
response.raise_for_status()
File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
Any guidance on resolving this issue or workarounds would be greatly appreciated.