Commit 224f9ad

fix/allow asyncio page splitting in nested event loops (#104)
We're now using asyncio for page split concurrency, but because the client itself is not async, we need to manage our own event loop. This fails if your environment already has a running event loop. For instance, setting `split_pdf_page=True` in a Jupyter cell will give you `RuntimeError: This event loop is already running`.

Turns out there's a simple library, `nest_asyncio`, that allows nested event loops. We just apply the monkeypatch in `split_pdf_hook.py` and the error goes away.

To verify, first run `pip install -e .` to install the local version of the client. Run `make run-jupyter` and open the sample notebook in `_jupyter/`. Make a request with page splitting enabled and you'll see the above error. Then check out this branch, install locally again, restart your Jupyter kernel, and repeat the request: the error is gone.
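The failure described above can be reproduced outside Jupyter with a rough sketch like the one below. It is not the client's actual code: `fetch_page`, `blocking_call`, `notebook_cell`, and `enable_nested_loops` are hypothetical names used only for illustration, and the sketch assumes the synchronous code path drives the loop with `run_until_complete`.

```python
import asyncio


async def fetch_page(n: int) -> int:
    """Hypothetical stand-in for one page-split request."""
    await asyncio.sleep(0)
    return n * 2


def blocking_call(n: int):
    """Sync entry point that drives the event loop itself, as a non-async client must."""
    coro = fetch_page(n)
    try:
        # Inside a running loop this returns that same loop,
        # and an unpatched loop refuses to be re-entered.
        return asyncio.get_event_loop().run_until_complete(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"


async def notebook_cell():
    """Simulates a Jupyter cell: a loop is already running when sync code is called."""
    return blocking_call(21)


def enable_nested_loops() -> bool:
    """Apply the nest_asyncio patch if the package is installed (assumed optional here)."""
    try:
        import nest_asyncio  # third-party: pip install nest-asyncio
    except ImportError:
        return False
    nest_asyncio.apply()
    return True


before = asyncio.run(notebook_cell())
print(before)
```

With `nest-asyncio` installed, calling `enable_nested_loops()` before running `notebook_cell()` again should let the nested `run_until_complete` call finish instead of raising. The commit does the equivalent by calling `nest_asyncio.apply()` at import time in `split_pdf_hook.py`.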
1 parent d024671 commit 224f9ad

2 files changed, +4 -0 lines changed

Diff for: setup.py (+1)

@@ -37,6 +37,7 @@
 "jsonpath-python>=1.0.6",
 "marshmallow>=3.19.0",
 "mypy-extensions>=1.0.0",
+"nest-asyncio>=1.6.0",
 "packaging>=23.1",
 "pypdf>=4.0",
 "python-dateutil>=2.8.2",

Diff for: src/unstructured_client/_hooks/custom/split_pdf_hook.py (+3)

@@ -8,6 +8,7 @@
 from typing import Any, Coroutine, Optional, Tuple, Union
 
 import httpx
+import nest_asyncio
 import requests
 from pypdf import PdfReader
 from requests_toolbelt.multipart.decoder import MultipartDecoder
@@ -40,6 +41,8 @@
 MIN_PAGES_PER_SPLIT = 2
 MAX_PAGES_PER_SPLIT = 20
 
+nest_asyncio.apply()
+
 
 async def run_tasks(tasks):
     return await asyncio.gather(*tasks)
