
Commit 3d13c23

fix: raise httpx timeout value for split pdf requests (#168)
We're still seeing some `ReadTimeout` errors when a PDF is split and the pages are sent for `hi_res` processing. For now, let's raise the default timeout to 30 minutes and allow it to be tweaked via the `UNSTRUCTURED_CLIENT_TIMEOUT_MINUTES` environment variable. Users should not generally need to adjust this, but it may help us debug their environments. Once our split PDF hook can reuse the SDK logic, the client timeout will be exposed as a parameter and we can remove this variable.

Other changes:

* Update the CI workflow to run against release branches. This 0.25.x branch should keep running tests as long as it sticks around.
* Bump the gen.yaml version to 0.25.7. The next generate/publish job on this branch will use it.
1 parent adbee1d commit 3d13c23


3 files changed, with 12 additions and 5 deletions


Diff for: .github/workflows/ci.yaml (+2 -2)

```diff
@@ -2,9 +2,9 @@ name: CI
 
 on:
   push:
-    branches: [ main ]
+    branches: [ main, release/* ]
   pull_request:
-    branches: [ main ]
+    branches: [ main, release/* ]
   merge_group:
     branches: [ main ]
 
```
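The new `release/*` filter relies on GitHub's branch-filter glob rules, where `*` matches any run of characters except `/` and `**` also crosses `/`. As a rough illustration of which branches now trigger CI, here is a minimal sketch that approximates only those two wildcards with Python's `re`. `branch_filter_matches` and `triggers` are hypothetical helpers, not part of GitHub Actions or this repository, and other pattern syntax (`?`, `+`, `[]`, `!`) is deliberately ignored.

```python
import re


def branch_filter_matches(pattern: str, branch: str) -> bool:
    """Hypothetical helper approximating GitHub's '*' and '**' branch globs.

    '*' matches zero or more characters except '/', while '**' matches
    any characters including '/'. Other glob syntax is not handled here.
    """
    regex = ""
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            regex += ".*"          # '**' may cross path separators
            i += 2
        elif pattern[i] == "*":
            regex += "[^/]*"       # '*' stops at '/'
            i += 1
        else:
            regex += re.escape(pattern[i])  # literal character
            i += 1
    return re.fullmatch(regex, branch) is not None


def triggers(branches: list, branch: str) -> bool:
    """True if any filter in the workflow's `branches` list matches."""
    return any(branch_filter_matches(p, branch) for p in branches)


old_filters = ["main"]
new_filters = ["main", "release/*"]
for name in ["main", "release/0.25.x", "release/0.25.x/hotfix", "feature/x"]:
    print(name, "before:", triggers(old_filters, name), "after:", triggers(new_filters, name))
```

Under these rules a branch like `release/0.25.x` now runs CI, while a nested name such as `release/0.25.x/hotfix` would still be skipped, since a single `*` does not match across `/`.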

Diff for: gen.yaml (+1 -1)

```diff
@@ -10,7 +10,7 @@ generation:
   auth:
     oAuth2ClientCredentialsEnabled: false
 python:
-  version: 0.25.6
+  version: 0.25.7
   additionalDependencies:
     dependencies:
       deepdiff: '>=6.0'
```

Diff for: src/unstructured_client/_hooks/custom/split_pdf_hook.py (+9 -2)

```diff
@@ -4,6 +4,7 @@
 import io
 import logging
 import math
+import os
 from collections.abc import Awaitable
 from typing import Any, Coroutine, Optional, Tuple, Union
 
@@ -252,9 +253,15 @@ def before_request(
             page_count % split_size,
         )
 
+        # Use a variable to adjust the httpx client timeout, or default to 30 minutes
+        # When we're able to reuse the SDK to make these calls, we can remove this var
+        # The SDK timeout will be controlled by parameter
+        client_timeout_minutes = 30
+        if timeout_var := os.getenv("UNSTRUCTURED_CLIENT_TIMEOUT_MINUTES"):
+            client_timeout_minutes = int(timeout_var)
+
         async def call_api_partial(page):
-            # Individual calls should return within 10 minutes
-            client_timeout = httpx.Timeout(60 * 10)
+            client_timeout = httpx.Timeout(60 * client_timeout_minutes)
             async with httpx.AsyncClient(timeout=client_timeout) as client:
                 try:
                     httpx_response = await request_utils.call_api_async(
```
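The timeout resolution in `before_request` has a few edge cases worth spelling out: because of the walrus check, an unset variable and an empty string both keep the 30-minute default, and `int()` accepts only whole numbers, so a value like `"1.5"` raises `ValueError`. The minutes are then converted to seconds and handed to `httpx.Timeout`, which applies a single value to its connect, read, write, and pool timeouts. The sketch below mirrors that logic in a standalone function; `resolve_client_timeout_seconds` and its `environ` parameter are hypothetical, added only so the behavior can be exercised without touching the real environment.

```python
import os


def resolve_client_timeout_seconds(environ=os.environ) -> int:
    """Hypothetical mirror of the hook's timeout logic, for illustration."""
    client_timeout_minutes = 30
    # An unset variable or an empty string is falsy, so the default is kept.
    if timeout_var := environ.get("UNSTRUCTURED_CLIENT_TIMEOUT_MINUTES"):
        # int() accepts whole numbers only; "1.5" or "30m" raise ValueError.
        client_timeout_minutes = int(timeout_var)
    # The hook passes 60 * minutes (i.e. seconds) to httpx.Timeout(...).
    return 60 * client_timeout_minutes


print(resolve_client_timeout_seconds({}))  # 1800
print(resolve_client_timeout_seconds({"UNSTRUCTURED_CLIENT_TIMEOUT_MINUTES": ""}))  # 1800
print(resolve_client_timeout_seconds({"UNSTRUCTURED_CLIENT_TIMEOUT_MINUTES": "45"}))  # 2700
```

In practice, setting `UNSTRUCTURED_CLIENT_TIMEOUT_MINUTES=45` before the split requests are made would give each per-page `httpx.AsyncClient` a 2700-second timeout.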
