Describe the bug
When I set split_pdf_page=True and split_pdf_concurrency_level=15, every split set except the last one fails to partition.
For example, if the PDF is divided into 10 sets, the client reports:
ERROR: Failed to send request for page 1
...
WARNING: Failed to partition set #1, its elements will be omitted in the final result.
...
WARNING: Failed to partition set #9, its elements will be omitted in the final result.
INFO: Successfully partitioned set #10, elements added to the final result.
To Reproduce
code:
import os

import requests
import unstructured_client
from unstructured_client.models import operations, shared

os.environ["UNSTRUCTURED_API_KEY"] = "EMPTY"
os.environ["UNSTRUCTURED_API_URL"] = ""

requests_client = requests.Session()
client = unstructured_client.UnstructuredClient(
    api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"),
    server_url=os.getenv("UNSTRUCTURED_API_URL"),
    client=requests_client,
)

filename = "./test_pdf.pdf"
# Read the PDF up front so the file handle is closed before the request is sent.
with open(filename, "rb") as file:
    content = file.read()

req = operations.PartitionRequest(
    partition_parameters=shared.PartitionParameters(
        files=shared.Files(
            content=content,
            file_name=filename,
        ),
        strategy=shared.Strategy.HI_RES,
        # Split the PDF client-side and send the pieces concurrently.
        split_pdf_page=True,
        split_pdf_concurrency_level=15,
        chunking_strategy=shared.ChunkingStrategy("by_title"),
    )
)

try:
    res = client.general.partition(req)
    element_dicts = list(res.elements)
    print(element_dicts)
    for e in element_dicts:
        print(e["text"])
except Exception as e:
    print(e)
Console Information:
INFO: Preparing to split document for partition.
INFO: Concurrency level set to 15
INFO: Splitting pages 1 to 23 (23 total)
INFO: Determined optimal split size of 2 pages.
INFO: Partitioning 11 files with 2 page(s) each.
INFO: Partitioning 1 file with 1 page(s).
INFO: Partitioning set #1 (pages 1-2).
INFO: Partitioning set #2 (pages 3-4).
INFO: Partitioning set #3 (pages 5-6).
INFO: Partitioning set #4 (pages 7-8).
INFO: Partitioning set #5 (pages 9-10).
INFO: Partitioning set #6 (pages 11-12).
INFO: Partitioning set #7 (pages 13-14).
INFO: Partitioning set #8 (pages 15-16).
INFO: Partitioning set #9 (pages 17-18).
INFO: Partitioning set #10 (pages 19-20).
INFO: Partitioning set #11 (pages 21-22).
INFO: Partitioning set #12 (pages 23-23).
ERROR: Failed to send request for page 1
ERROR: Failed to send request for page 3
ERROR: Failed to send request for page 5
ERROR: Failed to send request for page 7
ERROR: Failed to send request for page 9
ERROR: Failed to send request for page 11
ERROR: Failed to send request for page 13
ERROR: Failed to send request for page 15
ERROR: Failed to send request for page 17
ERROR: Failed to send request for page 19
ERROR: Failed to send request for page 21
WARNING: Failed to partition set #1, its elements will be omitted in the final result.
WARNING: Failed to partition set #2, its elements will be omitted in the final result.
WARNING: Failed to partition set #3, its elements will be omitted in the final result.
WARNING: Failed to partition set #4, its elements will be omitted in the final result.
WARNING: Failed to partition set #5, its elements will be omitted in the final result.
WARNING: Failed to partition set #6, its elements will be omitted in the final result.
WARNING: Failed to partition set #7, its elements will be omitted in the final result.
WARNING: Failed to partition set #8, its elements will be omitted in the final result.
WARNING: Failed to partition set #9, its elements will be omitted in the final result.
WARNING: Failed to partition set #10, its elements will be omitted in the final result.
WARNING: Failed to partition set #11, its elements will be omitted in the final result.
INFO: Successfully partitioned set #12, elements added to the final result.
INFO: Successfully partitioned the document.
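For reference, the page ranges in the log are consistent with a split size of ceil(23 / 15) = 2 pages. Below is a minimal sketch of that arithmetic. It is my own reconstruction from the log output, not the client's actual code; `split_ranges` is a hypothetical helper, and the real client may apply extra bounds on the split size that are not modeled here.

```python
import math


def split_ranges(total_pages: int, concurrency_level: int) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) page ranges for each split set.

    Assumption: the split size is ceil(total_pages / concurrency_level),
    which matches the numbers in the log for 23 pages and concurrency 15.
    """
    split_size = max(1, math.ceil(total_pages / concurrency_level))
    ranges = []
    for start in range(1, total_pages + 1, split_size):
        # The last set may be shorter than split_size.
        end = min(start + split_size - 1, total_pages)
        ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    for number, (start, end) in enumerate(split_ranges(23, 15), start=1):
        print(f"set #{number}: pages {start}-{end}")
```

Under this assumption, 23 pages produce 12 sets: 11 two-page sets starting at pages 1, 3, ..., 21, plus a final one-page set for page 23. The starting pages of the two-page sets are exactly the ones named in the "Failed to send request" errors, and the single-page set is the only one that succeeded.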