feat: Parameter to send custom page range when splitting pdf (#125)
# New parameter
Add a client-side parameter, `split_pdf_page_range`, which takes a list
of two integers, `[start_page, end_page]`. If `split_pdf_page` is `True`
and a range is set, the document is sliced from `start_page` up to and
including `end_page`, and only this page range is sent to the API. The
selected subset of pages is still split up as needed.
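The slicing rule above can be sketched in plain Python. This is a minimal sketch, assuming 1-indexed, inclusive page numbers (as the `[1, 16]` example in the testing section suggests) and a fixed number of pages per split. `select_page_range`, `split_into_chunks`, and `pages_per_chunk` are hypothetical names for illustration, not the client's actual helpers, and pages are modeled as a plain list rather than a PDF object.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_page_range(pages: Sequence[T], page_range: Sequence[int]) -> List[T]:
    """Return pages start_page..end_page (1-indexed, inclusive).

    Hypothetical helper: mirrors the behavior described for
    `split_pdf_page_range`, not the client's real implementation.
    """
    if len(page_range) != 2:
        raise ValueError("page range must be [start_page, end_page]")
    start_page, end_page = page_range
    page_count = len(pages)
    # Reject pages outside the document.
    if not 1 <= start_page <= page_count or not 1 <= end_page <= page_count:
        raise ValueError(
            f"page range {list(page_range)} is out of bounds "
            f"for a {page_count}-page document"
        )
    # Reject a reversed range.
    if end_page < start_page:
        raise ValueError(f"end_page ({end_page}) is before start_page ({start_page})")
    # Convert to 0-based slice indices; end_page is inclusive, so the stop needs no -1.
    return list(pages[start_page - 1 : end_page])


def split_into_chunks(pages: Sequence[T], pages_per_chunk: int) -> List[List[T]]:
    """Split the selected pages into consecutive chunks (the last may be shorter)."""
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    return [
        list(pages[i : i + pages_per_chunk])
        for i in range(0, len(pages), pages_per_chunk)
    ]


if __name__ == "__main__":
    document = [f"page-{n}" for n in range(1, 17)]  # a 16-page stand-in document
    subset = select_page_range(document, [3, 9])
    print(subset)  # pages 3 through 9
    print(split_into_chunks(subset, 3))  # [[3, 4, 5], [6, 7, 8], [9]] as page labels
```

In this sketch the range is validated before anything is sliced, so a bad range raises `ValueError` without touching the pages, and chunking only ever sees the selected subset.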
# Other changes
Allow our custom hooks to properly access list parameters, so that they
can intercept `split_pdf_page_range`. List parameters need extra
handling: `parse_form_data` must pull them out of the request, and
`create_request_body` must rebuild them into the payload.
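The sketch below illustrates why list parameters need extra handling: if a multipart body is read into a plain dict, a field that appears several times keeps only its last value. It assumes the form parts have already been decoded into `(name, value)` pairs and that a list parameter is sent as repeated fields whose name ends in `[]` (e.g. `split_pdf_page_range[]`). That naming convention and the helper names `collect_form_fields` and `flatten_form_fields` are assumptions for illustration, not the actual signatures of `parse_form_data` or `create_request_body`.

```python
from typing import Dict, List, Tuple, Union

# A form field holds one string, or several strings for a list parameter.
FormValue = Union[str, List[str]]


def collect_form_fields(parts: List[Tuple[str, str]]) -> Dict[str, FormValue]:
    """Group decoded multipart fields, keeping every value of a list parameter."""
    form: Dict[str, FormValue] = {}
    for name, value in parts:
        if name.endswith("[]"):
            # Assumed convention: list params arrive as repeated "name[]" fields.
            key = name[:-2]
            existing = form.get(key)
            if isinstance(existing, list):
                existing.append(value)
            else:
                form[key] = [value]
        else:
            # Scalar fields: a later duplicate overwrites an earlier one,
            # which is what would happen to list params without the branch above.
            form[name] = value
    return form


def flatten_form_fields(form: Dict[str, FormValue]) -> List[Tuple[str, str]]:
    """Rebuild (name, value) pairs for a new multipart body, re-expanding lists."""
    parts: List[Tuple[str, str]] = []
    for key, value in form.items():
        if isinstance(value, list):
            parts.extend((f"{key}[]", item) for item in value)
        else:
            parts.append((key, value))
    return parts


if __name__ == "__main__":
    decoded = [
        ("strategy", "fast"),
        ("split_pdf_page_range[]", "1"),
        ("split_pdf_page_range[]", "16"),
    ]
    form = collect_form_fields(decoded)
    # Multipart values are text, so convert before using them as page numbers.
    start_page, end_page = (int(v) for v in form["split_pdf_page_range"])
    print(form, start_page, end_page)
    print(flatten_form_fields(form) == decoded)  # True
```

Keeping list values as Python lists between the two steps means the hook can read or drop the range without re-parsing the raw body, and the rebuild step restores one field per list item.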
# Testing
Check out this branch and set up a request to your local API:
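```python
from unstructured_client import UnstructuredClient
from unstructured_client.models import shared

# Point the client at a locally running API (an empty key is used locally).
client = UnstructuredClient(api_key_auth="", server_url="http://localhost:8000")

filename = "_sample_docs/layout-parser-paper.pdf"
with open(filename, "rb") as f:
    files = shared.Files(
        content=f.read(),
        file_name=filename,
    )

req = shared.PartitionParameters(
    files=files,
    strategy="fast",
    split_pdf_page=True,
    # Only pages 1 through 16 (inclusive) are sent to the API.
    split_pdf_page_range=[1, 16],
)

resp = client.general.partition(req)
```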
Try out various page ranges and confirm that the returned elements fall
within the requested range. Invalid ranges should raise a `ValueError`
(pages out of bounds, or `end_page` < `start_page`).
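For the manual check above, a small helper can confirm that every returned element falls inside the requested range. It assumes each element is a dict carrying a `metadata.page_number` field, as in Unstructured's JSON element output; `pages_outside_range` is a hypothetical name, and the sample elements are made up for illustration.

```python
from typing import Any, Dict, List


def pages_outside_range(
    elements: List[Dict[str, Any]], start_page: int, end_page: int
) -> List[int]:
    """Return page numbers of elements that fall outside [start_page, end_page]."""
    bad_pages = []
    for element in elements:
        page = element.get("metadata", {}).get("page_number")
        # Elements without a page number are skipped rather than flagged.
        if page is not None and not start_page <= page <= end_page:
            bad_pages.append(page)
    return bad_pages


if __name__ == "__main__":
    sample = [
        {"type": "Title", "metadata": {"page_number": 3}},
        {"type": "NarrativeText", "metadata": {"page_number": 4}},
        {"type": "NarrativeText", "metadata": {"page_number": 10}},
    ]
    print(pages_outside_range(sample, 3, 9))  # [10]
```

Assuming `resp.elements` holds the element dicts, calling `pages_outside_range(resp.elements, 1, 16)` after the test request should return an empty list.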