chore: Fix spelling #173

Open · wants to merge 6 commits into base: main
6 changes: 3 additions & 3 deletions README.md
@@ -91,7 +91,7 @@ See [page splitting](https://docs.unstructured.io/api-reference/api-services/sdk
In order to speed up processing of large PDF files, the client splits up PDFs into smaller files, sends these to the API concurrently, and recombines the results. `split_pdf_page` can be set to `False` to disable this.

The amount of workers utilized for splitting PDFs is dictated by the `split_pdf_concurrency_level` parameter, with a default of 5 and a maximum of 15 to keep resource usage and costs in check. The splitting process leverages `asyncio` to manage concurrency effectively.
- The size of each batch of pages (ranging from 2 to 20) is internally determined based on the concurrency level and the total number of pages in the document. Because the splitting process uses `asyncio` the client can encouter event loop issues if it is nested in another async runner, like running in a `gevent` spawned task. Instead, this is safe to run in multiprocessing workers (e.g., using `multiprocessing.Pool` with `fork` context).
+ The size of each batch of pages (ranging from 2 to 20) is internally determined based on the concurrency level and the total number of pages in the document. Because the splitting process uses `asyncio` the client can encounter event loop issues if it is nested in another async runner, like running in a `gevent` spawned task. Instead, this is safe to run in multiprocessing workers (e.g., using `multiprocessing.Pool` with `fork` context).
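The batch-size rule mentioned above is internal, so as a rough illustration only (the helper name and exact formula are hypothetical — the real client may compute this differently, but the documented bounds of 2–20 pages and the concurrency level are the inputs):

```python
import math

def guess_batch_size(total_pages: int, concurrency_level: int,
                     min_batch: int = 2, max_batch: int = 20) -> int:
    """Illustrative only: split pages evenly across workers, then clamp
    to the documented 2-20 page range."""
    per_worker = math.ceil(total_pages / max(concurrency_level, 1))
    return max(min_batch, min(per_worker, max_batch))

# A 100-page PDF with the default 5 workers lands on the 20-page ceiling.
print(guess_batch_size(100, 5))
```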

Example:

@@ -369,9 +369,9 @@ There are two important files used by `make client-generate`:
1. `openapi.json` which is actually not stored here, [but fetched from unstructured-api](https://api.unstructured.io/general/openapi.json), represents the API that is supported on backend.
2. `overlay_client.yaml` is a handcrafted diff that when applied over above, produces `openapi_client.json` which is used to generate SDK.
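The overlay step described above can be pictured as a small merge pass (a simplification for illustration only — the real pipeline uses Speakeasy tooling and the actual overlay format is richer than a plain dict merge):

```python
import json

def apply_overlay(spec: dict, overlay: dict) -> dict:
    # Hypothetical simplification: deep-merge handcrafted patches into the
    # fetched openapi.json to produce openapi_client.json.
    merged = dict(spec)
    for key, patch in overlay.items():
        if isinstance(patch, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overlay(merged[key], patch)
        else:
            merged[key] = patch
    return merged

spec = {"info": {"title": "general"}, "paths": {}}
overlay = {"info": {"title": "Unstructured Client"}}
print(json.dumps(apply_overlay(spec, overlay)))
```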

- Once PR with changes is merged, Github CI will autogenerate the Speakeasy client in a new PR, using
+ Once PR with changes is merged, GitHub CI will autogenerate the Speakeasy client in a new PR, using
the `openapi.json` and `overlay_client.yaml` You will have to manually bring back the human created lines in it.

- Feel free to open a PR or a Github issue as a proof of concept and we'll do our best to include it in a future release!
+ Feel free to open a PR or a GitHub issue as a proof of concept and we'll do our best to include it in a future release!

### SDK Created by [Speakeasy](https://www.speakeasyapi.dev/docs/sdk-design/python/methodology-python)
2 changes: 1 addition & 1 deletion USAGE.md
@@ -31,7 +31,7 @@ if res.elements is not None:

</br>

- The same SDK client can also be used to make asychronous requests by importing asyncio.
+ The same SDK client can also be used to make asynchronous requests by importing asyncio.
```python
# Asynchronous Example
import asyncio
```
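The snippet above is collapsed in the diff view; a generic sketch of the async fan-out pattern it enables follows (the `fake_partition` coroutine is a stand-in, not the SDK's API — only the `asyncio` usage is the point here):

```python
import asyncio

async def fake_partition(name: str) -> str:
    # Stand-in for an async SDK call (the real method name differs).
    await asyncio.sleep(0)
    return f"partitioned {name}"

async def main() -> list[str]:
    # Fan out several requests concurrently, as the async client allows.
    return list(await asyncio.gather(
        *(fake_partition(n) for n in ["a.pdf", "b.pdf"])
    ))

print(asyncio.run(main()))
```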
2 changes: 1 addition & 1 deletion overlay_client.yaml
@@ -38,7 +38,7 @@ actions:
"split_pdf_allow_failed":
{
"title": "Split Pdf Allow Failed",
- "description": "When `split_pdf_page` is set to `True`, this parameter defines the behavior when some of the parallel requests fail. By default `split_pdf_allow_failed` is set to `False` and any failed request send to the API will make the whole process break and raise an Exception. If `split_pdf_allow_failed` is set to `True`, the errors encountered while sending parallel requests will not break the processing - the resuling list of Elements will miss the data from errored pages.",
+ "description": "When `split_pdf_page` is set to `True`, this parameter defines the behavior when some of the parallel requests fail. By default `split_pdf_allow_failed` is set to `False` and any failed request send to the API will make the whole process break and raise an Exception. If `split_pdf_allow_failed` is set to `True`, the errors encountered while sending parallel requests will not break the processing - the resulting list of Elements will miss the data from errored pages.",
"type": "boolean",
"default": false,
}
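The `split_pdf_allow_failed` semantics described in this hunk can be illustrated with a small result-gathering sketch (hypothetical — the client's real implementation differs, but the fail-fast vs. skip-errored-pages contract is the same):

```python
def gather_results(page_results: list, allow_failed: bool = False) -> list:
    """Combine per-page results; page_results holds either a list of
    elements or the Exception a parallel request raised."""
    elements = []
    for result in page_results:
        if isinstance(result, Exception):
            if not allow_failed:
                # Default behavior: one failed request breaks the whole run.
                raise result
            # allow_failed=True: skip the errored page; its data is missing.
            continue
        elements.extend(result)
    return elements

print(gather_results([["el1"], ["el2"]]))
```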
2 changes: 1 addition & 1 deletion src/unstructured_client/_hooks/custom/form_utils.py
@@ -125,7 +125,7 @@ def get_split_pdf_allow_failed_param(
def get_split_pdf_concurrency_level_param(
form_data: FormData, key: str, fallback_value: int, max_allowed: int
) -> int:
- """Retrieves the value for concurreny level that should be used for splitting pdf.
+ """Retrieves the value for concurrency level that should be used for splitting pdf.

In case given the number is not a valid integer or less than 1, it will use the
default value.
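The validation this docstring describes can be sketched on its own (a hypothetical mirror — the real helper reads its inputs from multipart form data, but the fallback-and-clamp logic is the documented contract: default 5, maximum 15):

```python
def clamp_concurrency(raw, fallback: int = 5, max_allowed: int = 15) -> int:
    """Illustrative mirror of the documented validation: non-integer or
    sub-1 values fall back to the default; valid values are capped."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return fallback
    if value < 1:
        return fallback
    return min(value, max_allowed)

print(clamp_concurrency("7"))
```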
4 changes: 2 additions & 2 deletions src/unstructured_client/models/shared/partition_parameters.py
@@ -89,7 +89,7 @@ class PartitionParametersTypedDict(TypedDict):
skip_infer_table_types: NotRequired[List[str]]
r"""The document types that you want to skip table extraction with. Default: []"""
split_pdf_allow_failed: NotRequired[bool]
- r"""When `split_pdf_page` is set to `True`, this parameter defines the behavior when some of the parallel requests fail. By default `split_pdf_allow_failed` is set to `False` and any failed request send to the API will make the whole process break and raise an Exception. If `split_pdf_allow_failed` is set to `True`, the errors encountered while sending parallel requests will not break the processing - the resuling list of Elements will miss the data from errored pages."""
+ r"""When `split_pdf_page` is set to `True`, this parameter defines the behavior when some of the parallel requests fail. By default `split_pdf_allow_failed` is set to `False` and any failed request send to the API will make the whole process break and raise an Exception. If `split_pdf_allow_failed` is set to `True`, the errors encountered while sending parallel requests will not break the processing - the resulting list of Elements will miss the data from errored pages."""
split_pdf_concurrency_level: NotRequired[int]
r"""When `split_pdf_page` is set to `True`, this parameter specifies the number of workers used for sending requests when the PDF is split on the client side. It's an internal parameter for the Python client and is not sent to the backend."""
split_pdf_page: NotRequired[bool]
@@ -152,7 +152,7 @@ class PartitionParameters(BaseModel):
skip_infer_table_types: Annotated[Optional[List[str]], FieldMetadata(multipart=True)] = None
r"""The document types that you want to skip table extraction with. Default: []"""
split_pdf_allow_failed: Annotated[Optional[bool], FieldMetadata(multipart=True)] = False
- r"""When `split_pdf_page` is set to `True`, this parameter defines the behavior when some of the parallel requests fail. By default `split_pdf_allow_failed` is set to `False` and any failed request send to the API will make the whole process break and raise an Exception. If `split_pdf_allow_failed` is set to `True`, the errors encountered while sending parallel requests will not break the processing - the resuling list of Elements will miss the data from errored pages."""
+ r"""When `split_pdf_page` is set to `True`, this parameter defines the behavior when some of the parallel requests fail. By default `split_pdf_allow_failed` is set to `False` and any failed request send to the API will make the whole process break and raise an Exception. If `split_pdf_allow_failed` is set to `True`, the errors encountered while sending parallel requests will not break the processing - the resulting list of Elements will miss the data from errored pages."""
split_pdf_concurrency_level: Annotated[Optional[int], FieldMetadata(multipart=True)] = 5
r"""When `split_pdf_page` is set to `True`, this parameter specifies the number of workers used for sending requests when the PDF is split on the client side. It's an internal parameter for the Python client and is not sent to the backend."""
split_pdf_page: Annotated[Optional[bool], FieldMetadata(multipart=True)] = True
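The field defaults visible in this hunk (taken from the annotations above; the plain dict below is only an illustration of passing them as request parameters, not the model's API):

```python
# Values mirror the model defaults shown in the diff:
# split_pdf_allow_failed=False, split_pdf_concurrency_level=5,
# split_pdf_page=True. Overrides shown for the tunable ones.
params = {
    "split_pdf_page": True,             # default: client-side splitting on
    "split_pdf_concurrency_level": 10,  # raise workers (documented max: 15)
    "split_pdf_allow_failed": True,     # tolerate failed page batches
}
print(params)
```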
4 changes: 2 additions & 2 deletions src/unstructured_client/utils/queryparams.py
@@ -66,10 +66,10 @@ def _populate_query_params(
f_name = field.alias if field.alias is not None else name
serialization = metadata.serialization
if serialization is not None:
- serialized_parms = _get_serialized_params(
+ serialized_params = _get_serialized_params(
metadata, f_name, value, param_field_types[name]
)
- for key, value in serialized_parms.items():
+ for key, value in serialized_params.items():
if key in query_param_values:
query_param_values[key].extend(value)
else:
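The accumulation loop in the hunk above (whose `else` branch is collapsed in the diff) can be sketched as a standalone function — illustrative, with names mirroring the diff:

```python
def merge_query_params(query_param_values: dict, serialized_params: dict) -> dict:
    """Fold serialized parameter values into the accumulator, extending
    any key that already has values (mirrors the loop in the hunk above)."""
    for key, value in serialized_params.items():
        if key in query_param_values:
            query_param_values[key].extend(value)
        else:
            query_param_values[key] = list(value)
    return query_param_values

print(merge_query_params({"a": ["1"]}, {"a": ["2"], "b": ["3"]}))
```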