chore: Update README and cut 0.26.0 for publishing (#188)
Bring back some of the autogenerated README content and make sure our
manual sections are using the right syntax. Once we merge and
regenerate, 0.26.0 will be published.
NOTE: This README is for the `0.26.0-beta` version. The current published SDK, `0.25.5`, can be found [here](https://github.com/Unstructured-IO/unstructured-python-client/blob/v0.25.5/README.md).
-
 This is a Python client for the [Unstructured API](https://docs.unstructured.io/api-reference/api-services/saas-api-development-guide) and you can sign up for your API key on https://app.unstructured.io.
 
 Please refer to the [Unstructured docs](https://docs.unstructured.io/api-reference/api-services/sdk-python) for a full guide to using the client.
@@ -73,94 +65,6 @@ poetry add unstructured-client
 ```
 <!-- End SDK Installation [installation] -->
 
-## SDK Example Usage
-
-### Example
-
-```python
-import os
-
-import unstructured_client
-from unstructured_client.models import operations, shared
-
-client = unstructured_client.UnstructuredClient(
-    api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"),
-    server_url=os.getenv("UNSTRUCTURED_API_URL"),
-)
-
-filename = "PATH_TO_FILE"
-with open(filename, "rb") as f:
-    data = f.read()
-
-req = operations.PartitionRequest(
-    partition_parameters=shared.PartitionParameters(
-        files=shared.Files(
-            content=data,
-            file_name=filename,
-        ),
-        # --- Other partition parameters ---
-        strategy=shared.Strategy.AUTO,
-        languages=['eng'],
-    ),
-)
-
-try:
-    res = client.general.partition(request=req)
-    print(res.elements[0])
-except Exception as e:
-    print(e)
-```
-Refer to the [API parameters page](https://docs.unstructured.io/api-reference/api-services/api-parameters) for all available parameters.
-
-### Configuration
-
-#### Splitting PDF by pages
-
-See [page splitting](https://docs.unstructured.io/api-reference/api-services/sdk#page-splitting) for more details.
-
-To speed up processing of large PDF files, the client splits PDFs into smaller files, sends these to the API concurrently, and recombines the results. Set `split_pdf_page` to `False` to disable this.
-
-The number of workers used for splitting PDFs is set by the `split_pdf_concurrency_level` parameter, with a default of 5 and a maximum of 15 to keep resource usage and costs in check. The splitting process uses `asyncio` to manage concurrency.
-The size of each batch of pages (ranging from 2 to 20) is determined internally from the concurrency level and the total number of pages in the document. Because the splitting process uses `asyncio`, the client can encounter event loop issues if it is nested in another async runner, such as a `gevent` spawned task. It is, however, safe to run in multiprocessing workers (e.g., using `multiprocessing.Pool` with `fork` context).
-
-Example:
-```python
-req = shared.PartitionParameters(
-    files=files,
-    strategy="fast",
-    languages=["eng"],
-    split_pdf_concurrency_level=8
-)
-```
-
-#### Sending specific page ranges
-
-When `split_pdf_page=True` (the default), you can optionally specify a page range to send only a portion of your PDF to be extracted. The parameter takes a list of two integers to specify the range, inclusive. A `ValueError` is raised if the page range is invalid.
-
-Example:
-```python
-req = shared.PartitionParameters(
-    files=files,
-    strategy="fast",
-    languages=["eng"],
-    split_pdf_page_range=[10, 15],
-)
-```
-
-#### Splitting PDF by pages - strict mode
-
-When `split_pdf_allow_failed=False` (the default), any error encountered while sending parallel requests will stop the process and raise an exception.
-When `split_pdf_allow_failed=True`, the process will continue even if some requests fail, and the results will be combined at the end (the output from the errored pages will not be included).
-
-Example:
-```python
-req = shared.PartitionParameters(
-    files=files,
-    strategy="fast",
-    languages=["eng"],
-    split_pdf_allow_failed=True,
-)
-```
 
 <!-- Start Retries [retries] -->
 ## Retries
@@ -229,6 +133,59 @@ if res.elements is not None:
 ```
 <!-- End Retries [retries] -->
 
+
+<!-- Start Error Handling [errors] -->
+## Error Handling
+
+Handling errors in this SDK should largely match your expectations. All operations return a response object or raise an error. If Error objects are specified in your OpenAPI Spec, the SDK will raise the appropriate Error type.
Refer to the [API parameters page](https://docs.unstructured.io/api-reference/api-services/api-parameters) for all available parameters.
+
+
+## Configuration
+
+### Splitting PDF by pages
+
+See [page splitting](https://docs.unstructured.io/api-reference/api-services/sdk#page-splitting) for more details.
+
+To speed up processing of large PDF files, the client splits PDFs into smaller files, sends these to the API concurrently, and recombines the results. Set `split_pdf_page` to `False` to disable this.
+
+The number of workers used for splitting PDFs is set by the `split_pdf_concurrency_level` parameter, with a default of 5 and a maximum of 15 to keep resource usage and costs in check. The splitting process uses `asyncio` to manage concurrency.
+The size of each batch of pages (ranging from 2 to 20) is determined internally from the concurrency level and the total number of pages in the document. Because the splitting process uses `asyncio`, the client can encounter event loop issues if it is nested in another async runner, such as a `gevent` spawned task. It is, however, safe to run in multiprocessing workers (e.g., using `multiprocessing.Pool` with `fork` context).
+
+Example:
+```python
+req = operations.PartitionRequest(
+    partition_parameters=shared.PartitionParameters(
+        files=files,
+        strategy="fast",
+        languages=["eng"],
+        split_pdf_concurrency_level=8
+    )
+)
+```
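To make the concurrency limit concrete, here is a minimal, standalone sketch of bounded concurrency with `asyncio.Semaphore`. It does not call the SDK and is not the client's internal implementation: `clamp_concurrency`, `run_bounded`, and the fake `send` coroutine are hypothetical names, and silently clamping an out-of-range value is an assumption (the real client may warn or reject it instead). Only the default of 5 and the maximum of 15 come from the README text.

```python
import asyncio

DEFAULT_CONCURRENCY = 5   # default stated in the README
MAX_CONCURRENCY = 15      # maximum stated in the README


def clamp_concurrency(requested=None):
    """Hypothetical helper: use the default when unset, cap at the maximum."""
    if requested is None:
        return DEFAULT_CONCURRENCY
    return max(1, min(requested, MAX_CONCURRENCY))


async def run_bounded(batches, level):
    """Process every batch, with at most `level` requests in flight at once."""
    semaphore = asyncio.Semaphore(level)
    in_flight = 0
    peak = 0

    async def send(batch):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for one API request
            in_flight -= 1
            return f"elements for pages {batch[0]}-{batch[-1]}"

    # gather() returns results in the order the batches were submitted.
    results = await asyncio.gather(*(send(b) for b in batches))
    return results, peak


# 20 two-page batches covering pages 1-40 (illustrative sizes only).
batches = [list(range(start, start + 2)) for start in range(1, 41, 2)]
results, peak = asyncio.run(run_bounded(batches, clamp_concurrency(8)))
```

Because the sketch calls `asyncio.run`, it shows the same constraint the README describes: it cannot be started from inside an already running event loop.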
+
+### Sending specific page ranges
+
+When `split_pdf_page=True` (the default), you can optionally specify a page range to send only a portion of your PDF to be extracted. The parameter takes a list of two integers to specify the range, inclusive. A `ValueError` is raised if the page range is invalid.
+
+Example:
+```python
+req = operations.PartitionRequest(
+    partition_parameters=shared.PartitionParameters(
+        files=files,
+        strategy="fast",
+        languages=["eng"],
+        split_pdf_page_range=[10, 15],
+    )
+)
+```
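The inclusive range semantics can be sketched without the SDK. The helper below is hypothetical (`pages_in_range` is not part of `unstructured_client`), and it assumes 1-based page numbers and that a range extending past the end of the document counts as invalid; the README only states that the range is inclusive and that an invalid range raises `ValueError`.

```python
def pages_in_range(page_range, total_pages):
    """Hypothetical helper: expand an inclusive [start, end] range.

    Assumes 1-based page numbers; the SDK's exact validation rules may differ.
    """
    if len(page_range) != 2:
        raise ValueError("page range must contain exactly two integers")
    start, end = page_range
    if not (isinstance(start, int) and isinstance(end, int)):
        raise ValueError("page range values must be integers")
    if start < 1 or end > total_pages or start > end:
        raise ValueError(
            f"invalid page range {page_range} for a {total_pages}-page document"
        )
    # Both endpoints are included, so [10, 15] selects six pages.
    return list(range(start, end + 1))


selected = pages_in_range([10, 15], total_pages=20)
```

Under these assumptions, `[10, 15]` selects pages 10 through 15, six pages in total.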
+
+### Splitting PDF by pages - strict mode
+
+When `split_pdf_allow_failed=False` (the default), any error encountered while sending parallel requests will stop the process and raise an exception.
+When `split_pdf_allow_failed=True`, the process will continue even if some requests fail, and the results will be combined at the end (the output from the errored pages will not be included).
+
+Example:
+```python
+req = operations.PartitionRequest(
+    partition_parameters=shared.PartitionParameters(
+        files=files,
+        strategy="fast",
+        languages=["eng"],
+        split_pdf_allow_failed=True,
+    )
+)
+```
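The difference between the two modes can be illustrated with plain `asyncio`. This is a sketch under stated assumptions, not the client's actual code: `send_batch` and `partition_batches` are hypothetical stand-ins, and using `asyncio.gather(..., return_exceptions=True)` is just one way to get the "skip failed pages" behavior described above.

```python
import asyncio


async def send_batch(batch_id, fail):
    """Stand-in for one split-PDF request; raises when told to fail."""
    await asyncio.sleep(0)
    if fail:
        raise RuntimeError(f"batch {batch_id} failed")
    return [f"element from batch {batch_id}"]


async def partition_batches(failing, allow_failed):
    """Send four fake batches and combine their elements."""
    tasks = [send_batch(i, i in failing) for i in range(4)]
    if not allow_failed:
        # Strict mode: the first exception propagates to the caller.
        batch_results = await asyncio.gather(*tasks)
    else:
        # Lenient mode: collect exceptions, then drop the failed batches.
        outcomes = await asyncio.gather(*tasks, return_exceptions=True)
        batch_results = [o for o in outcomes if not isinstance(o, Exception)]
    return [element for batch in batch_results for element in batch]


lenient = asyncio.run(partition_batches({2}, allow_failed=True))

try:
    asyncio.run(partition_batches({2}, allow_failed=False))
    strict_error = None
except RuntimeError as exc:
    strict_error = str(exc)
```

In the lenient run the elements from batch 2 are simply missing from the combined output, which mirrors the README's note that output from errored pages is not included.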
+
 <!-- Start File uploads [file-upload] -->
 ## File uploads
 
@@ -380,6 +455,11 @@ s = UnstructuredClient(debug_logger=logging.getLogger("unstructured_client"))
 ```
 <!-- End Debugging [debug] -->
 
+<!-- No SDK Available Operations -->
+<!-- No Pagination -->
+<!-- No Server Selection -->
+<!-- No Authentication -->
+
 <!-- Placeholder for Future Speakeasy SDK Sections -->