Commit f6c6247

chore: Update README and cut 0.26.0 for publishing (#188)
Bring back some of the autogenerated README content and make sure our manual sections are using the right syntax. Once we merge and regenerate, 0.26.0 will be published.
1 parent 6e1fa29 commit f6c6247

2 files changed: +184 -104 lines changed

Diff for: README.md

+183 -103
@@ -11,20 +11,12 @@
1111

1212
<div align="center">
1313

14-
<a
15-
href="https://www.phorm.ai/query?projectId=34efc517-2201-4376-af43-40c4b9da3dc5">
16-
<img src="https://img.shields.io/badge/Phorm-Ask_AI-%23F2777A.svg?&logo=" />
17-
</a>
18-
1914
</div>
2015

21-
2216
<h2 align="center">
2317
<p>Python SDK for the Unstructured API</p>
2418
</h2>
2519

26-
NOTE: This README is for the `0.26.0-beta` version. The current published SDK, `0.25.5` can be found [here](https://github.com/Unstructured-IO/unstructured-python-client/blob/v0.25.5/README.md).
27-
2820
This is a Python client for the [Unstructured API](https://docs.unstructured.io/api-reference/api-services/saas-api-development-guide) and you can sign up for your API key on https://app.unstructured.io.
2921

3022
Please refer to the [Unstructured docs](https://docs.unstructured.io/api-reference/api-services/sdk-python) for a full guide to using the client.
@@ -73,94 +65,6 @@ poetry add unstructured-client
7365
```
7466
<!-- End SDK Installation [installation] -->
7567

76-
## SDK Example Usage
77-
78-
### Example
79-
80-
```python
81-
import os
82-
83-
import unstructured_client
84-
from unstructured_client.models import operations, shared
85-
86-
client = unstructured_client.UnstructuredClient(
87-
api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"),
88-
server_url=os.getenv("UNSTRUCTURED_API_URL"),
89-
)
90-
91-
filename = "PATH_TO_FILE"
92-
with open(filename, "rb") as f:
93-
data = f.read()
94-
95-
req = operations.PartitionRequest(
96-
partition_parameters=shared.PartitionParameters(
97-
files=shared.Files(
98-
content=data,
99-
file_name=filename,
100-
),
101-
# --- Other partition parameters ---
102-
strategy=shared.Strategy.AUTO,
103-
languages=['eng'],
104-
),
105-
)
106-
107-
try:
108-
res = client.general.partition(request=req)
109-
print(res.elements[0])
110-
except Exception as e:
111-
print(e)
112-
```
113-
Refer to the [API parameters page](https://docs.unstructured.io/api-reference/api-services/api-parameters) for all available parameters.
114-
115-
### Configuration
116-
117-
#### Splitting PDF by pages
118-
119-
See [page splitting](https://docs.unstructured.io/api-reference/api-services/sdk#page-splitting) for more details.
120-
121-
In order to speed up processing of large PDF files, the client splits up PDFs into smaller files, sends these to the API concurrently, and recombines the results. `split_pdf_page` can be set to `False` to disable this.
122-
123-
The number of workers used for splitting PDFs is set by the `split_pdf_concurrency_level` parameter, with a default of 5 and a maximum of 15 to keep resource usage and costs in check. The splitting process uses `asyncio` to manage concurrency.
124-
The size of each batch of pages (ranging from 2 to 20) is determined internally from the concurrency level and the total number of pages in the document. Because the splitting process uses `asyncio`, the client can encounter event loop issues if it is nested in another async runner, such as a `gevent` spawned task. It is, however, safe to run in multiprocessing workers (e.g., using `multiprocessing.Pool` with the `fork` context).
125-
126-
Example:
127-
```python
128-
req = shared.PartitionParameters(
129-
files=files,
130-
strategy="fast",
131-
languages=["eng"],
132-
split_pdf_concurrency_level=8
133-
)
134-
```
135-
136-
#### Sending specific page ranges
137-
138-
When `split_pdf_page=True` (the default), you can optionally specify a page range to send only a portion of your PDF for extraction. The `split_pdf_page_range` parameter takes a list of two integers specifying the range, inclusive. A `ValueError` is raised if the page range is invalid.
139-
140-
Example:
141-
```python
142-
req = shared.PartitionParameters(
143-
files=files,
144-
strategy="fast",
145-
languages=["eng"],
146-
split_pdf_page_range=[10,15],
147-
)
148-
```
149-
150-
#### Splitting PDF by pages - strict mode
151-
152-
When `split_pdf_allow_failed=False` (the default), any error encountered while sending the parallel requests will stop the process and raise an exception.
153-
When `split_pdf_allow_failed=True`, the process will continue even if some requests fail, and the results will be combined at the end (the output from the errored pages will not be included).
154-
155-
Example:
156-
```python
157-
req = shared.PartitionParameters(
158-
files=files,
159-
strategy="fast",
160-
languages=["eng"],
161-
split_pdf_allow_failed=True,
162-
)
163-
```
16468

16569
<!-- Start Retries [retries] -->
16670
## Retries
@@ -229,6 +133,59 @@ if res.elements is not None:
229133
```
230134
<!-- End Retries [retries] -->
231135

136+
137+
<!-- Start Error Handling [errors] -->
138+
## Error Handling
139+
140+
Handling errors in this SDK should largely match your expectations. All operations return a response object or raise an error. If Error objects are specified in your OpenAPI Spec, the SDK will raise the appropriate Error type.
141+
142+
| Error Object | Status Code | Content Type |
143+
| -------------------------- | -------------------------- | -------------------------- |
144+
| errors.HTTPValidationError | 422 | application/json |
145+
| errors.ServerError | 5XX | application/json |
146+
| errors.SDKError | 4xx-5xx | */* |
147+
148+
### Example
149+
150+
```python
151+
from unstructured_client import UnstructuredClient
152+
from unstructured_client.models import errors, shared
153+
154+
s = UnstructuredClient()
155+
156+
res = None
157+
try:
158+
res = s.general.partition(request={
159+
"partition_parameters": {
160+
"files": {
161+
"content": open("example.file", "rb"),
162+
"file_name": "example.file",
163+
},
164+
"chunking_strategy": shared.ChunkingStrategy.BY_TITLE,
165+
"split_pdf_page_range": [
166+
1,
167+
10,
168+
],
169+
"strategy": shared.Strategy.HI_RES,
170+
},
171+
})
172+
173+
if res.elements is not None:
174+
# handle response
175+
pass
176+
177+
except errors.HTTPValidationError as e:
178+
# handle e.data: errors.HTTPValidationErrorData
179+
raise(e)
180+
except errors.ServerError as e:
181+
# handle e.data: errors.ServerErrorData
182+
raise(e)
183+
except errors.SDKError as e:
184+
# handle exception
185+
raise(e)
186+
```
187+
<!-- End Error Handling [errors] -->
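
As a reading aid for the error table above, here is a minimal sketch of how a status code could map to the listed error types. It is not the SDK's implementation: `classify_status` is a hypothetical helper, the sketch ignores the Content Type column, and it assumes the more specific rows take precedence over the general `errors.SDKError` row, mirroring the order of the `except` clauses in the example.

```python
# Illustrative only: classify_status is a hypothetical helper, not part of
# unstructured_client. It encodes one reading of the error table, ignoring
# the content type of the response.
def classify_status(status_code: int) -> str:
    """Return the name of the error type the table associates with a status code."""
    if status_code == 422:
        return "errors.HTTPValidationError"
    if 500 <= status_code <= 599:
        return "errors.ServerError"
    if 400 <= status_code <= 499:
        return "errors.SDKError"
    return "no error"


for code in (200, 404, 422, 503):
    print(code, classify_status(code))
```

Catching the specific error types before `errors.SDKError`, as the example does, keeps the most informative handler first.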
188+
232189
<!-- Start Custom HTTP Client [http-client] -->
233190
## Custom HTTP Client
234191

@@ -310,13 +267,6 @@ s = UnstructuredClient(async_client=CustomClient(httpx.AsyncClient()))
310267
```
311268
<!-- End Custom HTTP Client [http-client] -->
312269

313-
<!-- No SDK Example Usage [usage] -->
314-
<!-- No SDK Available Operations -->
315-
<!-- No Pagination -->
316-
<!-- No Error Handling -->
317-
<!-- No Server Selection -->
318-
<!-- No Authentication -->
319-
320270
<!-- Start IDE Support [idesupport] -->
321271
## IDE Support
322272

@@ -327,6 +277,131 @@ Generally, the SDK will work well with most IDEs out of the box. However, when u
327277
- [PyCharm Pydantic Plugin](https://docs.pydantic.dev/latest/integrations/pycharm/)
328278
<!-- End IDE Support [idesupport] -->
329279

280+
281+
<!-- Start SDK Example Usage [usage] -->
282+
## SDK Example Usage
283+
284+
### Example
285+
286+
```python
287+
# Synchronous Example
288+
from unstructured_client import UnstructuredClient
289+
from unstructured_client.models import shared
290+
291+
s = UnstructuredClient()
292+
293+
res = s.general.partition(request={
294+
"partition_parameters": {
295+
"files": {
296+
"content": open("example.file", "rb"),
297+
"file_name": "example.file",
298+
},
299+
"chunking_strategy": shared.ChunkingStrategy.BY_TITLE,
300+
"split_pdf_page_range": [
301+
1,
302+
10,
303+
],
304+
"strategy": shared.Strategy.HI_RES,
305+
},
306+
})
307+
308+
if res.elements is not None:
309+
# handle response
310+
pass
311+
```
312+
313+
</br>
314+
315+
The same SDK client can also be used to make asynchronous requests by importing asyncio.
316+
```python
317+
# Asynchronous Example
318+
import asyncio
319+
from unstructured_client import UnstructuredClient
320+
from unstructured_client.models import shared
321+
322+
async def main():
323+
s = UnstructuredClient()
324+
res = await s.general.partition_async(request={
325+
"partition_parameters": {
326+
"files": {
327+
"content": open("example.file", "rb"),
328+
"file_name": "example.file",
329+
},
330+
"chunking_strategy": shared.ChunkingStrategy.BY_TITLE,
331+
"split_pdf_page_range": [
332+
1,
333+
10,
334+
],
335+
"strategy": shared.Strategy.HI_RES,
336+
},
337+
})
338+
if res.elements is not None:
339+
# handle response
340+
pass
341+
342+
asyncio.run(main())
343+
```
344+
<!-- End SDK Example Usage [usage] -->
345+
346+
Refer to the [API parameters page](https://docs.unstructured.io/api-reference/api-services/api-parameters) for all available parameters.
347+
348+
349+
## Configuration
350+
351+
### Splitting PDF by pages
352+
353+
See [page splitting](https://docs.unstructured.io/api-reference/api-services/sdk#page-splitting) for more details.
354+
355+
In order to speed up processing of large PDF files, the client splits up PDFs into smaller files, sends these to the API concurrently, and recombines the results. `split_pdf_page` can be set to `False` to disable this.
356+
357+
The number of workers used for splitting PDFs is set by the `split_pdf_concurrency_level` parameter, with a default of 5 and a maximum of 15 to keep resource usage and costs in check. The splitting process uses `asyncio` to manage concurrency.
358+
The size of each batch of pages (ranging from 2 to 20) is determined internally from the concurrency level and the total number of pages in the document. Because the splitting process uses `asyncio`, the client can encounter event loop issues if it is nested in another async runner, such as a `gevent` spawned task. It is, however, safe to run in multiprocessing workers (e.g., using `multiprocessing.Pool` with the `fork` context).
359+
360+
Example:
361+
```python
362+
req = operations.PartitionRequest(
363+
partition_parameters=shared.PartitionParameters(
364+
files=files,
365+
strategy="fast",
366+
languages=["eng"],
367+
split_pdf_concurrency_level=8
368+
)
369+
)
370+
```
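
The README states the bounds (concurrency default 5, maximum 15; batch size between 2 and 20) but not the formula the client uses. The sketch below only illustrates how such bounds could interact, assuming an even split of pages across workers. `sketch_batch_size` is a hypothetical helper, and the even-split rule is an assumption, not the SDK's actual logic.

```python
import math

DEFAULT_CONCURRENCY = 5        # default stated in the README
MAX_CONCURRENCY = 15           # maximum stated in the README
MIN_BATCH, MAX_BATCH = 2, 20   # batch-size bounds stated in the README


def sketch_batch_size(num_pages: int, concurrency: int = DEFAULT_CONCURRENCY) -> int:
    """Hypothetical helper: pages per batch, assuming an even split across workers."""
    # Cap the requested concurrency at the documented maximum.
    workers = max(1, min(concurrency, MAX_CONCURRENCY))
    # Assumption: divide the pages evenly, rounding up.
    even_split = math.ceil(num_pages / workers)
    # Clamp the result to the documented batch-size range.
    return max(MIN_BATCH, min(even_split, MAX_BATCH))


print(sketch_batch_size(100, 8))    # 13 pages per batch
print(sketch_batch_size(6, 8))      # clamped up to 2
print(sketch_batch_size(1000, 30))  # concurrency capped at 15, batch clamped to 20
```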
371+
372+
### Sending specific page ranges
373+
374+
When `split_pdf_page=True` (the default), you can optionally specify a page range to send only a portion of your PDF for extraction. The `split_pdf_page_range` parameter takes a list of two integers specifying the range, inclusive. A `ValueError` is raised if the page range is invalid.
375+
376+
Example:
377+
```python
378+
req = operations.PartitionRequest(
379+
partition_parameters=shared.PartitionParameters(
380+
files=files,
381+
strategy="fast",
382+
languages=["eng"],
383+
split_pdf_page_range=[10,15],
384+
)
385+
)
386+
```
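
To make "inclusive" and "invalid" concrete, here is a small validation sketch. `check_page_range` is a hypothetical function, not the client's code. It assumes 1-based page numbers (consistent with the `[1, 10]` range used in the usage example) and assumes that a range running past the end of the document counts as invalid, which the README does not confirm.

```python
def check_page_range(page_range, total_pages):
    """Hypothetical validator: return (start, end) or raise ValueError."""
    if len(page_range) != 2:
        raise ValueError("page range must contain exactly two integers")
    start, end = page_range
    if not (isinstance(start, int) and isinstance(end, int)):
        raise ValueError("page range values must be integers")
    # Assumptions: pages are 1-based and the range must fit inside the document.
    if start < 1 or end < start or end > total_pages:
        raise ValueError(
            f"invalid page range {page_range} for a {total_pages}-page document"
        )
    return start, end


start, end = check_page_range([10, 15], total_pages=20)
# Both endpoints are included, so [10, 15] covers six pages.
print(end - start + 1)
```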
387+
388+
### Splitting PDF by pages - strict mode
389+
390+
When `split_pdf_allow_failed=False` (the default), any error encountered while sending the parallel requests will stop the process and raise an exception.
391+
When `split_pdf_allow_failed=True`, the process will continue even if some requests fail, and the results will be combined at the end (the output from the errored pages will not be included).
392+
393+
Example:
394+
```python
395+
req = operations.PartitionRequest(
396+
partition_parameters=shared.PartitionParameters(
397+
files=files,
398+
strategy="fast",
399+
languages=["eng"],
400+
split_pdf_allow_failed=True,
401+
)
402+
)
403+
```
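
The difference between the two modes can be sketched with plain `asyncio`. This is not how the client is implemented: `fake_request` and `run_batches` are hypothetical stand-ins, and the batch that fails is hard-coded for illustration.

```python
import asyncio


async def fake_request(batch_id: int) -> list:
    """Stand-in for one per-batch API call; batch 2 always fails."""
    await asyncio.sleep(0)
    if batch_id == 2:
        raise RuntimeError(f"batch {batch_id} failed")
    return [f"element-from-batch-{batch_id}"]


async def run_batches(batch_ids, allow_failed: bool) -> list:
    # return_exceptions=True makes gather hand back exceptions as results
    # instead of raising the first one.
    results = await asyncio.gather(
        *(fake_request(b) for b in batch_ids), return_exceptions=True
    )
    elements = []
    for result in results:
        if isinstance(result, Exception):
            if not allow_failed:
                raise result  # strict mode: the first failure aborts the run
            continue          # lenient mode: drop the failed batch's output
        elements.extend(result)
    return elements


print(asyncio.run(run_batches([1, 2, 3], allow_failed=True)))
```

With `allow_failed=True` the combined result simply lacks the elements from the failed batch, so callers that need complete output should keep the default.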
404+
330405
<!-- Start File uploads [file-upload] -->
331406
## File uploads
332407

@@ -380,6 +455,11 @@ s = UnstructuredClient(debug_logger=logging.getLogger("unstructured_client"))
380455
```
381456
<!-- End Debugging [debug] -->
382457

458+
<!-- No SDK Available Operations -->
459+
<!-- No Pagination -->
460+
<!-- No Server Selection -->
461+
<!-- No Authentication -->
462+
383463
<!-- Placeholder for Future Speakeasy SDK Sections -->
384464

385465
### Maturity

Diff for: gen.yaml

+1 -1
@@ -10,7 +10,7 @@ generation:
1010
auth:
1111
oAuth2ClientCredentialsEnabled: false
1212
python:
13-
version: 0.26.0-beta.4
13+
version: 0.26.0
1414
additionalDependencies:
1515
dev:
1616
deepdiff: '>=6.0'
