
Commit f967c39

AsiaCao and humpydonkey authored
chore: add more docs for parallelism (#12)
Add more docs about request parallelism

Co-authored-by: Yazhou Cao <cyz19892002@gmail.com>
1 parent 7367c06 commit f967c39

File tree

1 file changed (+10 −4 lines)

README.md

Lines changed: 10 additions & 4 deletions
@@ -70,7 +70,7 @@ This section describes some of the key features this library offers.

### Parse Large PDF Files

-A single REST API call can only handle up to 2 pages at a time. This library automatically splits a large PDF into multiple calls, uses a thread pool to process the calls in parallel, and stitches the results back together as a single result.
+**A single REST API call can only handle up to 2 pages at a time.** This library automatically splits a large PDF into multiple calls, uses a thread pool to process the calls in parallel, and stitches the results back together as a single result.

We've used this library to successfully parse PDFs that are 1000+ pages long.

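The split-and-stitch behavior described in the new paragraph can be sketched roughly as follows. This is a minimal illustration only; `parse_pdf`, `call_api`, and the chunking logic are hypothetical names, not the library's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PAGES_PER_CALL = 2  # per the README: one REST call handles at most 2 pages


def parse_pdf(pages, call_api, max_workers=5):
    """Split pages into 2-page chunks, parse the chunks in parallel on a
    thread pool, and stitch the per-chunk results back into one ordered result."""
    chunks = [
        pages[i : i + MAX_PAGES_PER_CALL]
        for i in range(0, len(pages), MAX_PAGES_PER_CALL)
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves chunk order, so stitching is a simple flatten
        results = pool.map(call_api, chunks)
    return [item for chunk in results for item in chunk]
```

Because `ThreadPoolExecutor.map` yields results in submission order, the stitched output stays in page order even when chunks finish out of order.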
@@ -125,11 +125,17 @@ MAX_RETRY_WAIT_TIME=30
RETRY_LOGGING_STYLE=log_msg
```

-### Set `MAX_WORKERS`
+### Max Parallelism

-Increasing `MAX_WORKERS` increases the number of concurrent requests, which can speed up the processing of large files if you have a high enough API rate limit. Otherwise, you hit the rate limit error and the library just keeps retrying for you.
+The maximum number of parallel requests is determined by multiplying `BATCH_SIZE` × `MAX_WORKERS`.
+
+> **NOTE:** The maximum parallelism allowed by this library is 100.
+
+Specifically, increasing `MAX_WORKERS` can speed up the processing of large individual files, while increasing `BATCH_SIZE` improves throughput when processing multiple files.
+
+> **NOTE:** Your job's maximum processing throughput may be limited by your API rate limit. If your rate limit isn't high enough, you may encounter rate limit errors, which the library will automatically handle through retries.

-The optimal `MAX_WORKERS` value depends on your API rate limit and the latency of each REST API call. For example, if your account has a rate limit of 5 requests per minute, and each REST API call takes about 60 seconds to complete, then `MAX_WORKERS` should be set to 5.
+The optimal values for `MAX_WORKERS` and `BATCH_SIZE` depend on your API rate limit and the latency of each REST API call. For example, if your account has a rate limit of 5 requests per minute, each REST API call takes approximately 60 seconds to complete, and you're processing a single large file, then `MAX_WORKERS` should be set to 5 and `BATCH_SIZE` to 1.

You can find your REST API latency in the logs. If you want to increase your rate limit, schedule a time to meet with us [here](https://scheduler.zoom.us/d/56i81uc2/landingai-document-extraction).

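The `BATCH_SIZE` × `MAX_WORKERS` arithmetic and the 100-request cap described in the diff can be illustrated with a small sketch. `effective_parallelism` is a hypothetical helper for illustration, not part of the library:

```python
def effective_parallelism(batch_size: int, max_workers: int, cap: int = 100) -> int:
    """Maximum number of concurrent REST calls: BATCH_SIZE x MAX_WORKERS,
    capped at the library-wide limit of 100 (per the README)."""
    return min(batch_size * max_workers, cap)


# The README's example: a 5 requests/min rate limit with ~60 s per call,
# processing a single large file -> MAX_WORKERS=5, BATCH_SIZE=1
single_file = effective_parallelism(batch_size=1, max_workers=5)  # 5 parallel calls
```

With these values, roughly 5 calls are in flight at any moment, each taking about a minute, which matches a 5 requests/minute rate limit without triggering retries.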