Description
Is your feature request related to a problem? Please describe.
When partitioning PDFs with the OCR strategy, some files cause unstructured to allocate far more memory than is available in constrained environments (e.g. when running on Google Cloud Run with limited resources).
For example, the following 23MB PDF causes memory usage of >10GB when partitioning: https://drive.google.com/file/d/1lr-Pwh3QTVfdY4F6R-fk4tVU9FNSK27p/view?usp=sharing
Describe the solution you'd like
Unstructured should employ sensible defaults to avoid this kind of situation (e.g. a maximum size for a page when it is rendered in memory). This limit could also be configurable as an optional argument on the partitioning method.
In cases where this isn't feasible, the partitioning method should raise a descriptive exception so the caller can handle the situation gracefully instead of crashing the process.
The most important aspect is providing a way to limit the amount of memory unstructured uses during partitioning.
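To illustrate the kind of guard I have in mind, here is a minimal sketch that estimates the in-memory size of a rendered page before rendering it and raises a descriptive exception when it exceeds a limit. All names (`PageTooLargeError`, `estimate_render_bytes`, `check_page_size`), the 200 DPI value and the 256 MiB default are assumptions for illustration only; none of them are taken from unstructured's actual API or defaults.

```python
POINTS_PER_INCH = 72  # PDF page dimensions are expressed in points


class PageTooLargeError(ValueError):
    """Hypothetical exception: the rendered page would exceed the memory limit."""


def estimate_render_bytes(width_pt, height_pt, dpi=200, channels=3):
    """Estimate the size of an uncompressed bitmap (1 byte per channel) for one page."""
    width_px = round(width_pt / POINTS_PER_INCH * dpi)
    height_px = round(height_pt / POINTS_PER_INCH * dpi)
    return width_px * height_px * channels


def check_page_size(width_pt, height_pt, dpi=200, max_bytes=256 * 1024**2):
    """Return the estimated size, or raise PageTooLargeError if it exceeds max_bytes."""
    needed = estimate_render_bytes(width_pt, height_pt, dpi)
    if needed > max_bytes:
        raise PageTooLargeError(
            f"Rendering a {width_pt}x{height_pt} pt page at {dpi} DPI needs about "
            f"{needed} bytes, which exceeds the limit of {max_bytes} bytes"
        )
    return needed
```

With a check like this, the caller could catch the exception and skip the page, lower the DPI, or fall back to another strategy instead of having the process killed.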
Describe alternatives you've considered
Alternatively, the partitioning can be run in a separate, memory-limited process that is supervised by another process. If the partitioning process runs out of memory, the supervising process can handle the failure.
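A rough sketch of this workaround, assuming a Unix host where `RLIMIT_AS` is enforced (e.g. Linux) and the `fork` start method is available. `run_with_memory_limit` and `MemoryLimitExceeded` are hypothetical names, and `func` stands in for whatever function wraps the partitioning call.

```python
import multiprocessing as mp
import resource  # Unix-only


class MemoryLimitExceeded(RuntimeError):
    """Raised in the parent when the worker hit its memory limit or crashed."""


def _worker(conn, func, args, limit_bytes):
    # Cap the worker's address space; larger allocations fail with MemoryError.
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        limit_bytes = min(limit_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))
    try:
        conn.send(("ok", func(*args)))
    except MemoryError:
        conn.send(("oom", None))
    finally:
        conn.close()


def run_with_memory_limit(func, args=(), limit_bytes=2 * 1024**3):
    """Run func(*args) in a child process whose address space is capped."""
    ctx = mp.get_context("fork")
    reader, writer = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_worker, args=(writer, func, args, limit_bytes))
    proc.start()
    writer.close()  # the parent keeps only the read end
    try:
        status, payload = reader.recv()
    except EOFError:
        # The worker exited without reporting a result.
        status, payload = "crashed", None
    finally:
        reader.close()
    proc.join()
    if status != "ok":
        raise MemoryLimitExceeded(
            f"worker ended with status {status!r} (exit code {proc.exitcode})"
        )
    return payload
```

This works, but it pushes process management, result serialization and platform differences onto every caller, which is why a built-in limit would be preferable.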