Feat/remove reference of PageLayout.elements #3943
Open
This PR removes usage of `PageLayout.elements` from the partition function, except when `analysis=True`. It updates the partition logic so that `PageLayout.elements_array` is used everywhere, to save memory and CPU cost. Since the analysis function is intended for investigation and not for general document processing, that part of the code is left for a future refactor.
`PageLayout.elements` uses a list to store layout elements' data, while `elements_array` uses a `numpy` array to store the data, which has much lower memory requirements. Measured with `memory_profiler`, the difference is usually around 10x.
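To illustrate why an array-backed layout is cheaper than a list of objects, here is a minimal sketch. It does not use this PR's code: `Box` and `list_bytes` are hypothetical stand-ins, and it estimates sizes with `sys.getsizeof` and `ndarray.nbytes` rather than `memory_profiler`, so the ratio it produces is only indicative and will vary by Python version.

```python
import sys
from dataclasses import dataclass

import numpy as np


@dataclass
class Box:
    """Hypothetical stand-in for one layout element (not the library's class)."""

    x1: float
    y1: float
    x2: float
    y2: float


def list_bytes(boxes):
    """Rough byte count for a list of Box objects, including their float fields."""
    total = sys.getsizeof(boxes)  # the list itself (pointer storage)
    for b in boxes:
        total += sys.getsizeof(b)  # the instance header
        # each coordinate is a separate boxed Python float
        total += sum(sys.getsizeof(v) for v in (b.x1, b.y1, b.x2, b.y2))
    return total


n = 10_000
boxes = [Box(float(i), float(i), float(i + 1), float(i + 1)) for i in range(n)]

# Same coordinates packed into one contiguous (n, 4) float32 array.
coords = np.array([[b.x1, b.y1, b.x2, b.y2] for b in boxes], dtype=np.float32)

list_size = list_bytes(boxes)
array_size = coords.nbytes  # n * 4 values * 4 bytes each

print(f"list of objects: ~{list_size} bytes")
print(f"numpy array:      {array_size} bytes")
print(f"ratio:           ~{list_size / array_size:.1f}x")
```

The list version pays for a Python object per element plus a boxed float per coordinate, while the array stores only the raw values. Real layout elements carry more fields than this sketch, so the measured savings in the partition code may differ.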