Is your feature request related to a problem? Please describe.
Sometimes, when partitioning a PDF that contains plots and tables, the detected bounding box crops off the plot title, so the summarization LLM loses important context.
Describe the solution you'd like
Add a bbox scale parameter to the partition function so that the bounding box size can be increased or decreased.
Describe alternatives you've considered
I don't know of any alternatives, other than maybe changing the detection model.
Additional context
The change should be made in unstructured\partition\pdf_image\pdf_image_utils.py at line 183 (ver. 0.14.6).
Example of the code that should fix the issue:
offset = 0.18  # Should be a parameter
padded_bbox = cast(
    Tuple[int, int, int, int],
    pad_bbox(
        (x1 * (1 - offset), y1 * (1 - offset), x2 * (1 + offset), y2 * (1 + offset)),
        (h_padding, v_padding),
    ),
)
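One thing to note about the snippet above: multiplying the corner coordinates by (1 ± offset) grows the box relative to the page origin, so a box near the top-left corner grows much less than one near the bottom-right. A possible alternative is to scale the box around its own center and clamp it to the image. The sketch below is only an illustration of that idea; `scale_bbox`, `scale`, and `image_size` are hypothetical names and are not part of unstructured.

```python
import math
from typing import Tuple

BBox = Tuple[int, int, int, int]


def scale_bbox(
    bbox: Tuple[float, float, float, float],
    scale: float,
    image_size: Tuple[int, int],
) -> BBox:
    """Scale a box around its center and clamp it to the image bounds.

    Hypothetical helper, not an existing unstructured function.
    scale > 1 enlarges the box, scale < 1 shrinks it, scale == 1 keeps it.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")

    x1, y1, x2, y2 = bbox
    width, height = image_size

    # Center of the box and its scaled half-extents.
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w = (x2 - x1) * scale / 2
    half_h = (y2 - y1) * scale / 2

    # Round outwards so the scaled box never ends up smaller than requested,
    # then clamp to the image so cropping stays inside the page.
    return (
        max(0, math.floor(cx - half_w)),
        max(0, math.floor(cy - half_h)),
        min(width, math.ceil(cx + half_w)),
        min(height, math.ceil(cy + half_h)),
    )


if __name__ == "__main__":
    print(scale_bbox((100, 200, 300, 400), 1.5, (1000, 1000)))
```

With this approach the growth depends only on the size of the box, not on its position on the page, and the result could then be passed to pad_bbox in the same way as the original tuple.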