Description
I would love to have bounding boxes come back alongside the text response, primarily so I can highlight the locations in the original document that each piece of text was extracted from. I'm not sure exactly how to proceed with this one, but would love to hear some thoughts.
I think the general flow would be:
- Parse the document with gpt mini
- Split the resulting markdown into semantic sections (e.g. headers, subheaders, tables)
- For each semantic section, use [insert ai tool] to find bounding boxes in the original image
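
To make the flow above a bit more concrete, here is a rough Python sketch of the last two steps. Everything in it is an assumption for discussion: `Section`, `split_markdown_sections`, and `attach_bounding_boxes` are hypothetical names, the `(x0, y0, x1, y1)` box format is just one possible convention, and `locate` is a placeholder for whatever tool ends up finding the box in the original image. The splitting rules are deliberately naive (ATX headers, pipe tables, blank-line-separated paragraphs).

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical box convention: (x0, y0, x1, y1) on the source page image.
BBox = Tuple[float, float, float, float]


@dataclass
class Section:
    kind: str                    # "header", "table", or "paragraph"
    text: str                    # markdown text of the section
    bbox: Optional[BBox] = None  # filled in later by the locator


def split_markdown_sections(markdown: str) -> List[Section]:
    """Naively split markdown into header, table, and paragraph sections."""
    sections: List[Section] = []
    buffer: List[str] = []
    buffer_kind: Optional[str] = None

    def flush() -> None:
        # Emit the lines collected so far as one section, then reset.
        nonlocal buffer, buffer_kind
        if buffer:
            sections.append(Section(buffer_kind, "\n".join(buffer)))
        buffer, buffer_kind = [], None

    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped:
            flush()  # a blank line ends the current section
        elif re.match(r"#{1,6}\s", stripped):
            flush()  # each header (or subheader) is its own section
            sections.append(Section("header", stripped))
        else:
            kind = "table" if stripped.startswith("|") else "paragraph"
            if kind != buffer_kind:
                flush()
                buffer_kind = kind
            buffer.append(stripped)
    flush()
    return sections


def attach_bounding_boxes(
    sections: List[Section],
    locate: Callable[[str], Optional[BBox]],
) -> List[Section]:
    """Ask the (placeholder) locator for a box for each section's text."""
    for section in sections:
        section.bbox = locate(section.text)
    return sections
```

Keeping `locate` as a plain callable means the splitting logic doesn't depend on which tool is chosen, and a locator that returns `None` lets a section come back without a box rather than failing the whole response.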