Hello! Thanks for this great tool! I see that @bertsky's PR #23, which I had been depending on, has been merged since I last used the code, so I can now use master, which is great!
I am writing with a request for an enhancement.
In my project I am working with images hosted in an S3 bucket and fronted by a IIIF-compliant image server (namely, Cantaloupe). I do not have the images on the system where I'm running textract2page. I would like to avoid having to download all of the images to my system, just to run textract2page.
The README explains why the image must be passed into the utility:
> because Textract stores coordinates in `float` ratios, whereas PAGE uses `int` in pixel indices
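To make the quoted point concrete, here is a minimal sketch of that conversion, assuming Textract's `Geometry.Polygon` format (a list of `{"X": ..., "Y": ...}` ratios) and PAGE's `Coords/@points` string of `x,y` pairs. The helper name `polygon_to_points` is hypothetical and the rounding choice is an assumption; textract2page itself may convert differently. The point is that the image's width and height are the only image-derived inputs needed.

```python
from typing import Iterable, Mapping


def polygon_to_points(polygon: Iterable[Mapping[str, float]], width: int, height: int) -> str:
    """Hypothetical helper: turn Textract ratio coordinates into a PAGE points string.

    Each X/Y ratio is scaled by the image size in pixels and rounded to the
    nearest integer pixel index (rounding strategy is an assumption).
    """
    pairs = []
    for vertex in polygon:
        x = round(vertex["X"] * width)
        y = round(vertex["Y"] * height)
        pairs.append(f"{x},{y}")
    return " ".join(pairs)


# A box covering the left half of the top quarter of a 2000x3000 image.
example = [{"X": 0.0, "Y": 0.0}, {"X": 0.5, "Y": 0.0}, {"X": 0.5, "Y": 0.25}, {"X": 0.0, "Y": 0.25}]
print(polygon_to_points(example, 2000, 3000))  # prints "0,0 1000,0 1000,750 0,750"
```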
Would there be some way I could pass in the pixel dimensions of the image? I can retrieve these easily via the IIIF API.
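For reference, the IIIF Image API exposes these in the image's `info.json` document (`{base}/{identifier}/info.json`), whose top-level `width` and `height` give the full-size dimensions in pixels. A minimal standard-library sketch, where `base_url` and `identifier` are placeholders for a Cantaloupe endpoint and image ID, and the function names are hypothetical:

```python
import json
import urllib.parse
import urllib.request


def dimensions_from_info(info: dict) -> tuple[int, int]:
    """Extract full-size (width, height) in pixels from a IIIF info.json document."""
    return int(info["width"]), int(info["height"])


def fetch_dimensions(base_url: str, identifier: str) -> tuple[int, int]:
    """Fetch info.json for one image and return (width, height).

    The identifier is fully percent-encoded (including "/"), since IIIF
    identifiers such as S3 keys may contain slashes.
    """
    encoded = urllib.parse.quote(identifier, safe="")
    url = f"{base_url.rstrip('/')}/{encoded}/info.json"
    with urllib.request.urlopen(url) as response:
        return dimensions_from_info(json.load(response))
```

Usage would be something like `width, height = fetch_dimensions("https://iiif.example.org/iiif/2", "scans/page-001.tif")`, with the host and path standing in for the real server.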
From my reading of the source code, the underlying `convert_page` function retrieves the image's dimensions here.
How about this: if the utility is passed `--width=` and `--height=` flags instead of `IMAGE_FILE`, it could use the supplied values instead of requiring the image. Or some variation of this idea?
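A rough sketch of what such an interface could look like, written with `argparse` purely for illustration. textract2page's actual CLI framework and argument names are not assumed here: the `textract_json` positional, the `--width`/`--height` flags, and the `parse_args`/`resolve_dimensions` helpers are all hypothetical. The image becomes optional, and it is only opened when explicit dimensions are not supplied:

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Hypothetical CLI: IMAGE_FILE is optional when --width/--height are given."""
    parser = argparse.ArgumentParser(description="Hypothetical textract2page CLI variant")
    parser.add_argument("textract_json", help="Textract JSON output for one page")
    parser.add_argument("image_file", nargs="?", help="page image (optional with --width/--height)")
    parser.add_argument("--width", type=int, help="image width in pixels")
    parser.add_argument("--height", type=int, help="image height in pixels")
    args = parser.parse_args(argv)
    # Explicit dimensions only make sense as a pair.
    if (args.width is None) != (args.height is None):
        parser.error("--width and --height must be given together")
    # Without dimensions, there must be an image to measure.
    if args.width is None and args.image_file is None:
        parser.error("either IMAGE_FILE or --width/--height is required")
    return args


def resolve_dimensions(args: argparse.Namespace) -> tuple[int, int]:
    """Return (width, height), preferring explicitly supplied values."""
    if args.width is not None:
        return args.width, args.height
    # Fall back to measuring the image file; Pillow is imported lazily so the
    # dimensions-only path does not need it installed.
    from PIL import Image
    with Image.open(args.image_file) as img:
        return img.size
```

With this shape, `textract2page page.json --width=2000 --height=3000` would never touch the filesystem for the image, while the existing `IMAGE_FILE` form keeps working unchanged.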