
Alternative to supplying the original image? #25

@joewiz

Hello! Thanks for this great tool! Since I last used the code, @bertsky's PR #23, which I had been depending on, has been merged, so I can now use master, which is great!

I am writing with a request for an enhancement.

In my project I am working with images hosted in an S3 bucket and fronted by a IIIF-compliant image server (namely, Cantaloupe). The images are not on the system where I'm running textract2page, and I would like to avoid downloading all of them just to run textract2page.

The README explains why the image must be passed into the utility:

because Textract stores coordinates in float ratios, whereas PAGE uses int in pixel indices
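To make that point concrete, here is a minimal sketch of the conversion, assuming Textract's `Geometry.Polygon` format (a list of `{"X": ..., "Y": ...}` ratios between 0 and 1) and PAGE's `points` string of space-separated `x,y` pairs. The helper names and the round-then-clamp policy are my own assumptions, not textract2page's actual code; the point is that only the width and height are needed, not the pixel data.

```python
from typing import Dict, List, Tuple


def ratio_to_pixels(x_ratio: float, y_ratio: float, width: int, height: int) -> Tuple[int, int]:
    """Scale a Textract-style ratio coordinate (0.0-1.0) to integer pixel indices.

    Assumption: round to the nearest pixel, then clamp into [0, size - 1]
    so that a ratio of exactly 1.0 still lands inside the image.
    """
    x = min(max(round(x_ratio * width), 0), width - 1)
    y = min(max(round(y_ratio * height), 0), height - 1)
    return x, y


def polygon_to_points(polygon: List[Dict[str, float]], width: int, height: int) -> str:
    """Render a Textract Polygon as a PAGE-style "x1,y1 x2,y2 ..." points string."""
    pairs = (ratio_to_pixels(p["X"], p["Y"], width, height) for p in polygon)
    return " ".join(f"{x},{y}" for x, y in pairs)
```

For example, `polygon_to_points([{"X": 0.1, "Y": 0.2}, {"X": 0.5, "Y": 0.2}], 1000, 2000)` gives `"100,400 500,400"`.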

Would there be some way I could pass in the pixel dimensions of the image? I can retrieve these easily via the IIIF API.
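Under the IIIF Image API (2.x and 3.0), an image server such as Cantaloupe publishes an `info.json` document whose required `width` and `height` properties give the full-size pixel dimensions. A minimal sketch of reading them with only the standard library follows; the function names and any base URI you pass in are hypothetical.

```python
import json
import urllib.request
from typing import Tuple


def dimensions_from_info(info: dict) -> Tuple[int, int]:
    """Extract full-size pixel dimensions from a parsed IIIF info.json document."""
    # IIIF Image API 2.x and 3.0 both require integer "width" and "height" here
    return int(info["width"]), int(info["height"])


def fetch_iiif_dimensions(image_base_uri: str, timeout: float = 10.0) -> Tuple[int, int]:
    """Fetch {image_base_uri}/info.json from an IIIF image server and return (width, height)."""
    url = image_base_uri.rstrip("/") + "/info.json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        info = json.load(resp)  # json.load accepts the binary response stream
    return dimensions_from_info(info)
```

Called as `fetch_iiif_dimensions("https://iiif.example.org/iiif/2/page-001")` (a made-up URI), it would return a `(width, height)` tuple without downloading the image itself.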

In my reading of the source code, the underlying convert_page function retrieves the image's dimensions here.

How about this: if the utility is passed --width= and --height= flags instead of IMAGE_FILE, it could use those supplied values rather than requiring the image. Or some variation of this idea?
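A rough sketch of how the proposed flags might be validated with `argparse`. The `--width`/`--height` flags are only the proposal in this issue, not existing textract2page options, and the `TEXTRACT_JSON` positional and the helper names are hypothetical. Pillow is imported lazily, so it is needed only when an image file is actually passed.

```python
import argparse
from typing import Tuple


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI shape; textract2page's real arguments may differ
    parser = argparse.ArgumentParser(description="Sketch: Textract JSON to PAGE-XML")
    parser.add_argument("textract_json", metavar="TEXTRACT_JSON", help="Textract JSON output")
    parser.add_argument("image_file", metavar="IMAGE_FILE", nargs="?",
                        help="page image (optional when --width/--height are given)")
    parser.add_argument("--width", type=int, help="image width in pixels")
    parser.add_argument("--height", type=int, help="image height in pixels")
    return parser


def resolve_dimensions(parser: argparse.ArgumentParser, args: argparse.Namespace) -> Tuple[int, int]:
    """Return (width, height) from explicit flags, falling back to reading IMAGE_FILE."""
    if args.width is not None or args.height is not None:
        # Require both flags together so a given width is never paired with a measured height
        if args.width is None or args.height is None:
            parser.error("--width and --height must be given together")
        if args.width <= 0 or args.height <= 0:
            parser.error("--width and --height must be positive")
        return args.width, args.height
    if args.image_file is None:
        parser.error("either IMAGE_FILE or both --width and --height are required")
    from PIL import Image  # Pillow, only needed on this code path
    with Image.open(args.image_file) as img:
        return img.size  # Pillow reports (width, height)
```

In this sketch explicit dimensions take precedence when both an image and the flags are supplied; an alternative would be to reject that combination outright.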
