Calling with multiple output types fails on certain combinations #571

Open
@AdventurousGui

Description

The function run_and_get_multiple_output builds a string with parameters corresponding to the given extensions:

EXTENTION_TO_CONFIG = {
    'box': 'tessedit_create_boxfile=1 batch.nochop makebox',
    'xml': 'tessedit_create_alto=1',
    'hocr': 'tessedit_create_hocr=1',
    'tsv': 'tessedit_create_tsv=1',
}

...
    config = ' '.join(
        EXTENTION_TO_CONFIG.get(extension, '') for extension in extensions
    ).strip()
    if config:
        config = f'-c {config}'
    else:
        config = ''

For type box, the entry also includes the names of two config files (batch.nochop and makebox). Because -c appears only once, before the composed string, any parameters that follow the config filenames are ignored. In particular, xml and tsv produce no output when they are requested after box.
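To make the problem concrete, here is a small standalone sketch that rebuilds the string for the extensions box, xml and tsv and prints the resulting argument list. It does not call pytesseract or tesseract. The helper `per_variable_config` is hypothetical and only illustrates one possible direction (one -c per key=value pair, config file names last); it assumes tesseract accepts repeated -c options followed by config file names, and it is not necessarily how the project would fix this.

```python
import shlex

# Mapping copied from the snippet above (identifier spelling kept as-is).
EXTENTION_TO_CONFIG = {
    'box': 'tessedit_create_boxfile=1 batch.nochop makebox',
    'xml': 'tessedit_create_alto=1',
    'hocr': 'tessedit_create_hocr=1',
    'tsv': 'tessedit_create_tsv=1',
}


def current_config(extensions):
    """Reproduce the string built by the snippet above."""
    config = ' '.join(
        EXTENTION_TO_CONFIG.get(extension, '') for extension in extensions
    ).strip()
    return f'-c {config}' if config else ''


def per_variable_config(extensions):
    """Hypothetical alternative: one -c per key=value, config files last."""
    variables, config_files = [], []
    for extension in extensions:
        for token in EXTENTION_TO_CONFIG.get(extension, '').split():
            # Tokens containing '=' are variable assignments; the rest are
            # treated here as config file names (an assumption of this sketch).
            (variables if '=' in token else config_files).append(token)
    return ' '.join([f'-c {v}' for v in variables] + config_files)


# Only the first assignment directly follows -c; the xml and tsv
# assignments end up after the config file names.
print(shlex.split(current_config(['box', 'xml', 'tsv'])))

# Every assignment gets its own -c, and config file names come last.
print(shlex.split(per_variable_config(['box', 'xml', 'tsv'])))
```

In the first list, tessedit_create_alto=1 and tessedit_create_tsv=1 sit after batch.nochop and makebox, which matches the behavior described above.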

The existing test misses this case: it only requests box after tsv, so the FileNotFoundError is never triggered.

The test also doesn't cover the xml extension. If it did, the assertion comparing the results would fail: run_and_get_multiple_output reads the generated xml file as a string, while the corresponding function image_to_alto_xml() returns bytes.
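The type mismatch can be shown without running tesseract. In the sketch below, the two values are made-up stand-ins for the string and bytes results, and `same_text` is a hypothetical helper showing one way a test could normalize both sides before comparing. The UTF-8 encoding is an assumption.

```python
# Stand-ins for the two results; the XML content is made up for illustration.
alto_as_bytes = b'<?xml version="1.0" encoding="UTF-8"?>\n<alto/>'
alto_as_str = alto_as_bytes.decode('utf-8')


def same_text(left, right, encoding='utf-8'):
    """Hypothetical helper: compare str/bytes values by their decoded text."""
    if isinstance(left, bytes):
        left = left.decode(encoding)
    if isinstance(right, bytes):
        right = right.decode(encoding)
    return left == right


# In Python 3, str and bytes never compare equal, even with identical content.
print(alto_as_str == alto_as_bytes)

# After normalizing both sides to str, the comparison is meaningful.
print(same_text(alto_as_str, alto_as_bytes))
```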
