Open
Description
The function run_and_get_multiple_output builds a string with parameters corresponding to the given extensions:
EXTENTION_TO_CONFIG = {
    'box': 'tessedit_create_boxfile=1 batch.nochop makebox',
    'xml': 'tessedit_create_alto=1',
    'hocr': 'tessedit_create_hocr=1',
    'tsv': 'tessedit_create_tsv=1',
}
...
    config = ' '.join(
        EXTENTION_TO_CONFIG.get(extension, '') for extension in extensions
    ).strip()
    if config:
        config = f'-c {config}'
    else:
        config = ''
For type box, the value includes the names of config files (batch.nochop and makebox). Because -c appears only once, before the composed string, the parameters that follow the config filenames are ignored; specifically, no output is produced for types xml and tsv.
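The composition described above can be reproduced in isolation. The following is a minimal sketch, not the library's code: build_config is a hypothetical helper that copies the join logic from the snippet so the resulting strings can be inspected for different extension orders.

```python
# Mapping copied from the snippet above (identifier spelling kept as-is).
EXTENTION_TO_CONFIG = {
    'box': 'tessedit_create_boxfile=1 batch.nochop makebox',
    'xml': 'tessedit_create_alto=1',
    'hocr': 'tessedit_create_hocr=1',
    'tsv': 'tessedit_create_tsv=1',
}


def build_config(extensions):
    """Hypothetical helper: mirrors the join logic quoted in this issue."""
    config = ' '.join(
        EXTENTION_TO_CONFIG.get(extension, '') for extension in extensions
    ).strip()
    return f'-c {config}' if config else ''


# box last: every key=value parameter precedes the config filenames.
print(build_config(['tsv', 'box']))

# box first: tessedit_create_tsv=1 lands after batch.nochop and makebox.
print(build_config(['box', 'tsv']))
```

In both orders -c appears exactly once, at the start of the string; only the position of the tsv parameter relative to the config filenames changes.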
This case is missed by the test, which only places box after tsv and so never triggers the FileNotFoundError.
The test also doesn't cover the xml extension. If it did, the assertion comparing the results would fail: run_and_get_multiple_output reads the generated xml output as a string, while the expected output from the corresponding function image_to_alto_xml() is in bytes.
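The type mismatch can be illustrated without running Tesseract. This sketch uses a made-up XML fragment and assumes UTF-8 encoding; it only shows that a str never compares equal to a bytes object in Python, so one side would need converting before such an assertion could pass.

```python
# Made-up sample standing in for real ALTO output (an assumption, not
# actual Tesseract output).
expected_bytes = b'<alto><Layout/></alto>'  # bytes, like the expected value
actual_str = '<alto><Layout/></alto>'       # str, like the value read from file

# A str and a bytes object never compare equal, even with identical content.
mismatch = (actual_str == expected_bytes)

# Decoding one side (UTF-8 assumed here) makes the comparison meaningful.
match = (actual_str == expected_bytes.decode('utf-8'))

print(mismatch, match)
```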