Open
Description
It looks like the METS parser cannot handle structures like this in METS:
<div ID="DIVL5" TYPE="TITLE_OF_WORK">
  <fptr>
    <area BETYPE="IDREF" FILEID="ALTO00001" BEGIN="P1_TB00002"/>
  </fptr>
</div>
If I call mm2tei with this kind of METS, I get an exception:
Traceback (most recent call last):
File "/home/calamariadmin/tei_venv_3.7/bin/mm2tei", line 8, in <module>
sys.exit(cli())
File "/home/calamariadmin/tei_venv_3.7/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/calamariadmin/tei_venv_3.7/lib/python3.7/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/calamariadmin/tei_venv_3.7/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/calamariadmin/tei_venv_3.7/lib/python3.7/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/calamariadmin/tei_venv_3.7/lib/python3.7/site-packages/mets_mods2tei/scripts/mets_mods2tei.py", line 56, in cli
tei.fill_from_mets(mets, ocr)
File "/home/calamariadmin/tei_venv_3.7/lib/python3.7/site-packages/mets_mods2tei/api/tei.py", line 175, in fill_from_mets
self.add_div_structure(div)
File "/home/calamariadmin/tei_venv_3.7/lib/python3.7/site-packages/mets_mods2tei/api/tei.py", line 831, in add_div_structure
div = div.get_div()[0]
IndexError: list index out of range
As a starting point, it would be good if <fptr><area> elements inside a <div> were simply ignored instead of causing a crash.
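To illustrate what I mean by "ignore": the traceback ends at `div.get_div()[0]`, which raises an IndexError whenever a <div> has no child <div> (for example, when it only contains an <fptr>). Below is a rough sketch of a traversal that tolerates such leaf divs. It uses the standard library's ElementTree rather than the classes in mets_mods2tei, and the names `child_divs`, `walk`, `SAMPLE` and the outer TYPE="Newspaper" div are made up for the example.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# Hypothetical sample: the outer div is invented, the inner one is from this issue.
SAMPLE = """
<div xmlns="http://www.loc.gov/METS/" ID="DIVL1" TYPE="Newspaper">
  <div ID="DIVL5" TYPE="TITLE_OF_WORK">
    <fptr>
      <area BETYPE="IDREF" FILEID="ALTO00001" BEGIN="P1_TB00002"/>
    </fptr>
  </div>
</div>
"""


def child_divs(div):
    """Return only the direct <div> children; <fptr> and others are skipped."""
    return div.findall(f"{{{METS_NS}}}div")


def walk(div, depth=0, out=None):
    """Collect (depth, TYPE) for every div without assuming children exist."""
    out = [] if out is None else out
    out.append((depth, div.get("TYPE")))
    # Iterating over a possibly empty list never raises, unlike indexing [0].
    for child in child_divs(div):
        walk(child, depth + 1, out)
    return out


structure = walk(ET.fromstring(SAMPLE))
print(structure)
```

The point is only that looping over the child list (or checking that it is non-empty before indexing) lets a div that holds nothing but an <fptr> pass through without an exception.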
In general, it would be even better if the OCR text were taken from the ALTO file and block that the <area> references (FILEID and BEGIN).
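A possible shape for that lookup, again only as a sketch with the standard library: read FILEID and BEGIN from each <area>, then find the ALTO element whose ID equals BEGIN and join its String CONTENT values. The helper names (`local`, `area_refs`, `block_text`) and the ALTO sample text are invented for illustration. Mapping FILEID to an actual file via the METS file section is left out, and tags are matched by local name because the ALTO namespace differs between versions.

```python
import xml.etree.ElementTree as ET

METS_DIV = """
<div xmlns="http://www.loc.gov/METS/" ID="DIVL5" TYPE="TITLE_OF_WORK">
  <fptr>
    <area BETYPE="IDREF" FILEID="ALTO00001" BEGIN="P1_TB00002"/>
  </fptr>
</div>
"""

# Made-up ALTO content; only the block ID matches the METS in this issue.
ALTO = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page ID="P1"><PrintSpace>
    <TextBlock ID="P1_TB00002">
      <TextLine>
        <String CONTENT="Sample"/><SP/><String CONTENT="title"/>
      </TextLine>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>
"""


def local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def area_refs(div):
    """Return (FILEID, BEGIN) pairs for the <area>s under the div's <fptr>s."""
    refs = []
    for fptr in div:
        if local(fptr.tag) != "fptr":
            continue
        for el in fptr.iter():
            if local(el.tag) == "area":
                refs.append((el.get("FILEID"), el.get("BEGIN")))
    return refs


def block_text(alto_root, block_id):
    """Return the text of the ALTO element with the given ID, or None."""
    for el in alto_root.iter():
        if el.get("ID") != block_id:
            continue
        lines = []
        for line in el.iter():
            if local(line.tag) == "TextLine":
                words = [s.get("CONTENT", "") for s in line
                         if local(s.tag) == "String"]
                lines.append(" ".join(words))
        return "\n".join(lines)
    return None


refs = area_refs(ET.fromstring(METS_DIV))
text = block_text(ET.fromstring(ALTO), refs[0][1])
print(refs, text)
```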