Dear all,
this paper (https://arxiv.org/abs/2402.13610) seems to trigger a bug at line 281 of scipdf's parse_pdf.py, in parse_figure_caption:
File "[redacted]/src/parse_pdf.py", line 137, in <module>
paper = parse_pdf_only_text(pdf_path)
File "[redacted]/src/parse_pdf.py", line 88, in parse_pdf_only_text
paper_dict = scipdf.parse_pdf_to_dict(pdf_path)
File "[redacted]/.local/lib/python3.10/site-packages/scipdf/pdf/parse_pdf.py", line 412, in parse_pdf_to_dict
article_dict = convert_article_soup_to_dict(parsed_article, as_list=as_list)
File "[redacted]/.local/lib/python3.10/site-packages/scipdf/pdf/parse_pdf.py", line 368, in convert_article_soup_to_dict
article_dict["figures"] = parse_figure_caption(article)
File "[redacted]/.local/lib/python3.10/site-packages/scipdf/pdf/parse_pdf.py", line 281, in parse_figure_caption
label = figure.find("label").text
AttributeError: 'NoneType' object has no attribute 'text'

My simple workaround is to wrap the assignment in a try/except block, like this:

try:
    label = figure.find("label").text
except AttributeError:
    # The figure has no <label> child, so find() returned None.
    label = ''

I suspect this might mask an underlying issue, so I have not submitted a pull request; if I have time, I might dig deeper in a few days.
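A narrower alternative to catching the exception might be to check for the missing element explicitly, so that only the "figure has no label" case is tolerated. The sketch below is illustrative only: it uses the standard library's xml.etree.ElementTree instead of BeautifulSoup (which the figure.find("label") call suggests scipdf uses), and the sample XML, which omits the TEI namespace, and the helper name get_figure_label are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified TEI-like snippet (namespace omitted for brevity).
# The second <figure> has no <label> child: the case that makes
# figure.find("label").text raise AttributeError.
SAMPLE_TEI = """
<body>
  <figure>
    <label>1</label>
    <figDesc>Overview of the pipeline.</figDesc>
  </figure>
  <figure>
    <figDesc>A figure without a label.</figDesc>
  </figure>
</body>
"""


def get_figure_label(figure: ET.Element, default: str = "") -> str:
    """Return the text of the figure's <label> child, or `default` if it is missing."""
    label_el = figure.find("label")  # None when there is no <label> child
    if label_el is None:
        # Only the missing-label case is handled; any other error still
        # propagates instead of being swallowed by a bare except.
        return default
    # ElementTree gives None as .text for an empty element, so normalize it.
    return (label_el.text or "").strip()


root = ET.fromstring(SAMPLE_TEI.strip())
labels = [get_figure_label(fig) for fig in root.iter("figure")]
print(labels)  # ['1', '']
```

The same None check should carry over to a BeautifulSoup Tag, whose find() also returns None when no matching child exists; unlike a bare except, it will not hide unrelated errors in the surrounding parsing code.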
Please let me know if I can be of any help or provide further debug output.
Cheers,
Severin