The documentation states that pdfalto recognises superscripts and subscripts.
First, how does GROBID format superscript/subscript? I have not seen <sub> or <sup> in the output.
Second, in my experience, superscripts and subscripts are printed separated by spaces and with no formatting applied, even in formula blocks, for example H 2 O. Is this the intended behaviour?
Third, I noticed that superscripts and subscripts are sometimes misplaced. For example, "MnO<sub>2</sub> film" is printed as "MnO film 2". I can share examples, but I cannot upload the PDFs as I am not the copyright holder.
I am running GROBID 0.7.2 with the following command:
curl -sS --form input=@input.pdf --form segmentSentences=1 --form includeRawCitations=1 \
--form includeRawAffiliations=1 --form teiCoordinates=persName --form teiCoordinates=figure \
--form teiCoordinates=ref --form teiCoordinates=biblStruct --form teiCoordinates=formula \
127.0.0.1:8070/api/processFulltextDocument > output.xml
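For the first question, a quick way to check whether any inline markup reached output.xml is to count candidate patterns. This is only a sketch: `count_markup` is a hypothetical helper, and the `rend="subscript"` / `rend="superscript"` patterns are guesses at TEI-style markup, not confirmed GROBID output.

```shell
# count_markup FILE: print how often each candidate markup pattern occurs in FILE.
# The patterns are assumptions about what sub/superscript markup might look like.
count_markup() {
  for pattern in '<sub>' '<sup>' 'rend="subscript"' 'rend="superscript"'; do
    # grep -o prints each match on its own line; wc -l counts them
    n=$(grep -o -- "$pattern" "$1" | wc -l | tr -d ' ')
    printf '%s\t%s\n' "$pattern" "$n"
  done
}
```

Running `count_markup output.xml` after the curl command above prints one tab-separated line per pattern; all zeros would suggest the markup is absent rather than just hard to spot.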