Problem with superscript/subscript #1005

@keto33

The documentation states PDFALTO recognises superscript/subscript.

First, how does GROBID format superscript/subscript? I have not seen <sub> or <sup> in the output.

Second, in my experience, superscripts/subscripts are printed as plain characters separated by spaces, even in formula blocks, e.g. H 2 O, with no markup to distinguish them. Is this the intended behaviour?

Third, I noticed that superscript/subscript is sometimes misplaced. For example, MnO<sub>2</sub> film is printed as MnO film 2. I can share examples but cannot upload the PDFs as I am not the copyright holder.

I run GROBID 0.7.2 with the following command:

curl -sS --form input=@input.pdf --form segmentSentences=1 --form includeRawCitations=1 \
--form includeRawAffiliations=1 --form teiCoordinates=persName --form teiCoordinates=figure \
--form teiCoordinates=ref --form teiCoordinates=biblStruct --form teiCoordinates=formula \
127.0.0.1:8070/api/processFulltextDocument > output.xml
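
To make the first question easier to check, here is a minimal sketch that tallies every element name and every `rend` attribute value found in the resulting `output.xml`. It does not assume how GROBID encodes superscript/subscript. Looking at `rend` is only a guess based on general TEI conventions, and the helper names `local_name` and `summarize_markup` are my own, not part of GROBID.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize_markup(xml_data):
    """Count element names and `rend` attribute values in an XML document.

    `xml_data` may be a str or bytes. Checking `rend` is an assumption
    drawn from common TEI practice, not from GROBID's documentation.
    """
    root = ET.fromstring(xml_data)
    tags = Counter()
    rends = Counter()
    for elem in root.iter():
        tags[local_name(elem.tag)] += 1
        rend = elem.get("rend")
        if rend is not None:
            rends[rend] += 1
    return tags, rends
```

Reading `output.xml` in binary mode and passing the bytes to `summarize_markup` gives two counters. If neither contains anything that looks like superscript/subscript markup, the output carries no such formatting at all.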
