/api/processCitation drops authors' middle initials when parsing a reference string. #1351

@right-right-right

Grobid version

version: 0.8.2, Docker, crf and full model.

Operating System and architecture (arm64, amd64, x86, etc.)

arm64

What is your Java version

11.0.26

Log and information

no error.

Further information

I tested both the CRF and the full model, and both show the same behavior. Following the GROBID documentation, I ran the following test:

import requests

url = "http://localhost:8070/api/processCitation"
data = {"citations": "Dill, K. A., Ozkan, S. B., Shell, M. S. & Weikl, T. R. The protein folding problem. Annu. Rev. Biophys. 37, 289–316 (2008)."}
res = requests.request("post", url, data=data, headers={"Accept": "application/x-bibtex"}, timeout=None)
print(res.text)

This returns:

@article{-1,
  author = {Dill, K and Ozkan, S and Shell, M and Weikl, T},
  title = {The protein folding problem},
  journal = {Annu. Rev. Biophys},
  date = {2008},
  year = {2008},
  pages = {289--316},
  volume = {37}
}

The first author in the input is "Dill, K. A.", but the parsed result is "Dill, K". The middle initial is missing, and the same happens for the other three authors.
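The mismatch described above can be checked mechanically. The following is a minimal sketch, not part of GROBID: the helper names `bibtex_authors` and `dropped_initials` are hypothetical, and it assumes the "Surname, X. Y." author style used in this citation and a single-line `author = {...}` field as in the output above.

```python
import re

CITATION = ("Dill, K. A., Ozkan, S. B., Shell, M. S. & Weikl, T. R. "
            "The protein folding problem. Annu. Rev. Biophys. 37, 289–316 (2008).")
BIBTEX = "author = {Dill, K and Ozkan, S and Shell, M and Weikl, T},"


def bibtex_authors(bibtex):
    """Return the names in the BibTeX author field, split on ' and '."""
    match = re.search(r"author\s*=\s*\{([^}]*)\}", bibtex)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(" and ")]


def dropped_initials(citation, bibtex):
    """Return surnames that have more initials in the citation than in the BibTeX."""
    dropped = []
    for name in bibtex_authors(bibtex):
        surname, _, given = name.partition(",")
        parsed = re.findall(r"[A-Z]", given)
        # Look for "Surname, X. Y." in the raw citation string.
        source = re.search(re.escape(surname) + r",\s*((?:[A-Z]\.\s*)+)", citation)
        if source and len(re.findall(r"[A-Z]", source.group(1))) > len(parsed):
            dropped.append(surname)
    return dropped


print(dropped_initials(CITATION, BIBTEX))
```

For the citation in this report, all four surnames are flagged, since each author has two initials in the input and only one in the BibTeX output.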

However, api/processCitationNames extracts the middle initials correctly from the same string:

from pprint import pprint

import requests
import xmltodict

url = "http://localhost:8070/api/processCitationNames"
data = {"names": "Dill, K. A., Ozkan, S. B., Shell, M. S. & Weikl, T. R. The protein folding problem. Annu. Rev. Biophys. 37, 289–316 (2008)."}
res = requests.request("post", url, data=data, timeout=None)
# The response is a sequence of <persName> elements, so wrap it in a root element before parsing.
pprint(xmltodict.parse('<root xmlns="http://www.tei-c.org/ns/1.0">' + res.text + '</root>'))

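The four authors are parsed correctly, with their middle initials (the trailing entries appear because I passed the whole citation string rather than only the author list):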

{'root': {'@xmlns': 'http://www.tei-c.org/ns/1.0',
          'persName': [{'@xmlns': 'http://www.tei-c.org/ns/1.0',
                        'forename': [{'#text': 'K', '@type': 'first'},
                                     {'#text': 'A', '@type': 'middle'}],
                        'surname': 'Dill'},
                       {'@xmlns': 'http://www.tei-c.org/ns/1.0',
                        'forename': [{'#text': 'S', '@type': 'first'},
                                     {'#text': 'B', '@type': 'middle'}],
                        'surname': 'Ozkan'},
                       {'@xmlns': 'http://www.tei-c.org/ns/1.0',
                        'forename': [{'#text': 'M', '@type': 'first'},
                                     {'#text': 'S', '@type': 'middle'}],
                        'surname': 'Shell'},
                       {'@xmlns': 'http://www.tei-c.org/ns/1.0',
                        'forename': [{'#text': 'T', '@type': 'first'},
                                     {'#text': 'R', '@type': 'middle'}],
                        'surname': 'Weikl'},
                       {'@xmlns': 'http://www.tei-c.org/ns/1.0',
                        'surname': 'The Protein Folding Problem'},
                       {'@xmlns': 'http://www.tei-c.org/ns/1.0',
                        'surname': 'Annu'},
                       {'@xmlns': 'http://www.tei-c.org/ns/1.0',
                        'surname': 'Rev'},
                       {'@xmlns': 'http://www.tei-c.org/ns/1.0',
                        'forename': {'#text': ')', '@type': 'first'},
                        'surname': 'Biophys'}]}}
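As a stopgap, the names output can be turned into "Surname, First Middle" strings with the standard library alone. This is only a sketch: `names_from_tei` is a hypothetical helper, and `SAMPLE` is a fragment reconstructed from the parsed dictionary above, so the exact serialization returned by the service is an assumption.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Reconstructed from the parsed output above; the real response may be formatted differently.
SAMPLE = (
    '<persName xmlns="http://www.tei-c.org/ns/1.0">'
    '<forename type="first">K</forename><forename type="middle">A</forename>'
    '<surname>Dill</surname></persName>'
    '<persName xmlns="http://www.tei-c.org/ns/1.0">'
    '<forename type="first">S</forename><forename type="middle">B</forename>'
    '<surname>Ozkan</surname></persName>'
)


def names_from_tei(fragment):
    """Return 'Surname, First Middle' strings for each persName in a TEI fragment."""
    # The service returns sibling elements, so wrap them in a single root first.
    root = ET.fromstring('<root xmlns="http://www.tei-c.org/ns/1.0">' + fragment + "</root>")
    names = []
    for pers in root.iter(TEI + "persName"):
        surname = pers.findtext(TEI + "surname", default="")
        forenames = [f.text or "" for f in pers.findall(TEI + "forename")]
        names.append(f"{surname}, {' '.join(forenames)}" if forenames else surname)
    return names


print(" and ".join(names_from_tei(SAMPLE)))
```

Passing only the author portion of the citation to api/processCitationNames would avoid the spurious entries for the title and journal seen above.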

Labels

bug, implemented
