When parsing the bibliographic information, we insert the keys of both the publication reference and the application reference into the same flat dictionary:
```python
invention_title = root_tree.find(invention_title_path)
document_data = {}
if publication_info is not None:
    publication_reference_info = {element.tag: element.text for element in list(publication_info)}
    document_data = {**document_data, **publication_reference_info}
if application_info is not None:
    application_reference_info = {element.tag: element.text for element in list(application_info)}
    # .get() avoids a KeyError when the element has attributes but no appl-type
    if application_info.get('appl-type'):
        application_reference_info['application_type'] = application_info.get('appl-type')
    # Same flat dict as above: shared keys (country, doc-number, date)
    # from the application reference overwrite the publication's values
    document_data = {**document_data, **application_reference_info}
```
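The overwrite comes from dict unpacking itself: in `{**a, **b}`, a key present in both mappings keeps the value from `b`. A minimal illustration, using hand-copied values from the two `<document-id>` blocks in the example patent:

```python
# Key/value pairs copied from the publication and application <document-id> blocks
publication = {"country": "US", "doc-number": "09784948", "kind": "B2", "date": "20171010"}
application = {"country": "US", "doc-number": "15067369", "date": "20160311"}

# On shared keys the right-hand mapping wins; key order follows first insertion
merged = {**publication, **application}
print(merged)
# {'country': 'US', 'doc-number': '15067369', 'kind': 'B2', 'date': '20160311'}
```

Only `kind` survives from the publication reference, because the application reference has no `kind` element.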
An example patent (xml4) might look like this:
```xml
<publication-reference>
  <document-id>
    <country>US</country>
    <doc-number>09784948</doc-number>
    <kind>B2</kind>
    <date>20171010</date>
  </document-id>
</publication-reference>
<application-reference appl-type="utility">
  <document-id>
    <country>US</country>
    <doc-number>15067369</doc-number>
    <date>20160311</date>
  </document-id>
</application-reference>
```
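The resulting dictionary no longer contains the publication (patent) number. `doc-number` and `date` from the publication reference are overwritten by the application reference's values, and `kind` is kept only because the application reference has no `kind` element, so the result mixes fields from both references: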
```python
[{'bibliographic_information': {'country': 'US', 'doc-number': '15067369', 'kind': 'B2', 'date': '20160311', 'invention_title': 'xxx'}}]
```
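One possible fix is to namespace the keys per reference so the two `<document-id>` blocks can no longer collide. The sketch below is only an illustration under assumptions: it parses the example fragment with the standard-library `xml.etree.ElementTree`, wraps it in a hypothetical `<root>` element so it is well-formed, and uses a hypothetical helper `reference_fields` with hypothetical `publication_`/`application_` key prefixes; the project's real paths and output schema may differ.

```python
import xml.etree.ElementTree as ET

# Example fragment from the issue, wrapped in a hypothetical <root> so it parses
XML = """<root>
<publication-reference>
  <document-id>
    <country>US</country>
    <doc-number>09784948</doc-number>
    <kind>B2</kind>
    <date>20171010</date>
  </document-id>
</publication-reference>
<application-reference appl-type="utility">
  <document-id>
    <country>US</country>
    <doc-number>15067369</doc-number>
    <date>20160311</date>
  </document-id>
</application-reference>
</root>"""


def reference_fields(reference, prefix):
    """Flatten one *-reference element (hypothetical helper), prefixing every
    key so the publication and application references cannot overwrite each other."""
    fields = {}
    document_id = reference.find("document-id")
    if document_id is not None:
        for element in document_id:  # iterates child elements only
            fields[f"{prefix}_{element.tag}"] = element.text
    return fields


root = ET.fromstring(XML)
document_data = {}

publication_info = root.find("publication-reference")
if publication_info is not None:
    document_data.update(reference_fields(publication_info, "publication"))

application_info = root.find("application-reference")
if application_info is not None:
    document_data.update(reference_fields(application_info, "application"))
    # appl-type is an attribute of <application-reference>, not of <document-id>
    appl_type = application_info.get("appl-type")
    if appl_type:
        document_data["application_type"] = appl_type

print(document_data)
```

Prefixing keeps the dictionary flat; nesting each reference under its own key (for example `{'publication': {...}, 'application': {...}}`) would work equally well if downstream code can handle nested values.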