Conversation
Hi @lfoppiano! I think this branch will require quite a few tests (I suspect it will cause problems for some of the grobid modules, and I need to check consistency with Pub2TEI), so I pushed its release back to version 0.8.0.

One thing related to "document structure" versus "narrative style" is the bold style for section titles. I think it's like the italic/bold for the reference markers: the logical "section title" structure is already captured by the <head> element. For example, in the attached PDF, the style should be ignored here:

<div xmlns="http://www.tei-c.org/ns/1.0">
<head n="1"><hi rend="bold">Introduction</hi></head>

In contrast, the style here should be kept because it corresponds to a highlight within the flow of the paragraph text:

<p>12. <hi rend="bold">Average tf-idf similarity between citance and title of the cited paper (F12):</hi> We calculate the similarity of each citance with the title of the cited paper and take an average of it.</p>
<p>13. <hi rend="bold">Maximum tf-idf similarity between citance and title of the cited paper (F13):</hi> We take the maximum of similarity of the citances with the title of the cited paper.</p>

Does it make sense?
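The distinction described above (drop styling inside section titles, keep it inside paragraph text) can be sketched as a small post-processing step. This is only an illustration using Python's standard library, not the code in GROBID's TEIFormatter; the function name `strip_head_styles` and the shortened sample input are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements without a namespace prefix.
ET.register_namespace("", TEI_NS)


def strip_head_styles(root):
    """Hypothetical helper: flatten style markup inside <head>, keep it in <p>."""
    # Materialize the list first so the tree is not modified while iterating.
    for head in list(root.iter(f"{{{TEI_NS}}}head")):
        # Collect all text in the title, ignoring any <hi> wrappers.
        text = "".join(head.itertext())
        for child in list(head):
            head.remove(child)
        head.text = text
    return root


# Shortened sample based on the snippets in the comment above.
SAMPLE = (
    f'<div xmlns="{TEI_NS}">'
    '<head n="1"><hi rend="bold">Introduction</hi></head>'
    '<p>12. <hi rend="bold">Average tf-idf similarity (F12):</hi> '
    'We calculate the similarity.</p>'
    '</div>'
)

root = strip_head_styles(ET.fromstring(SAMPLE))
result = ET.tostring(root, encoding="unicode")
print(result)
```

In this sketch the `<head>` keeps its `n` attribute and its text but loses the `<hi rend="bold">` wrapper, while the `<hi>` inside `<p>` is left as it is.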
@kermitt2 yes, no problem pushing it to a later release. I'm OK with the change you propose.
The crazy part was merging master back into this branch 😅
I've made the change and now the text within the
I'm not sure what you mean in this case 🙂
# Conflicts:
#	grobid-core/src/main/java/org/grobid/core/document/TEIFormatter.java
#	grobid-core/src/test/java/org/grobid/core/document/TEIFormatterTest.java
This PR implements the italic, bold, superscript, and subscript styles in the output XML.
See #160 for more information.