Extracts Conflict of interests / Author contributions statements by lfoppiano · Pull Request #1319 · grobidOrg/grobid

lfoppiano · 2025-07-30T08:38:28Z

Adds identification of "conflict of interests / declaration of interests" statements and of author contribution / credit statements.

The evaluation requires an updated version of the dataset, available at https://huggingface.co/datasets/sciencialab/grobid-evaluation, in which the JATS files have been modified to mark up the new statements.
We did our best, given the mess of the JATS jungle.
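To illustrate why the JATS side needed manual work: the same statement can be encoded in several ways, for example `<fn fn-type="conflict">`, `<fn fn-type="coi-statement">`, or `<fn fn-type="con">` for contributions, plus publisher-specific `<sec sec-type="...">` values. The following is a minimal Python sketch that collects such statements, assuming these attribute values; the PR does not say which encodings the modified evaluation files use, and the `author-contributions` section type in particular is a hypothetical example.

```python
import xml.etree.ElementTree as ET

# Assumed JATS encodings (illustrative, not taken from the PR).
FN_TYPES = {
    "conflict": "conflict",
    "coi-statement": "conflict",
    "con": "contribution",
}
SEC_TYPES = {
    "coi-statement": "conflict",
    "author-contributions": "contribution",  # hypothetical, publisher-specific
}


def find_statements(jats_xml: str) -> dict[str, list[str]]:
    """Collect conflict-of-interest and contribution statements from a JATS string."""
    root = ET.fromstring(jats_xml)
    found: dict[str, list[str]] = {"conflict": [], "contribution": []}
    for el in root.iter():
        if el.tag == "fn":
            kind = FN_TYPES.get((el.get("fn-type") or "").lower())
        elif el.tag == "sec":
            kind = SEC_TYPES.get((el.get("sec-type") or "").lower())
        else:
            kind = None
        if kind:
            # Join text fragments with spaces (so a <title> does not fuse with
            # the following <p>), then collapse runs of whitespace.
            text = " ".join(" ".join(el.itertext()).split())
            found[kind].append(text)
    return found
```

Joining fragments with a space is a simplification: inline markup such as `<italic>` inside a word would gain a spurious space.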

We also used a dataset in which the author contribution and data availability statements had already been extracted with a different tool, and selected the documents that had missing or truncated statements. Here are some results:

| | Iteration 0 (grobid 0.8.1) | Iteration 1 (grobid dev) | Iteration 2 (+50 docs) | Iteration 3 (+113 docs) | Iteration 4 (+97 docs) |
|---|---|---|---|---|---|
| Docs missing availability statements | 2420 | 1240 | 805 | 317 | 236 |
| Docs missing contribution statements | 3737 | 1205 | 767 | 523 | 483 |
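Read as relative reductions from the 0.8.1 baseline (iteration 0) to the last iteration, the counts correspond to roughly a 90% drop for availability statements and an 87% drop for contribution statements. A small Python check of that arithmetic, assuming the same document set is evaluated in every iteration:

```python
# Counts from the results table: (iteration 0, iteration 4)
missing = {
    "availability": (2420, 236),
    "contribution": (3737, 483),
}


def relative_reduction(before: int, after: int) -> float:
    """Fraction of initially-missing documents that are no longer missing."""
    return (before - after) / before


for name, (before, after) in missing.items():
    print(f"{name}: {relative_reduction(before, after):.1%} fewer documents missing")
```

This prints `availability: 90.2% fewer documents missing` and `contribution: 87.1% fewer documents missing`.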



coveralls · 2025-07-30T10:17:24Z

coverage: 40.323% (-0.07%) from 40.394%
when pulling 80ec370 on feature/coi-ac
into 01fe109 on master.



# Conflicts:
# grobid-home/models/affiliation-address-BidLSTM_CRF/config.json
# grobid-home/models/affiliation-address-BidLSTM_CRF/model_weights.hdf5
# grobid-home/models/affiliation-address-BidLSTM_CRF/preprocessor.json
# grobid-home/models/header-BidLSTM_CRF/config.json
# grobid-home/models/header-BidLSTM_CRF/model_weights.hdf5
# grobid-home/models/header-BidLSTM_CRF/preprocessor.json
# grobid-home/models/header-BidLSTM_ChainCRF/config.json
# grobid-home/models/header-BidLSTM_ChainCRF/model_weights.hdf5
# grobid-home/models/header-BidLSTM_ChainCRF/preprocessor.json
# grobid-home/models/name-header-BidLSTM_CRF/config.json
# grobid-home/models/name-header-BidLSTM_CRF/model_weights.hdf5
# grobid-home/models/name-header-BidLSTM_CRF/preprocessor.json

Signed-off-by: Luca Foppiano <luca@foppiano.org>

lfoppiano · 2026-02-20T06:30:15Z

@kermitt2 it seems that the default "recipe" for training header-BidLSTM_ChainCRF_FEATURES does not reproduce the results of the current header model on master. Do you by any chance remember which parameters you used? 🙏

FYI: I've solved the issue and obtained similar, or slightly better, results.


Copilot

Pull request overview

This PR adds support for extracting conflict of interest statements and author contribution statements from scientific articles. The implementation includes updated model configurations, new tagging labels, and modifications to parsing and processing logic to handle these new statement types.

Changes:

Added new tagging labels for conflict of interest (<conflict>) and author contribution (<contribution>) statements
Updated model configurations to accommodate expanded character vocabularies and new tag types
Modified parsing engines to extract and process the new statement types from both header and body sections
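The overview does not say how the new statements are serialised in the output TEI. A minimal Python sketch for reading them from a GROBID TEI file, assuming (by analogy with other back-matter sections such as acknowledgements) that each statement is emitted as a `<div>` in the TEI namespace whose `type` attribute is `conflict` or `contribution`; these type values are hypothetical and mirror the label names, they are not confirmed by the PR.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical div types: the PR names the labels <conflict> and
# <contribution> but does not state how they appear in the TEI output.
STATEMENT_TYPES = ("conflict", "contribution")


def extract_statements(tei_xml: str) -> dict[str, list[str]]:
    """Return the normalised text of each assumed statement div, grouped by type."""
    root = ET.fromstring(tei_xml)
    out: dict[str, list[str]] = {t: [] for t in STATEMENT_TYPES}
    for div in root.iterfind(".//tei:div", TEI_NS):
        div_type = div.get("type")
        if div_type in out:
            # Space-join text fragments (head, paragraphs), then collapse whitespace
            out[div_type].append(" ".join(" ".join(div.itertext()).split()))
    return out
```

If the real output uses different `type` values, only `STATEMENT_TYPES` needs to change.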

Reviewed changes

Copilot reviewed 27 out of 2468 changed files in this pull request and generated 3 comments.


| File | Description |
|---|---|
| `grobid-home/models/*/preprocessor.json` | Expanded character vocabularies to support additional Unicode characters encountered in documents |
| `grobid-home/models/*/config.json` | Updated character vocabulary sizes and batch sizes to align with preprocessor changes |
| `TaggingLabels.java` | Added label constants for conflict of interest and author contribution statements |
| `SegmentationLabels.java` | Registered new labels for segmentation model |
| `Segmentation.java` | Added handling for conflict and contribution statement extraction in training mode |
| `HeaderParser.java` | Implemented extraction logic for conflict and contribution statements from headers |
| `FullTextParser.java` | Added processing for conflict and contribution statements from non-header sections |
| `BiblioItem.java` | Added fields and accessors for storing conflict and contribution statements |
| `BasicStructureBuilder.java` | Removed unused import |
| `doc/training/*.md` | Updated documentation to describe new statement types and annotation guidelines |
| `doc/benchmarks/*.md` | Updated benchmark results reflecting model performance changes |

Comments suppressed due to low confidence (1)

grobid-core/src/main/java/org/grobid/core/engines/HeaderParser.java:1496

Extra blank line added at line 1443. While not functionally problematic, this inconsistent spacing should be removed to maintain code style consistency.



grobid-core/src/main/java/org/grobid/core/engines/Segmentation.java

grobid-core/src/main/java/org/grobid/core/engines/FullTextParser.java

lfoppiano added 3 commits July 30, 2025 09:52

identify conflict of interests and author contributions statements in the segmentation

4408c35

identify conflict of interests and author contributions statements in the header

9b03d72

Update parser, fix invalid XML

f439459

Add new statements in the right place

0c10f35

lfoppiano mentioned this pull request Jul 30, 2025

Extracts author contribution and conflict of interest statements #1318

Closed

First training with new statements

193b172

lfoppiano changed the title ~~Add Conflict of interests / Author contributions~~ Extracts Conflict of interests / Author contributions statements Aug 10, 2025

lfoppiano added 22 commits August 10, 2025 10:58

correction existing training data

1fbe4c0

add missing conflict of interest statements in header

6a4c037

correct training one by one

d3cc969

refresh article-lights segmentation data

447e51e

New model after revising all training data

54960b7

add DL models

9a5c6da

add field specification for credits and conflicts of interests

0e5ee26

minor corrections

9421450

update eval fields for COI

bc9acfd

Merge branch 'master' into feature/coi-ac

fe041f0

# Conflicts: # grobid-core/src/main/java/org/grobid/core/engines/FullTextParser.java

update end 2 end evaluation

f84e7a0

output new fields in training data

36f4593

update closing tags

e2bd4a4

enable header and citations

f3d7a34

remove fields that are not giving any meaningful information

9a864a2

correct wrong PLOS adjustment

80ec370

Merge branch 'master' into feature/coi-ac

c1271f7

Merge branch 'master' into feature/coi-ac

e6a2d56

add more training examples

48c01cf

updated segmentation model

1b975e5

revert wrongly disabled consolidation, comment out stuff

eb7768b

update evaluation

e2c43d2

lfoppiano added 14 commits February 18, 2026 11:51

fix: missing logic to correctly get conflicts and credits in the output TEI

3d0da59

chore: update benchmarks after the bugfix

aa6d260

chore: update header model for both CRF and ChainCRF - pass 1

d125a4f

chore: results for both CRF and ChainCRF - pass 1

22b0de8

chore: update header model for both CRF and ChainCRF - pass 2

0f63699

update header models and evaluation - second pass

2109be3

models: update header BidLSTM_CRF_FEATURES - pass 1

2a02753

chore: update training data for the light models

718c4a7

fix: remove unnecessary classes. Add missing link

a01ae1b

fix: incorrect configuration

1f3e9ff

docs: update benchmarks

2360a0c

Update article-light(-ref) header wapiti models

4aaa59c

Signed-off-by: Luca Foppiano <luca@foppiano.org>

Update name header

6907b62

Signed-off-by: Luca Foppiano <luca@foppiano.org>

lfoppiano mentioned this pull request Feb 19, 2026

"et al." causing sentence issues #1316

Open

lfoppiano added 3 commits February 20, 2026 07:04

models: update article/light and article/light-ref segmentation models

2ec52f2

Signed-off-by: Luca Foppiano <luca@foppiano.org>

models: update article-light BidLSTM models

98bef4d

docs: update evaluation

632262e

lfoppiano added 4 commits February 20, 2026 16:24

models: update header article/light and article/light-ref for the BidLSTM_CRF_FEATURES architecture

5e33a4f

models: update header article/light and article/light-ref for the BidLSTM_ChainCRF_FEATURES architecture second pass

b33cd72

feat: add benchmark results for the article/light_ref model

78122b1

chore: update wapiti header model

6ae1c39

Signed-off-by: Luca Foppiano <luca@foppiano.org>

lfoppiano requested a review from Copilot February 21, 2026 14:27

Copilot AI reviewed Feb 21, 2026


lfoppiano added 2 commits February 21, 2026 15:34

chore: fix minor mistakes spotted by Copilot

e48e880

Merge branch 'master' into feature/coi-ac

bbe6930

lfoppiano merged commit b91c01d into master Feb 21, 2026
6 of 7 checks passed

lfoppiano deleted the feature/coi-ac branch February 21, 2026 15:58

lfoppiano mentioned this pull request Feb 21, 2026

Feature Request: general back section (section) #698

Closed


lfoppiano merged 96 commits into master from feature/coi-ac


3 participants
