Add RefChecker logic for reference validation Relates to #13604 #15478
NishantDG-SST wants to merge 12 commits into JabRef:main
Review Summary by Qodo: Add RefChecker logic for reference validation
Walkthroughs
Description
• Add RefChecker logic for validating bibliographic entries against online sources
• Implement entry comparison using DOI, CrossRef, and arXiv fetchers
• Add compareEntries method to DuplicateCheck for similarity scoring
• Introduce RefValidity enum with REAL, UNSURE, and FAKE classifications
Diagram
flowchart LR
Entry["BibEntry to validate"]
DOI["DOI Lookup"]
CrossRef["CrossRef Discovery"]
ArXiv["ArXiv Lookup"]
Compare["compareEntries Similarity"]
Result["RefCheckResult with validity"]
Entry --> DOI
Entry --> CrossRef
Entry --> ArXiv
DOI --> Compare
CrossRef --> Compare
ArXiv --> Compare
Compare --> Result
File Changes
1. jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java
Code Review by Qodo
    @Test
    void entriesWithIdenticalTitles() {
        BibEntry one = new BibEntry().withField(StandardField.TITLE, "Reinforcement learning: An introduction");
        BibEntry two = new BibEntry().withField(StandardField.TITLE, "Reinforcement learning: An introduction");

        double score = DuplicateCheck.compareEntries(one, two);

        assertTrue(score >= DuplicateCheck.COMPARE_ENTRIES_THRESHOLD);
    }

    @Test
    void entriesWithCompletelyDifferentFields() {
        BibEntry one = new BibEntry()
                .withField(StandardField.TITLE, "Performance on a Signal")
                .withField(StandardField.AUTHOR, "Richard Atkinson");
        BibEntry two = new BibEntry()
                .withField(StandardField.TITLE, "Rest in Treatment")
                .withField(StandardField.AUTHOR, "Elizabeth Ballard");

        double score = DuplicateCheck.compareEntries(one, two);

        assertTrue(score < DuplicateCheck.COMPARE_ENTRIES_THRESHOLD);
    }
3. Weak threshold asserts in tests 📘 Rule violation ☼ Reliability
New tests use predicate assertions (assertTrue(score >= threshold) / < threshold) instead of asserting exact expected values, weakening regression detection. This violates the unit test requirement to assert exact values/outputs where possible.
Agent Prompt
## Issue description
The added `DuplicateCheck.compareEntries` tests use threshold-based `assertTrue` predicates, which are considered weak checks.
## Issue Context
Update the tests to assert exact expected values (or exact expected structures) to strengthen regression detection.
## Fix Focus Areas
- jablib/src/test/java/org/jabref/logic/database/DuplicateCheckTest.java[665-687]
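As a rough illustration of the exact-value style requested here, the sketch below pins the score itself instead of comparing it with a threshold. The `score` function is a hypothetical stand-in (fraction of fields with equal values), not JabRef's `DuplicateCheck.compareEntries`, and `assertScore` only mimics JUnit's `assertEquals(expected, actual, delta)` so the example runs without dependencies.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a similarity scorer; NOT JabRef's DuplicateCheck.compareEntries.
final class ExactScoreSketch {

    // Fraction of fields (union of both entries) whose values are equal, ignoring case.
    static double score(Map<String, String> one, Map<String, String> two) {
        Set<String> fields = new TreeSet<>(one.keySet());
        fields.addAll(two.keySet());
        if (fields.isEmpty()) {
            return 0.0;
        }
        long matching = fields.stream()
                .filter(f -> one.containsKey(f) && two.containsKey(f))
                .filter(f -> one.get(f).equalsIgnoreCase(two.get(f)))
                .count();
        return (double) matching / fields.size();
    }

    // Minimal replacement for JUnit's assertEquals(expected, actual, delta).
    static void assertScore(double expected, double actual) {
        if (Math.abs(expected - actual) > 1e-9) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    public static void main(String[] args) {
        Map<String, String> one = Map.of("title", "Reinforcement learning: An introduction");
        Map<String, String> two = Map.of("title", "Reinforcement learning: An introduction");
        assertScore(1.0, score(one, two)); // exact value instead of ">= threshold"

        Map<String, String> three = Map.of("title", "Performance on a Signal", "author", "Richard Atkinson");
        Map<String, String> four = Map.of("title", "Rest in Treatment", "author", "Elizabeth Ballard");
        assertScore(0.0, score(three, four)); // exact value instead of "< threshold"
    }
}
```

Whether exact values are practical for the real method depends on how stable its scoring is; if the score is a floating-point similarity, a small delta is probably still needed.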
    @Test
    void entryWithCorrectDoiButWrongMetadataIsNotClassifiedAsReal() {
        BibEntry entry = new BibEntry(StandardEntryType.Article)
                .withField(StandardField.TITLE, "Not a Real Paper")
                .withField(StandardField.AUTHOR, "Random Author")
                .withField(StandardField.YEAR, "2099")
                .withField(StandardField.DOI, "10.48550/arXiv.1706.03762");

        RefCheckResult result = refChecker.check(entry);

        assertNotEquals(RefValidity.REAL, result.validity());
    }
4. assertNotEquals weakens RefChecker test 📘 Rule violation ☼ Reliability
The test only asserts the result is "not REAL" via assertNotEquals, which is a weak predicate and can pass for multiple unintended outcomes. The test should assert the exact expected RefValidity (or a complete expected result shape) to meet unit test strength requirements.
Agent Prompt
## Issue description
`RefCheckerTest.entryWithCorrectDoiButWrongMetadataIsNotClassifiedAsReal` uses `assertNotEquals(REAL, ...)`, which is a weak predicate check.
## Issue Context
Change the assertion to an exact expected validity (or assert the full expected `RefCheckResult` properties) so the test fails on near-miss behavior changes.
## Fix Focus Areas
- jablib/src/test/java/org/jabref/logic/refcheck/RefCheckerTest.java[47-58]
Pull request overview
This PR introduces initial “RefChecker” logic in jablib to validate bibliographic entries by resolving them via DOI/CrossRef/arXiv and classifying them based on similarity to fetched authoritative metadata.
Changes:
- Added new refcheck domain types (RefChecker, RefCheckResult, RefValidity) and online-validation flow.
- Added DuplicateCheck.compareEntries(...) plus a shared threshold constant to support similarity-based validation.
- Added initial integration-style tests for RefChecker and extended DuplicateCheckTest; updated CHANGELOG.md.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java | Implements the online lookup + classification flow (DOI → CrossRef → arXiv) and picks the best result. |
| jablib/src/main/java/org/jabref/logic/refcheck/RefCheckResult.java | Adds a result record carrying validity, optional matched entry, and similarity score. |
| jablib/src/main/java/org/jabref/logic/refcheck/RefValidity.java | Defines the classification enum (REAL/UNSURE/FAKE). |
| jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java | Adds similarity scoring (compareEntries) and a threshold constant used by refcheck. |
| jablib/src/test/java/org/jabref/logic/refcheck/RefCheckerTest.java | Adds initial fetcher-backed tests covering “real”, “not real”, and “nonexistent” cases. |
| jablib/src/test/java/org/jabref/logic/database/DuplicateCheckTest.java | Adds unit tests for compareEntries behavior (self-compare, internal field ignore, etc.). |
| CHANGELOG.md | Documents the addition of RefChecker logic. |
    Optional<BibEntry> found;
    try {
        found = doiFetcher.performSearchById(doi.get().asString());
    } catch (FetcherException e) {
        LOGGER.warn("DOI lookup failed for {}: {}", doi.get().asString(), e.getMessage());
        return new RefCheckResult(RefValidity.FAKE, null, 0.0);
    }
checkByDoi treats FetcherException (network/remote failure) as RefValidity.FAKE. That makes the overall classification incorrect in offline/error scenarios (a temporary fetch failure can label a real entry as fake). Consider returning UNSURE (and/or carrying an error reason) when lookup fails, and only returning FAKE when a lookup succeeds but similarity is below the fake threshold.
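A minimal sketch of that suggestion, assuming a simplified lookup interface: a failed lookup and a lookup without a candidate both yield UNSURE, and FAKE is reserved for a successful lookup whose metadata does not match. All types here (`Lookup`, `Result`, `Validity`) and the toy title comparison are hypothetical stand-ins, not the PR's `RefChecker`, `RefCheckResult`, or `DuplicateCheck.compareEntries`.

```java
import java.io.IOException;
import java.util.Optional;

// All names here are hypothetical stand-ins, not JabRef's RefChecker API.
final class LookupFailureSketch {

    enum Validity { REAL, UNSURE, FAKE }

    record Result(Validity validity, Optional<String> matchedTitle, double score) { }

    // Stand-in for a fetcher whose remote call can fail.
    interface Lookup {
        Optional<String> findTitle(String doi) throws IOException;
    }

    static Result checkByDoi(String doi, String expectedTitle, Lookup lookup) {
        Optional<String> found;
        try {
            found = lookup.findTitle(doi);
        } catch (IOException e) {
            // The lookup itself failed, so nothing was verified: UNSURE, not FAKE.
            return new Result(Validity.UNSURE, Optional.empty(), 0.0);
        }
        if (found.isEmpty()) {
            // No candidate to compare against: still not a verified mismatch.
            return new Result(Validity.UNSURE, Optional.empty(), 0.0);
        }
        // Toy similarity: 1.0 on a case-insensitive title match, else 0.0.
        double score = found.get().equalsIgnoreCase(expectedTitle) ? 1.0 : 0.0;
        Validity validity = score == 1.0 ? Validity.REAL : Validity.FAKE;
        return new Result(validity, found, score);
    }

    public static void main(String[] args) {
        Result offline = checkByDoi("10.48550/arXiv.1706.03762", "Attention Is All You Need",
                doi -> { throw new IOException("network unreachable"); });
        System.out.println(offline.validity()); // UNSURE
    }
}
```

Using `Optional.empty()` rather than `null` for the missing match is a choice of this sketch only; the quoted code passes `null`.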
    } catch (FetcherException e) {
        LOGGER.warn("DOI lookup failed for {}: {}", doi.get().asString(), e.getMessage());
        return new RefCheckResult(RefValidity.FAKE, null, 0.0);
Current logging drops the stack trace by only logging e.getMessage(). For operational debugging it’s better to pass the exception as the last argument (e.g., LOGGER.warn(..., e)) so failures can be diagnosed from logs.
    @Test
    void realPaperWithCorrectDoiIsClassifiedAsReal() {
        BibEntry entry = new BibEntry(StandardEntryType.Article)
                .withField(StandardField.TITLE, "Attention Is All You Need")
                .withField(StandardField.AUTHOR, "Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and others")
                .withField(StandardField.YEAR, "2017")
                .withField(StandardField.DOI, "10.48550/arXiv.1706.03762");

        RefCheckResult result = refChecker.check(entry);

        assertEquals(RefValidity.REAL, result.validity());
    }

    @Test
    void entryWithCorrectDoiButWrongMetadataIsNotClassifiedAsReal() {
        BibEntry entry = new BibEntry(StandardEntryType.Article)
                .withField(StandardField.TITLE, "Not a Real Paper")
                .withField(StandardField.AUTHOR, "Random Author")
                .withField(StandardField.YEAR, "2099")
                .withField(StandardField.DOI, "10.48550/arXiv.1706.03762");

        RefCheckResult result = refChecker.check(entry);

        assertNotEquals(RefValidity.REAL, result.validity());
    }

    @Test
    void entryThatDoesNotExistAnywhereIsClassifiedAsFake() {
        BibEntry entry = new BibEntry(StandardEntryType.Article)
                .withField(StandardField.TITLE, "Nonexistent Paper with no Database")
                .withField(StandardField.AUTHOR, "No Author")
                .withField(StandardField.YEAR, "1800");

        RefCheckResult result = refChecker.check(entry);

        assertEquals(RefValidity.FAKE, result.validity());
    }
These tests are integration-style and depend on live responses from external services (CrossRef/DOI/arXiv). That makes them prone to flakiness when metadata formatting or search results change (especially the “does not exist anywhere” case, where CrossRef could still return a fuzzy match). Prefer a deterministic unit test by injecting mocked DoiFetcher/ArXivFetcher/CrossRef via the 3-arg RefChecker constructor and asserting on controlled responses.
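The idea of injecting controlled sources can be sketched without any network access. In the example below, plain lambdas play the role that mocked fetchers would play in a real test; the `Checker` record, its three-argument constructor, and its classification rule are hypothetical simplifications and do not reproduce the PR's `RefChecker`.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch: three injected lookups stand in for the DOI, CrossRef and arXiv fetchers.
final class StubbedCheckerSketch {

    enum Validity { REAL, UNSURE, FAKE }

    record Checker(Function<String, Optional<String>> doi,
                   Function<String, Optional<String>> crossRef,
                   Function<String, Optional<String>> arXiv) {

        // REAL if any source returns the same title, FAKE if candidates exist but none match,
        // UNSURE if no source returned anything.
        Validity check(String title) {
            boolean anyCandidate = false;
            for (Function<String, Optional<String>> source : List.of(doi, crossRef, arXiv)) {
                Optional<String> candidate = source.apply(title);
                if (candidate.isPresent()) {
                    anyCandidate = true;
                    if (candidate.get().equalsIgnoreCase(title)) {
                        return Validity.REAL;
                    }
                }
            }
            return anyCandidate ? Validity.FAKE : Validity.UNSURE;
        }
    }

    public static void main(String[] args) {
        Function<String, Optional<String>> nothing = query -> Optional.empty();
        Checker checker = new Checker(query -> Optional.of("Attention Is All You Need"), nothing, nothing);

        System.out.println(checker.check("Attention Is All You Need")); // REAL
        System.out.println(checker.check("Not a Real Paper"));          // FAKE
    }
}
```

Because every response is fixed by the test itself, the outcome no longer depends on what the remote services return on a given day.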
    String firstValue = one.getField(field).orElse("");
    String secondValue = two.getField(field).orElse("");
compareEntries uses getField(...) (raw field content) which can penalize harmless formatting differences (LaTeX braces/escaping, whitespace, line breaks) and lead to false FAKE/UNSURE classifications. Since DuplicateCheck already normalizes via getFieldLatexFree(...) in its comparison logic, consider using latex-free/normalized values here as well to keep scoring consistent with the rest of the duplicate-checking implementation.
    - String firstValue = one.getField(field).orElse("");
    - String secondValue = two.getField(field).orElse("");
    + String firstValue = one.getFieldLatexFree(field).orElse("");
    + String secondValue = two.getFieldLatexFree(field).orElse("");
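To illustrate why raw field content can understate similarity, here is a small sketch with a deliberately simplified normalizer. The `normalize` helper is an assumption made for this example: it only strips braces, collapses whitespace, and lowercases, and is not equivalent to what `getFieldLatexFree` does.

```java
import java.util.Locale;

final class NormalizeSketch {

    // Simplified stand-in for LaTeX-free normalization; NOT equivalent to getFieldLatexFree.
    static String normalize(String raw) {
        return raw.replaceAll("[{}]", "")   // drop protective braces such as {Reinforcement}
                  .replaceAll("\\s+", " ")  // collapse whitespace and line breaks
                  .trim()
                  .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String stored = "{Reinforcement} learning:\n  An introduction";
        String fetched = "Reinforcement Learning: An Introduction";

        System.out.println(stored.equals(fetched));                       // false
        System.out.println(normalize(stored).equals(normalize(fetched))); // true
    }
}
```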
    /// It does not return early when a DOI or other identifier matches.
    /// This means that an entry with the correct DOI but a wrong author name will still receive a low score.
    /// The old [#isDuplicate] method would have called that a duplicate immediately, which is wrong
    /// for reference checking purposes.it ignores JabRefs internal fields such as the citation key and entry type marker.
Spelling/grammar in the method documentation: missing space after the period and missing apostrophe in “JabRefs”. Please fix to improve readability (e.g., “purposes. It ignores JabRef’s …”).
    - /// for reference checking purposes.it ignores JabRefs internal fields such as the citation key and entry type marker.
    + /// for reference checking purposes. It ignores JabRef’s internal fields such as the citation key and entry type marker.
        return bestOf(doiResult, crossRefResult, arXivResult);
    }
If none of the sources yields any candidate (otherEntry == null / score 0.0), bestOf(...) currently returns FAKE (because each lookup returns FAKE on “not found”). That conflates “not found / could not verify” with “verified mismatch” and can mislabel obscure/older but real publications as fake. Consider returning UNSURE when no authoritative candidate was found from any source, and reserving FAKE for the case where a candidate exists but similarity is low.
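One way this distinction could look is sketched below: results without a matched candidate are ignored when picking the best one, and if nothing remains the outcome is UNSURE. The `Result` record, the `Optional` field, and the list-based `bestOf` signature are assumptions for illustration and differ from the PR's three-argument `bestOf(...)`.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; types and signature differ from the PR's RefChecker.bestOf.
final class BestOfSketch {

    enum Validity { REAL, UNSURE, FAKE }

    // matched is empty when a source returned no candidate at all.
    record Result(Validity validity, Optional<String> matched, double score) { }

    // Picks the highest-scoring result that actually has a candidate;
    // falls back to UNSURE when no source produced one.
    static Result bestOf(List<Result> results) {
        return results.stream()
                .filter(result -> result.matched().isPresent())
                .max(Comparator.comparingDouble(Result::score))
                .orElse(new Result(Validity.UNSURE, Optional.empty(), 0.0));
    }

    public static void main(String[] args) {
        Result notFound = new Result(Validity.FAKE, Optional.empty(), 0.0);
        System.out.println(bestOf(List.of(notFound, notFound, notFound)).validity()); // UNSURE

        Result weak = new Result(Validity.FAKE, Optional.of("Some other paper"), 0.1);
        System.out.println(bestOf(List.of(notFound, weak)).validity()); // FAKE
    }
}
```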
…ues using .toLowerCase to keep scoring consistent
Hi @NishantDG-SST, I tried this locally and the tests pass on my side. From a quick look, the logic now covers DOI lookup, CrossRef-based DOI discovery, and arXiv-based validation, which is a good first step 👍 A few scope / test questions after reading it:
Also, if more realistic samples are needed later, the RefChecker test suite might be a useful source of inspiration for real-world citation patterns? Happy to hear your thoughts on these.
Hey @wanling0000, thanks for testing.
Test for arXiv fallback: If a paper cannot be validated via its DOI or CrossRef, the checker attempts to resolve it using its arXiv identifiers.
Hi @NishantDG-SST, thanks for the detailed explanation and for adding the additional tests; this helps a lot 👍
The fallback plan makes sense to me.
From reading Just wanted to confirm if this matches the intended behavior, mainly so I can align on testing. Maybe it would help to document this a bit more explicitly (e.g. in docs or a small test matrix), so the expected classification is clearer.
Happy for you to continue with your current approach (no need to block on this); I’ll focus on validation/testing on my side :)
…of UNSURE,FAKE and REAL
Yes, you are right, that is correct, and I apologize for the confusion. I have documented this explicitly in the classify() JavaDoc and added
So, I added 3 new test cases
Thanks for adding these; this looks good to me.
I’m fine with either keeping both for clarity or removing one if you prefer to avoid duplication.
My next steps are to
Happy to hear your thoughts; I will proceed accordingly.
@NishantDG-SST Thank you for adding a new duplicate checker! The old
Related issues and pull requests
Refs #13604
PR Description
Only logic for the RefCheck functionality.
Steps to test
Run ./gradlew :jablib:fetcherTest --tests "org.jabref.logic.refcheck.RefCheckerTest"
Checklist
CHANGELOG.md in a way that can be understood by the average user (if change is visible to the user)