
Add RefChecker logic for reference validation Relates to #13604 #15478

Open
NishantDG-SST wants to merge 12 commits into JabRef:main from NishantDG-SST:13604-refchecker

NishantDG-SST commented Apr 1, 2026

Related issues and pull requests

Refs #13604

PR Description

This PR adds only the logic for the RefCheck functionality.

Steps to test

Run ./gradlew :jablib:fetcherTest --tests "org.jabref.logic.refcheck.RefCheckerTest"

Checklist

  • I own the copyright of the code submitted and I license it under the MIT license
  • I manually tested my changes in running JabRef (always required)
  • I added JUnit tests for changes (if applicable)
  • [/] I added screenshots in the PR description (if change is visible to the user)
  • [/] I added a screenshot in the PR description showing a library with a single entry with me as author and as title the issue number
  • I described the change in CHANGELOG.md in a way that can be understood by the average user (if change is visible to the user)
  • [/] I checked the user documentation for up to dateness and submitted a pull request to our user documentation repository

Copilot AI review requested due to automatic review settings April 1, 2026 22:22
@qodo-free-for-open-source-projects

Review Summary by Qodo

Add RefChecker logic for reference validation

✨ Enhancement


Walkthroughs

Description
• Add RefChecker logic for validating bibliographic entries against online sources
• Implement entry comparison using DOI, CrossRef, and arXiv fetchers
• Add compareEntries method to DuplicateCheck for similarity scoring
• Introduce RefValidity enum with REAL, UNSURE, and FAKE classifications
Diagram
flowchart LR
  Entry["BibEntry to validate"]
  DOI["DOI Lookup"]
  CrossRef["CrossRef Discovery"]
  ArXiv["ArXiv Lookup"]
  Compare["compareEntries Similarity"]
  Result["RefCheckResult with validity"]
  
  Entry --> DOI
  Entry --> CrossRef
  Entry --> ArXiv
  DOI --> Compare
  CrossRef --> Compare
  ArXiv --> Compare
  Compare --> Result

File Changes

1. jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java ✨ Enhancement +43/-0

Add entry comparison method with similarity scoring

• Add COMPARE_ENTRIES_THRESHOLD constant (0.8) for similarity scoring
• Implement compareEntries static method to compute similarity between entries
• Filter out internal fields from comparison to avoid citation key bias
• Return normalized similarity score in range [0.0, 1.0]



2. jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java ✨ Enhancement +225/-0

Implement reference validation against online sources

• Create RefChecker class for validating entries against online sources
• Implement three-stage validation: DOI lookup, CrossRef discovery, arXiv lookup
• Add check method that returns best result from all sources
• Implement classification logic using similarity thresholds
• Add error handling with logging for fetcher failures



3. jablib/src/main/java/org/jabref/logic/refcheck/RefValidity.java ✨ Enhancement +7/-0

Define reference validity classification enum

• Create RefValidity enum with three states: REAL, UNSURE, FAKE
• Represents classification of bibliographic entry validity



4. jablib/src/main/java/org/jabref/logic/refcheck/RefCheckResult.java ✨ Enhancement +8/-0

Define reference check result record

• Create RefCheckResult record to encapsulate validation results
• Include validity classification, matched entry, and similarity score



5. jablib/src/test/java/org/jabref/logic/database/DuplicateCheckTest.java 🧪 Tests +58/-0

Add tests for entry comparison method

• Add five test cases for compareEntries method
• Test scenarios: no shared fields, identical entries, citation key ignoring, identical titles,
 different fields
• Verify similarity scoring and threshold behavior



6. jablib/src/test/java/org/jabref/logic/refcheck/RefCheckerTest.java 🧪 Tests +71/-0

Add integration tests for RefChecker

• Create RefCheckerTest with three integration test cases
• Test real paper validation with correct DOI
• Test rejection of entries with correct DOI but wrong metadata
• Test classification of nonexistent entries as fake



7. CHANGELOG.md 📝 Documentation +1/-1

Document RefChecker feature in changelog

• Add entry documenting RefChecker logic addition
• Reference issue #13604
• Describe validation against DOI, CrossRef, and arXiv sources




qodo-free-for-open-source-projects bot commented Apr 1, 2026

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (3) 📎 Requirement gaps (0) 🎨 UX Issues (0)


Action required

1. LOGGER.warn drops stack traces 📘 Rule violation ≡ Correctness
Description
The new exception logging uses e.getMessage() and does not pass the Throwable to the logger,
losing stack traces and structured exception data. This makes diagnosing fetch failures
significantly harder and violates the project’s exception logging requirement.
Code

jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[R100-165]

+        } catch (FetcherException e) {
+            LOGGER.warn("DOI lookup failed for {}: {}", doi.get().asString(), e.getMessage());
+            return new RefCheckResult(RefValidity.FAKE, null, 0.0);
+        }
+
+        if (found.isEmpty()) {
+            return new RefCheckResult(RefValidity.FAKE, null, 0.0);
+        }
+
+        return classify(entry, found.get());
+    }
+
+    /// Tries to validate the entry by first discovering a DOI through CrossRef,
+    /// then fetching the authoritative entry using that DOI.
+    /// Returns a FAKE result with no matched entry if CrossRef finds no DOI
+    private RefCheckResult checkByCrossRef(BibEntry entry) {
+        Optional<DOI> foundDoi;
+        try {
+            foundDoi = crossRef.findIdentifier(entry);
+        } catch (FetcherException e) {
+            LOGGER.warn("CrossRef lookup failed: {}", e.getMessage());
+            return new RefCheckResult(RefValidity.FAKE, null, 0.0);
+        }
+
+        if (foundDoi.isEmpty()) {
+            return new RefCheckResult(RefValidity.FAKE, null, 0.0);
+        }
+
+        Optional<BibEntry> found;
+        try {
+            found = doiFetcher.performSearchById(foundDoi.get().asString());
+        } catch (FetcherException e) {
+            LOGGER.warn("DOI fetch after CrossRef discovery failed for {}: {}", foundDoi.get().asString(), e.getMessage());
+            return new RefCheckResult(RefValidity.FAKE, null, 0.0);
+        }
+
+        if (found.isEmpty()) {
+            return new RefCheckResult(RefValidity.FAKE, null, 0.0);
+        }
+
+        return classify(entry, found.get());
+    }
+
+    /// Tries to validate the entry using arXiv.
+    /// First looks for an arXiv identifier, then fetches the full entry by that identifier.
+    /// Returns a FAKE result with no matched entry if no arXiv identifier is found
+    private RefCheckResult checkByArXiv(BibEntry entry) {
+        Optional<ArXivIdentifier> identifier;
+        try {
+            identifier = arXivFetcher.findIdentifier(entry);
+        } catch (FetcherException e) {
+            LOGGER.warn("arXiv identifier lookup failed: {}", e.getMessage());
+            return new RefCheckResult(RefValidity.FAKE, null, 0.0);
+        }
+
+        if (identifier.isEmpty()) {
+            return new RefCheckResult(RefValidity.FAKE, null, 0.0);
+        }
+
+        Optional<BibEntry> found;
+        try {
+            found = arXivFetcher.performSearchById(identifier.get().asString());
+        } catch (FetcherException e) {
+            LOGGER.warn("arXiv fetch failed for {}: {}", identifier.get().asString(), e.getMessage());
+            return new RefCheckResult(RefValidity.FAKE, null, 0.0);
+        }
Evidence
PR Compliance ID 16 requires logging the exception object as the last argument; the added
LOGGER.warn(...) calls only log e.getMessage() and never pass e, so stack traces are not
recorded.

AGENTS.md
jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[100-165]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`RefChecker` catches `FetcherException` but logs only `e.getMessage()` instead of logging the actual `Throwable`, which drops stack traces.
## Issue Context
Per logging rules, exceptions must be passed to the logger as the last argument (e.g., `LOGGER.warn("...", value, e)`).
## Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[100-165]
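
To make the difference concrete, here is a minimal, dependency-free sketch. It deliberately swaps in `java.util.logging` for the project's `LOGGER` so that it runs on a plain JDK; the class name `LogWithThrowable`, the logger name, and the in-memory handler are all illustrative assumptions and not part of the PR. The principle carries over to the project's logger unchanged: hand the `Throwable` itself to the logging call instead of only its message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class LogWithThrowable {

    /// Collects log records in memory so each call's effect can be inspected.
    static final class CapturingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            records.add(record);
        }

        @Override
        public void flush() {
        }

        @Override
        public void close() {
        }
    }

    /// Logs the same failure twice: once with only its message, once with the throwable attached.
    static List<LogRecord> demo() {
        Logger logger = Logger.getLogger("refcheck.sketch"); // hypothetical logger name
        logger.setUseParentHandlers(false);
        CapturingHandler handler = new CapturingHandler();
        logger.addHandler(handler);

        Exception failure = new IllegalStateException("timeout");
        // Message only: the record carries no throwable, so no stack trace can be printed.
        logger.log(Level.WARNING, "DOI lookup failed: " + failure.getMessage());
        // Throwable attached: the record keeps the exception and its stack trace.
        logger.log(Level.WARNING, "DOI lookup failed", failure);

        logger.removeHandler(handler);
        return handler.records;
    }

    public static void main(String[] args) {
        for (LogRecord record : demo()) {
            System.out.println(record.getMessage() + " | thrown = " + record.getThrown());
        }
    }
}
```

In this sketch only the second record retains the exception, which is the behavior the logging rule is asking for.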



2. compareEntries uses orElse("") 📘 Rule violation ⚙ Maintainability
Description
DuplicateCheck.compareEntries uses Optional.orElse(""), which is explicitly discouraged and can
hide missing-value behavior. Since commonFields already filters for fields present in both
entries, the orElse defaults are unnecessary and non-idiomatic.
Code

jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java[R366-382]

+        List<Field> commonFields = one.getFields().stream()
+                                      .filter(field -> !(field instanceof InternalField))
+                                      .filter(field -> two.getField(field).isPresent())
+                                      .toList();
+
+        if (commonFields.isEmpty()) {
+            return 0.0;
+        }
+
+        return commonFields.stream()
+                           .mapToDouble(field -> {
+                               String firstValue = one.getField(field).orElse("");
+                               String secondValue = two.getField(field).orElse("");
+                               return stringSimilarity.similarity(firstValue, secondValue);
+                           })
+                           .average()
+                           .orElse(0.0);
Evidence
PR Compliance ID 10 forbids the Optional.orElse("") anti-pattern; the new code adds two
orElse("") defaults when reading fields for similarity scoring.

AGENTS.md
jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java[366-382]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`DuplicateCheck.compareEntries` uses `Optional.orElse("")`, which is disallowed and unnecessary here because `commonFields` already ensures both entries contain the field.
## Issue Context
Replace the `orElse("")` calls with direct retrieval that reflects the guaranteed presence (or restructure using `Optional` APIs) to comply with Optional handling conventions.
## Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java[366-382]
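
One possible restructuring is sketched below, under stated assumptions: a plain `Map<String, String>` stands in for `BibEntry`, `getField` mimics the Optional-returning accessor, and `similarity` is a toy function rather than JabRef's `StringSimilarity`. The class name `CommonFieldScore` is hypothetical, and the exclusion of internal fields is omitted for brevity. The idea is to combine the two Optionals with `flatMap`/`map`, so a score exists only when both values are present and no string default is needed.

```java
import java.util.Map;
import java.util.Optional;

public class CommonFieldScore {

    /// Hypothetical stand-in for BibEntry#getField: looks a field up in a plain map.
    static Optional<String> getField(Map<String, String> entry, String field) {
        return Optional.ofNullable(entry.get(field));
    }

    /// Toy similarity for illustration only: 1.0 if equal ignoring case, otherwise 0.0.
    static double similarity(String first, String second) {
        return first.equalsIgnoreCase(second) ? 1.0 : 0.0;
    }

    /// Scores one field; the result is empty when either entry lacks the field.
    static Optional<Double> scoreField(Map<String, String> one, Map<String, String> two, String field) {
        return getField(one, field)
                .flatMap(first -> getField(two, field)
                        .map(second -> similarity(first, second)));
    }

    /// Averages the scores of fields present in both entries; 0.0 when nothing is shared.
    static double compare(Map<String, String> one, Map<String, String> two) {
        return one.keySet().stream()
                  .map(field -> scoreField(one, two, field))
                  .flatMap(Optional::stream) // drops fields missing from either entry
                  .mapToDouble(Double::doubleValue)
                  .average()
                  .orElse(0.0);
    }

    public static void main(String[] args) {
        Map<String, String> one = Map.of("title", "Attention Is All You Need", "year", "2017");
        Map<String, String> two = Map.of("title", "attention is all you need", "year", "2018", "doi", "x");
        System.out.println(compare(one, two)); // prints 0.5 (title matches, year differs)
    }
}
```

With this shape the separate `commonFields` pre-filter becomes optional, because absent fields simply produce no score.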



3. Weak threshold asserts in tests 📘 Rule violation ☼ Reliability
Description
New tests use predicate assertions (assertTrue(score >= threshold) / < threshold) instead of
asserting exact expected values, weakening regression detection. This violates the unit test
requirement to assert exact values/outputs where possible.
Code

jablib/src/test/java/org/jabref/logic/database/DuplicateCheckTest.java[R665-687]

+    @Test
+    void entriesWithIdenticalTitles() {
+        BibEntry one = new BibEntry().withField(StandardField.TITLE, "Reinforcement learning: An introduction");
+        BibEntry two = new BibEntry().withField(StandardField.TITLE, "Reinforcement learning: An introduction");
+
+        double score = DuplicateCheck.compareEntries(one, two);
+
+        assertTrue(score >= DuplicateCheck.COMPARE_ENTRIES_THRESHOLD);
+    }
+
+    @Test
+    void entriesWithCompletelyDifferentFields() {
+        BibEntry one = new BibEntry()
+                .withField(StandardField.TITLE, "Performance on a Signal")
+                .withField(StandardField.AUTHOR, "Richard Atkinson");
+        BibEntry two = new BibEntry()
+                .withField(StandardField.TITLE, "Rest in Treatment")
+                .withField(StandardField.AUTHOR, "Elizabeth Ballard");
+
+        double score = DuplicateCheck.compareEntries(one, two);
+
+        assertTrue(score < DuplicateCheck.COMPARE_ENTRIES_THRESHOLD);
+    }
Evidence
PR Compliance ID 29 requires tests to assert exact expected values and avoid weak predicate checks;
the added tests compare with assertTrue(...) rather than assertEquals(...) to expected numeric
results (or another exact structured expectation).

jablib/src/test/java/org/jabref/logic/database/DuplicateCheckTest.java[665-687]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The added `DuplicateCheck.compareEntries` tests use threshold-based `assertTrue` predicates, which are considered weak checks.
## Issue Context
Update the tests to assert exact expected values (or exact expected structures) to strengthen regression detection.
## Fix Focus Areas
- jablib/src/test/java/org/jabref/logic/database/DuplicateCheckTest.java[665-687]



4. assertNotEquals weakens RefChecker test 📘 Rule violation ☼ Reliability
Description
The test only asserts the result is "not REAL" via assertNotEquals, which is a weak predicate and
can pass for multiple unintended outcomes. The test should assert the exact expected RefValidity
(or a complete expected result shape) to meet unit test strength requirements.
Code

jablib/src/test/java/org/jabref/logic/refcheck/RefCheckerTest.java[R47-58]

+    @Test
+    void entryWithCorrectDoiButWrongMetadataIsNotClassifiedAsReal() {
+        BibEntry entry = new BibEntry(StandardEntryType.Article)
+                .withField(StandardField.TITLE, "Not a Real Paper")
+                .withField(StandardField.AUTHOR, "Random Author")
+                .withField(StandardField.YEAR, "2099")
+                .withField(StandardField.DOI, "10.48550/arXiv.1706.03762");
+
+        RefCheckResult result = refChecker.check(entry);
+
+        assertNotEquals(RefValidity.REAL, result.validity());
+    }
Evidence
PR Compliance ID 29 requires exact expected-value assertions and discourages weak predicates;
assertNotEquals(RefValidity.REAL, ...) does not assert the intended classification outcome.

jablib/src/test/java/org/jabref/logic/refcheck/RefCheckerTest.java[47-58]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`RefCheckerTest.entryWithCorrectDoiButWrongMetadataIsNotClassifiedAsReal` uses `assertNotEquals(REAL, ...)`, which is a weak predicate check.
## Issue Context
Change the assertion to an exact expected validity (or assert the full expected `RefCheckResult` properties) so the test fails on near-miss behavior changes.
## Fix Focus Areas
- jablib/src/test/java/org/jabref/logic/refcheck/RefCheckerTest.java[47-58]



5. AI/LLM comment in code 📘 Rule violation ⚙ Maintainability
Description
A new source-code comment references LLM usage (No LLM is used.), which is meta-process/AI
disclosure content that should not be embedded in code. This violates the rule prohibiting
AI-disclosure comments inside source files.
Code

jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[R30-31]

+/// No LLM is used. Classification is based entirely on
+/// [org.jabref.logic.database.DuplicateCheck#compareEntries].
Evidence
PR Compliance ID 26 prohibits AI/LLM disclosure comments inside source code; the added class
Javadoc-style comment explicitly mentions LLM usage.

AGENTS.md
jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[30-31]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The `RefChecker` class comment includes AI/LLM disclosure text (`No LLM is used.`), which is not allowed in source code comments.
## Issue Context
Remove or rephrase the comment to describe behavior without referencing AI/LLM usage.
## Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[30-31]



6. Unnormalized field similarity 🐞 Bug ≡ Correctness
Description
DuplicateCheck.compareEntries compares raw field values via StringSimilarity.similarity, bypassing
JabRef’s existing LaTeX-free and field-specific normalization (e.g., author/page handling), which
can lower similarity for semantically identical entries and misclassify RefChecker results.
Code

jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java[R363-382]

+    public static double compareEntries(BibEntry one, BibEntry two) {
+        StringSimilarity stringSimilarity = new StringSimilarity();
+
+        List<Field> commonFields = one.getFields().stream()
+                                      .filter(field -> !(field instanceof InternalField))
+                                      .filter(field -> two.getField(field).isPresent())
+                                      .toList();
+
+        if (commonFields.isEmpty()) {
+            return 0.0;
+        }
+
+        return commonFields.stream()
+                           .mapToDouble(field -> {
+                               String firstValue = one.getField(field).orElse("");
+                               String secondValue = two.getField(field).orElse("");
+                               return stringSimilarity.similarity(firstValue, secondValue);
+                           })
+                           .average()
+                           .orElse(0.0);
Evidence
The new compareEntries uses BibEntry.getField (raw stored strings) and Levenshtein-based similarity
on those raw values. Existing duplicate logic intentionally uses getFieldLatexFree plus special
handling for PERSON_NAMES/PAGES/JOURNAL/CHAPTER, so the new scoring will diverge and can under-score
entries that differ only in formatting/LaTeX/author formatting.

jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java[345-383]
jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java[163-210]
jablib/src/main/java/org/jabref/logic/util/strings/StringSimilarity.java[30-53]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`DuplicateCheck.compareEntries` currently computes similarity using raw field strings (`getField`) and generic Levenshtein similarity. This bypasses JabRef’s existing LaTeX-free extraction and field-specific comparison rules (authors/pages/journal, etc.), leading to systematically wrong similarity scores and therefore wrong `RefChecker` classifications.
### Issue Context
Existing duplicate detection already contains proven normalization and comparison logic in `compareSingleField` (LaTeX-free, PERSON_NAMES handling, page normalization, etc.). `compareEntries` should reuse these normalization steps (at least) so that formatting differences don’t dominate the score.
### Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java[345-383]
- jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java[163-210]
### Implementation notes
- Use `getFieldLatexFree` instead of `getField` (or normalize via the same helper methods used in `compareSingleField`).
- Consider applying field-specific preprocessing: PERSON_NAMES -> correlateByWords-like handling; pages/journal normalization similar to existing methods.
- Keep internal fields excluded as today.
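
The effect of normalizing before scoring can be illustrated with a small self-contained sketch. Everything in it is an assumption made for illustration: `normalize` is a crude stand-in for what `getFieldLatexFree` and the field-specific helpers do (it only strips braces, lowercases, and collapses whitespace), and `similarity` is a generic normalized Levenshtein score, not JabRef's `StringSimilarity`. The class name `NormalizedSimilarity` is hypothetical.

```java
import java.util.Locale;

public class NormalizedSimilarity {

    /// Classic dynamic-programming Levenshtein distance using two rows.
    static int distance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] current = new int[b.length() + 1];
            current[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(Math.min(current[j - 1] + 1, previous[j] + 1),
                        previous[j - 1] + cost);
            }
            previous = current;
        }
        return previous[b.length()];
    }

    /// Similarity in [0.0, 1.0]: 1.0 minus the distance divided by the longer length.
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) distance(a, b) / longest;
    }

    /// Crude normalization for this sketch only: drop braces, lowercase, collapse whitespace.
    static String normalize(String value) {
        return value.replaceAll("[{}]", "")
                    .toLowerCase(Locale.ROOT)
                    .replaceAll("\\s+", " ")
                    .strip();
    }

    static double normalizedSimilarity(String a, String b) {
        return similarity(normalize(a), normalize(b));
    }

    public static void main(String[] args) {
        String stored = "{A}ttention Is All You Need";
        String fetched = "Attention is all  you need";
        System.out.println("raw:        " + similarity(stored, fetched));
        System.out.println("normalized: " + normalizedSimilarity(stored, fetched));
    }
}
```

The two titles differ only in formatting, yet the raw score falls below 1.0 while the normalized score is exactly 1.0. Averaged over several fields, this kind of gap may be enough to push a genuine reference under the threshold.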



7. Network errors labeled FAKE 🐞 Bug ☼ Reliability
Description
RefChecker catches FetcherException (network/server/parse failures) and returns RefValidity.FAKE,
incorrectly classifying “could not validate due to error” as “reference is fake”.
Code

jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[R98-103]

+        try {
+            found = doiFetcher.performSearchById(doi.get().asString());
+        } catch (FetcherException e) {
+            LOGGER.warn("DOI lookup failed for {}: {}", doi.get().asString(), e.getMessage());
+            return new RefCheckResult(RefValidity.FAKE, null, 0.0);
+        }
Evidence
FetcherException is thrown by fetcher infrastructure specifically for I/O/network and parser
failures. RefChecker converts these exceptions into FAKE results with score 0.0, which is
semantically indistinguishable from a real negative validation outcome and can produce false
negatives when services are temporarily unavailable.

jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[97-103]
jablib/src/main/java/org/jabref/logic/importer/EntryBasedParserFetcher.java[31-55]
jablib/src/main/java/org/jabref/logic/importer/IdParserFetcher.java[43-72]
jablib/src/main/java/org/jabref/logic/importer/FetcherException.java[16-91]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`RefChecker` currently maps any `FetcherException` (network/server/parse errors) to `RefValidity.FAKE`. This produces false negatives: entries can be marked FAKE due to transient outages or rate limiting.
### Issue Context
Fetcher infrastructure throws `FetcherException` for network and parsing failures. These are not evidence the reference is fake; they indicate the check is inconclusive.
### Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[89-172]
- jablib/src/main/java/org/jabref/logic/refcheck/RefValidity.java[1-7]
### Implementation notes
- Introduce an explicit state (e.g., `UNKNOWN`/`ERROR`) in `RefValidity`, or return `UNSURE` with a dedicated flag/error payload.
- Ensure `bestOf` ranking accounts for the new state.
- Update/add tests to cover error paths (e.g., mock fetchers throwing `FetcherException`).
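
A minimal sketch of the first option follows, assuming a hypothetical `UNKNOWN` state added to the enum. The `Lookup` interface stands in for a fetcher call, a generic `Exception` stands in for `FetcherException`, and the title comparison is a placeholder for the real similarity-based classification; none of these names come from the PR.

```java
import java.util.Optional;

public class InconclusiveLookup {

    /// Hypothetical variant of RefValidity with an explicit "could not be checked" state.
    enum Validity { REAL, UNSURE, FAKE, UNKNOWN }

    /// Stand-in for a fetcher call that may fail for network or parsing reasons.
    interface Lookup {
        Optional<String> fetch(String id) throws Exception;
    }

    static Validity check(Lookup lookup, String id, String expectedTitle) {
        Optional<String> found;
        try {
            found = lookup.fetch(id);
        } catch (Exception e) {
            // A failed request says nothing about the reference itself, so the outcome is inconclusive.
            return Validity.UNKNOWN;
        }
        if (found.isEmpty()) {
            // The source answered and has no such record.
            return Validity.FAKE;
        }
        // Placeholder classification; the PR compares similarity scores against thresholds instead.
        return found.get().equalsIgnoreCase(expectedTitle) ? Validity.REAL : Validity.UNSURE;
    }

    public static void main(String[] args) {
        System.out.println(check(id -> { throw new java.io.IOException("timeout"); }, "10.1000/x", "Title"));
        System.out.println(check(id -> Optional.empty(), "10.1000/x", "Title"));
        System.out.println(check(id -> Optional.of("title"), "10.1000/x", "Title"));
    }
}
```

Because the lookup is passed in as an interface, the error path can be exercised in a unit test without network access, which is one way to cover the cases named in the last bullet.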




Remediation recommended

8. Trivial threshold inline comment 📘 Rule violation ⚙ Maintainability
Description
The new inline comment on COMPARE_ENTRIES_THRESHOLD restates what the constant name already
expresses and does not explain additional rationale. This adds noise and violates the requirement
that comments explain "why" rather than "what".
Code

jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java[38]

+    public static final double COMPARE_ENTRIES_THRESHOLD = 0.8; // The threshold that determines if entries are likely to be of the same publication
Evidence
PR Compliance ID 7 disallows trivial comments that merely restate code; the added inline comment
repeats the meaning of COMPARE_ENTRIES_THRESHOLD without providing extra intent/rationale.

AGENTS.md
jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java[38-38]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The inline comment on `COMPARE_ENTRIES_THRESHOLD` is trivial and restates the identifier.
## Issue Context
Either remove the comment or replace it with a short rationale explaining *why* the specific value (`0.8`) was chosen.
## Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java[38-38]



9. Changelog entry too technical 📘 Rule violation ⚙ Maintainability
Description
The new CHANGELOG entry is programmer-focused (RefChecker logic) and lacks the end-user
framing/style expected for release notes. This reduces clarity for average users and may not meet
changelog quality requirements.
Code

CHANGELOG.md[13]

+- We added RefChecker logic to validate entries against DOI, CrossRef, and arXiv sources [#13604](https://github.com/JabRef/jabref/issues/13604)
Evidence
PR Compliance ID 25 requires end-user understandable changelog entries, and PR Compliance ID 32
requires professional, precise wording; the added entry uses internal implementation naming rather
than user-facing wording.

AGENTS.md
CHANGELOG.md[13-13]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The new changelog bullet references internal implementation details ("RefChecker logic") instead of describing the user-visible feature.
## Issue Context
Rewrite the entry to describe what users can do/what improved, using consistent changelog style.
## Fix Focus Areas
- CHANGELOG.md[13-13]



10. Tie loses matched entry 🐞 Bug ≡ Correctness
Description
RefChecker.bestOf breaks ties only by validity rank and similarityScore, so when those are equal it
can keep a result with otherEntry=null even if another attempt found an authoritative entry,
dropping useful diagnostics.
Code

jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[R195-209]

+    private static RefCheckResult bestOf(RefCheckResult doiResult, RefCheckResult crossRefResult,
+                                         RefCheckResult arXivResult) {
+        RefCheckResult best = doiResult;
+
+        if (rank(crossRefResult) > rank(best)
+                || (rank(crossRefResult) == rank(best)
+                && crossRefResult.similarityScore() > best.similarityScore())) {
+            best = crossRefResult;
+        }
+
+        if (rank(arXivResult) > rank(best)
+                || (rank(arXivResult) == rank(best)
+                && arXivResult.similarityScore() > best.similarityScore())) {
+            best = arXivResult;
+        }
Evidence
bestOf only replaces the current best if rank is greater or similarityScore is strictly greater.
Since some paths return FAKE with otherEntry=null (e.g., missing DOI/identifier) and others can
return FAKE/UNSURE with otherEntry present, an equal-rank/equal-score tie will retain the earlier
result and may discard a found candidate entry.

jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[189-212]
jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[89-107]
jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[174-187]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`bestOf` ignores whether `otherEntry` is present when breaking ties. This can return a result with `otherEntry == null` even though another attempt found a candidate authoritative entry.
### Issue Context
The UI/consumer likely benefits from seeing the closest authoritative entry even when validity is FAKE/UNSURE. The tie-breaker should preserve richer results.
### Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[189-212]
### Implementation notes
- When `rank(a) == rank(b)` and `similarityScore` is equal, prefer the result with non-null `otherEntry`.
- Optionally compute `rank(...)` once per result to simplify logic.
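
The tie-breaking rule from the notes above could be expressed as a single comparator chain. The sketch below makes several assumptions: `Result` is a simplified stand-in for `RefCheckResult` with a `String` in place of the matched `BibEntry`, the enum constants are declared in ascending rank so that their natural order can replace `rank(...)`, and the class name `BestResult` is hypothetical.

```java
import java.util.Comparator;
import java.util.stream.Stream;

public class BestResult {

    /// Declared in ascending rank for this sketch, so natural enum order replaces rank(...).
    enum Validity { FAKE, UNSURE, REAL }

    /// Simplified stand-in for RefCheckResult; otherEntry may be null.
    record Result(Validity validity, String otherEntry, double similarityScore) {
    }

    /// Orders by validity, then similarity score, then prefers results that carry a matched entry.
    static final Comparator<Result> ORDER = Comparator
            .comparing(Result::validity)
            .thenComparingDouble(Result::similarityScore)
            .thenComparing(result -> result.otherEntry() != null); // false sorts before true

    static Result bestOf(Result doiResult, Result crossRefResult, Result arXivResult) {
        return Stream.of(doiResult, crossRefResult, arXivResult)
                     .max(ORDER)
                     .orElseThrow();
    }

    public static void main(String[] args) {
        Result noEntry = new Result(Validity.FAKE, null, 0.0);
        Result withEntry = new Result(Validity.FAKE, "candidate entry", 0.0);
        // Same validity and score, but the result carrying a candidate entry is kept.
        System.out.println(bestOf(noEntry, withEntry, noEntry));
    }
}
```

Expressing the order as a comparator also means each result's rank is evaluated inside one chain instead of being repeated in nested conditions.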






Copilot AI left a comment


Pull request overview

This PR introduces initial “RefChecker” logic in jablib to validate bibliographic entries by resolving them via DOI/CrossRef/arXiv and classifying them based on similarity to fetched authoritative metadata.

Changes:

  • Added new refcheck domain types (RefChecker, RefCheckResult, RefValidity) and online-validation flow.
  • Added DuplicateCheck.compareEntries(...) plus a shared threshold constant to support similarity-based validation.
  • Added initial integration-style tests for RefChecker and extended DuplicateCheckTest; updated CHANGELOG.md.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java | Implements the online lookup + classification flow (DOI → CrossRef → arXiv) and picks the best result. |
| jablib/src/main/java/org/jabref/logic/refcheck/RefCheckResult.java | Adds a result record carrying validity, optional matched entry, and similarity score. |
| jablib/src/main/java/org/jabref/logic/refcheck/RefValidity.java | Defines the classification enum (REAL/UNSURE/FAKE). |
| jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java | Adds similarity scoring (compareEntries) and a threshold constant used by refcheck. |
| jablib/src/test/java/org/jabref/logic/refcheck/RefCheckerTest.java | Adds initial fetcher-backed tests covering “real”, “not real”, and “nonexistent” cases. |
| jablib/src/test/java/org/jabref/logic/database/DuplicateCheckTest.java | Adds unit tests for compareEntries behavior (self-compare, internal field ignore, etc.). |
| CHANGELOG.md | Documents the addition of RefChecker logic. |

Comment on lines +97 to +103
Optional<BibEntry> found;
try {
    found = doiFetcher.performSearchById(doi.get().asString());
} catch (FetcherException e) {
    LOGGER.warn("DOI lookup failed for {}: {}", doi.get().asString(), e.getMessage());
    return new RefCheckResult(RefValidity.FAKE, null, 0.0);
}
Copilot AI Apr 1, 2026

checkByDoi treats FetcherException (network/remote failure) as RefValidity.FAKE. That makes the overall classification incorrect in offline/error scenarios (a temporary fetch failure can label a real entry as fake). Consider returning UNSURE (and/or carrying an error reason) when lookup fails, and only returning FAKE when a lookup succeeds but similarity is below the fake threshold.
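
One way to read this suggestion, as a minimal sketch only: the types below (`Lookup`, `LookupException`, `Result`, and the `String`-typed candidate) are hypothetical stand-ins, not JabRef's classes. The 0.5 and 0.8 cut-offs are the values quoted later in this conversation, and how the boundaries are handled is an assumption. The DOI in `main` is a placeholder.

```java
import java.util.Optional;
import java.util.function.ToDoubleFunction;

final class FetchFailureSketch {
    enum Validity { REAL, UNSURE, FAKE }

    // Hypothetical result type; 'matched' is null when no candidate was retrieved.
    record Result(Validity validity, String matched, double score) { }

    // Hypothetical checked exception standing in for a network/remote failure.
    static final class LookupException extends Exception {
        LookupException(String message) {
            super(message);
        }
    }

    // Hypothetical lookup abstraction (not JabRef's fetcher API).
    interface Lookup {
        Optional<String> findById(String id) throws LookupException;
    }

    static Result checkById(String id, Lookup lookup, ToDoubleFunction<String> similarity) {
        Optional<String> found;
        try {
            found = lookup.findById(id);
        } catch (LookupException e) {
            // The lookup itself failed, so nothing was verified either way.
            return new Result(Validity.UNSURE, null, 0.0);
        }
        if (found.isEmpty()) {
            // Assumption: "nothing found" is also treated as unverified.
            return new Result(Validity.UNSURE, null, 0.0);
        }
        // Only a successful lookup with a candidate can yield FAKE.
        double score = similarity.applyAsDouble(found.get());
        Validity validity = score < 0.5 ? Validity.FAKE
                : score <= 0.8 ? Validity.UNSURE
                : Validity.REAL;
        return new Result(validity, found.get(), score);
    }

    public static void main(String[] args) {
        Lookup offline = id -> { throw new LookupException("offline"); };
        System.out.println(checkById("10.1000/example", offline, candidate -> 1.0));
    }
}
```

With this shape, FAKE is only reachable after a candidate has actually been compared, which is the distinction the comment asks for.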

Comment on lines +100 to +102
} catch (FetcherException e) {
    LOGGER.warn("DOI lookup failed for {}: {}", doi.get().asString(), e.getMessage());
    return new RefCheckResult(RefValidity.FAKE, null, 0.0);
Copilot AI Apr 1, 2026

Current logging drops the stack trace by only logging e.getMessage(). For operational debugging it’s better to pass the exception as the last argument (e.g., LOGGER.warn(..., e)) so failures can be diagnosed from logs.

Comment on lines +34 to +70
@Test
void realPaperWithCorrectDoiIsClassifiedAsReal() {
    BibEntry entry = new BibEntry(StandardEntryType.Article)
            .withField(StandardField.TITLE, "Attention Is All You Need")
            .withField(StandardField.AUTHOR, "Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and others")
            .withField(StandardField.YEAR, "2017")
            .withField(StandardField.DOI, "10.48550/arXiv.1706.03762");

    RefCheckResult result = refChecker.check(entry);

    assertEquals(RefValidity.REAL, result.validity());
}

@Test
void entryWithCorrectDoiButWrongMetadataIsNotClassifiedAsReal() {
    BibEntry entry = new BibEntry(StandardEntryType.Article)
            .withField(StandardField.TITLE, "Not a Real Paper")
            .withField(StandardField.AUTHOR, "Random Author")
            .withField(StandardField.YEAR, "2099")
            .withField(StandardField.DOI, "10.48550/arXiv.1706.03762");

    RefCheckResult result = refChecker.check(entry);

    assertNotEquals(RefValidity.REAL, result.validity());
}

@Test
void entryThatDoesNotExistAnywhereIsClassifiedAsFake() {
    BibEntry entry = new BibEntry(StandardEntryType.Article)
            .withField(StandardField.TITLE, "Nonexistent Paper with no Database")
            .withField(StandardField.AUTHOR, "No Author")
            .withField(StandardField.YEAR, "1800");

    RefCheckResult result = refChecker.check(entry);

    assertEquals(RefValidity.FAKE, result.validity());
}
Copilot AI Apr 1, 2026

These tests are integration-style and depend on live responses from external services (CrossRef/DOI/arXiv). That makes them prone to flakiness when metadata formatting or search results change (especially the “does not exist anywhere” case, where CrossRef could still return a fuzzy match). Prefer a deterministic unit test by injecting mocked DoiFetcher/ArXivFetcher/CrossRef via the 3-arg RefChecker constructor and asserting on controlled responses.
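
A rough sketch of what a deterministic variant could look like, using hand-written stubs instead of a mocking library. Everything here is hypothetical: `Fetcher`, `Checker`, the map-based entries, and the toy field-match similarity are stand-ins for illustration, not the PR's classes or scoring. "Nothing found" maps to FAKE to mirror the behaviour described for the PR at this point, and the 0.5/0.8 cut-offs are the values quoted later in this conversation.

```java
import java.util.Map;
import java.util.Optional;
import java.util.stream.Stream;

final class StubbedFetcherSketch {
    enum Validity { REAL, UNSURE, FAKE }

    // Hypothetical stand-in for a fetcher; real fetchers have richer APIs.
    interface Fetcher {
        Optional<Map<String, String>> fetch(Map<String, String> entry);
    }

    // Hypothetical checker with three injected sources, like a 3-arg constructor.
    static final class Checker {
        private final Fetcher doi;
        private final Fetcher crossRef;
        private final Fetcher arXiv;

        Checker(Fetcher doi, Fetcher crossRef, Fetcher arXiv) {
            this.doi = doi;
            this.crossRef = crossRef;
            this.arXiv = arXiv;
        }

        Validity check(Map<String, String> entry) {
            // Score every candidate any source returned and keep the best score.
            double best = Stream.of(doi, crossRef, arXiv)
                    .map(fetcher -> fetcher.fetch(entry))
                    .flatMap(Optional::stream)
                    .mapToDouble(candidate -> similarity(entry, candidate))
                    .max()
                    .orElse(-1.0);
            if (best < 0.5) {
                return Validity.FAKE; // includes "no candidate at all" (-1.0)
            }
            return best <= 0.8 ? Validity.UNSURE : Validity.REAL;
        }
    }

    // Toy similarity: fraction of the entry's fields that match the candidate.
    static double similarity(Map<String, String> entry, Map<String, String> candidate) {
        if (entry.isEmpty()) {
            return 0.0;
        }
        long matches = entry.entrySet().stream()
                .filter(e -> e.getValue().equalsIgnoreCase(candidate.getOrDefault(e.getKey(), "")))
                .count();
        return (double) matches / entry.size();
    }

    public static void main(String[] args) {
        Map<String, String> entry = Map.of("title", "T", "author", "A", "year", "2017");
        Fetcher nothing = e -> Optional.empty();
        Fetcher exact = e -> Optional.of(entry);
        System.out.println(new Checker(exact, nothing, nothing).check(entry));
    }
}
```

Because each stub returns a fixed candidate, the expected classification for every case is known in advance and an exact assertion becomes possible.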

Comment on lines +377 to +378
String firstValue = one.getField(field).orElse("");
String secondValue = two.getField(field).orElse("");
Copilot AI Apr 1, 2026

compareEntries uses getField(...) (raw field content) which can penalize harmless formatting differences (LaTeX braces/escaping, whitespace, line breaks) and lead to false FAKE/UNSURE classifications. Since DuplicateCheck already normalizes via getFieldLatexFree(...) in its comparison logic, consider using latex-free/normalized values here as well to keep scoring consistent with the rest of the duplicate-checking implementation.

Suggested change
- String firstValue = one.getField(field).orElse("");
- String secondValue = two.getField(field).orElse("");
+ String firstValue = one.getFieldLatexFree(field).orElse("");
+ String secondValue = two.getFieldLatexFree(field).orElse("");
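
To illustrate why raw values can be penalized, here is a small sketch. The `normalize` helper is a crude, hypothetical stand-in that only drops braces and collapses whitespace; it is not how `getFieldLatexFree` is implemented, which handles far more.

```java
final class LatexFreeSketch {
    // Crude, hypothetical normalizer: removes braces and collapses whitespace.
    static String normalize(String raw) {
        return raw.replace("{", "")
                  .replace("}", "")
                  .replaceAll("\\s+", " ")
                  .trim();
    }

    public static void main(String[] args) {
        String stored = "{A}ttention {I}s All\n You Need"; // as typed in a .bib file
        String fetched = "Attention Is All You Need";      // as returned by a catalog

        // The raw strings differ although they denote the same title.
        System.out.println(stored.equals(fetched));
        // After normalization the two values compare equal.
        System.out.println(normalize(stored).equals(normalize(fetched)));
    }
}
```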

/// It does not return early when a DOI or other identifier matches.
/// This means that an entry with the correct DOI but a wrong author name will still receive a low score.
/// The old [#isDuplicate] method would have called that a duplicate immediately, which is wrong
/// for reference checking purposes.it ignores JabRefs internal fields such as the citation key and entry type marker.
Copilot AI Apr 1, 2026

Spelling/grammar in the method documentation: missing space after the period and missing apostrophe in “JabRefs”. Please fix to improve readability (e.g., “purposes. It ignores JabRef’s …”).

Suggested change
- /// for reference checking purposes.it ignores JabRefs internal fields such as the citation key and entry type marker.
+ /// for reference checking purposes. It ignores JabRef’s internal fields such as the citation key and entry type marker.

Comment on lines +86 to +87
return bestOf(doiResult, crossRefResult, arXivResult);
}
Copilot AI Apr 1, 2026

If none of the sources yields any candidate (otherEntry == null / score 0.0), bestOf(...) currently returns FAKE (because each lookup returns FAKE on “not found”). That conflates “not found / could not verify” with “verified mismatch” and can mislabel obscure/older but real publications as fake. Consider returning UNSURE when no authoritative candidate was found from any source, and reserving FAKE for the case where a candidate exists but similarity is low.
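
A possible shape for this, sketched under assumptions: `Result` and `Validity` below are hypothetical stand-ins for the PR's record and enum, and the `bestOf` body is not the PR's implementation. It ignores results without a candidate and falls back to UNSURE when no source produced one.

```java
import java.util.Comparator;
import java.util.stream.Stream;

final class BestOfSketch {
    enum Validity { REAL, UNSURE, FAKE }

    // Hypothetical result type; 'matched' is null when a source found nothing.
    record Result(Validity validity, String matched, double score) { }

    static Result bestOf(Result... results) {
        return Stream.of(results)
                // Only results that carry an actual candidate count as evidence.
                .filter(result -> result.matched() != null)
                .max(Comparator.comparingDouble(Result::score))
                // No candidate from any source: could not verify, so UNSURE.
                .orElse(new Result(Validity.UNSURE, null, 0.0));
    }

    public static void main(String[] args) {
        Result notFound = new Result(Validity.FAKE, null, 0.0);
        System.out.println(bestOf(notFound, notFound, notFound));
    }
}
```

Under this reading, FAKE survives only when at least one source returned a candidate and the best of those candidates was itself classified as FAKE.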

@github-actions github-actions bot added status: changes-required Pull requests that are not yet complete status: no-bot-comments and removed status: changes-required Pull requests that are not yet complete labels Apr 1, 2026
@wanling0000
Collaborator

Hi @NishantDG-SST, I tried this locally and the tests pass on my side.

From a quick look, the logic now covers DOI lookup, CrossRef-based DOI discovery, and arXiv-based validation, which is a good first step 👍

A few scope / test questions after reading it:

  • the original issue description also mentions a search-based fallback via CompositeSearchBasedFetcher, is that planned for later?
  • the current implementation already introduces UNSURE, but the current tests do not seem to exercise this explicitly yet. Worth adding a small test?
  • In the second test, the assertion assertNotEquals(REAL, ...) seems quite broad. Since the implementation distinguishes between UNSURE and FAKE, it might be helpful to assert the expected classification more explicitly.
  • The current tests mainly cover the happy path and a basic negative case. Would it make sense to add a few more small cases to explicitly exercise different branches (e.g. arXiv fallback, CrossRef path)?

Also, if more realistic samples are needed later, the RefChecker test suite might be a useful source of inspiration for real-world citation patterns?

Happy to hear your thoughts on these.

@NishantDG-SST
Contributor Author

Hey @wanling0000 thanks for testing

  1. So, CompositeSearchBasedFetcher is planned as a follow-up.
    It requires ImporterPreferences to know which catalogs the user has
    selected, which belongs in the GUI/CLI layer rather than the logic layer.

  2. UNSURE test: I have added one, but a reliable UNSURE test is hard to write against a live API because I cannot control the exact score. entryWithSlightlyWrongTitleIsClassifiedAsUnsureOrReal() asserts that the result is not FAKE.
    There are two ways RefChecker can return UNSURE:
    a. Network error: the fetcher throws an exception (not a 404).
    b. Metadata mismatch: the API finds the paper, but the similarity score falls between 0.5 and 0.8.

  3. assertNotEquals: I used assertNotEquals(REAL) because the
    result could be either UNSURE or FAKE depending on the similarity score,
    which is determined by the string similarity algorithm at runtime. I cannot
    predict which one it will be without hardcoding implementation details.

    However, I can assert FAKE specifically, since the title and author are so
    wrong that the score falls below 0.5. If you agree with that approach, I'll proceed.

  4. I have added:
    • a test for the arXiv fallback: if a paper cannot be validated via its DOI or CrossRef, the checker attempts to resolve it using its arXiv identifier
    • a test for the CrossRef path: an entry with no DOI and no arXiv ID but enough metadata for CrossRef to find it

  5. I'm also going through the RefChecker test suite and will post an update if I find any inspiration for testing.

@wanling0000
Collaborator

Hi @NishantDG-SST thanks for the detailed explanation and for adding the additional tests, this helps a lot 👍

> 1. So, CompositeSearchBasedFetcher is planned as a follow-up.
> It requires ImporterPreferences to know which catalogs the user has
> selected, which belongs in the GUI/CLI layer rather than the logic layer.

The fallback plan makes sense to me.

> 2. UNSURE test: I have added one, but a reliable UNSURE test is hard to write against a live API because I cannot control the exact score. entryWithSlightlyWrongTitleIsClassifiedAsUnsureOrReal() asserts that the result is not FAKE.
> There are two ways RefChecker can return UNSURE:
> a. Network error: the fetcher throws an exception (not a 404).
> b. Metadata mismatch: the API finds the paper, but the similarity score falls between 0.5 and 0.8.

From reading RefChecker, my understanding is that UNSURE only occurs when an authoritative entry was found, while cases like “no result” or fetch failures are treated as FAKE.

Just wanted to confirm if this matches the intended behavior, mainly so I can align on testing.

Maybe it would help to document this a bit more explicitly (e.g. in docs or a small test matrix), so the expected classification is clearer.

> 3. assertNotEquals: I used assertNotEquals(REAL) because the
> result could be either UNSURE or FAKE depending on the similarity score,
> which is determined by the string similarity algorithm at runtime. I cannot
> predict which one it will be without hardcoding implementation details.
> However, I can assert FAKE specifically, since the title and author are so
> wrong that the score falls below 0.5. If you agree with that approach, I'll proceed.

Happy for you to continue with your current approach (no need to block on this), I’ll focus on validation/testing on my side :)

@github-actions github-actions bot added status: changes-required Pull requests that are not yet complete and removed status: no-bot-comments labels Apr 4, 2026
@github-actions github-actions bot added status: no-bot-comments and removed status: changes-required Pull requests that are not yet complete labels Apr 4, 2026
@NishantDG-SST
Contributor Author

> From reading RefChecker, my understanding is that UNSURE only occurs when an authoritative entry was found, while cases like “no result” or fetch failures are treated as FAKE.
>
> Just wanted to confirm if this matches the intended behavior, mainly so I can align on testing.

Yes, you are right, and I apologize for the confusion.
Network errors always return FAKE.
UNSURE is returned in exactly one case: an authoritative entry was found
and the similarity score falls in [0.5, 0.8]. Network errors and not-found
cases always return FAKE with a null matchedEntry and a score of 0.0.

I have documented this explicitly in the classify() JavaDoc and added
a note to check() clarifying that fetch failures always produce FAKE.
The matchedEntry field distinguishes the two FAKE cases: null means
nothing was found, while non-null means a candidate was found but its similarity was below 0.5.
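
The rules described above can be written down as a small sketch. The names (`Result`, `classify`, `nothingFound`, the constants) are hypothetical and this is not the PR's `classify()`; treating scores above 0.8 as REAL is inferred from the interval given for UNSURE.

```java
final class ClassifySketch {
    enum Validity { REAL, UNSURE, FAKE }

    // Hypothetical result type mirroring the three pieces of information discussed.
    record Result(Validity validity, String matchedEntry, double score) { }

    static final double FAKE_BELOW = 0.5; // scores below this are FAKE
    static final double UNSURE_UP_TO = 0.8; // scores in [0.5, 0.8] are UNSURE

    // 'candidate' is null when the lookup failed or found nothing.
    static Result classify(String candidate, double score) {
        if (candidate == null) {
            return new Result(Validity.FAKE, null, 0.0);
        }
        if (score < FAKE_BELOW) {
            return new Result(Validity.FAKE, candidate, score);
        }
        if (score <= UNSURE_UP_TO) {
            return new Result(Validity.UNSURE, candidate, score);
        }
        return new Result(Validity.REAL, candidate, score);
    }

    // Tells the two FAKE cases apart via the matched entry.
    static boolean nothingFound(Result result) {
        return result.validity() == Validity.FAKE && result.matchedEntry() == null;
    }

    public static void main(String[] args) {
        // A tiny test matrix: (candidate, score) pairs and their classification.
        System.out.println(classify(null, 0.9));
        System.out.println(classify("candidate", 0.3));
        System.out.println(classify("candidate", 0.65));
        System.out.println(classify("candidate", 0.95));
    }
}
```

Written this way, a caller can use `nothingFound` to separate "could not find anything" from "found a candidate that does not match", even though both are reported as FAKE.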

@NishantDG-SST
Contributor Author

NishantDG-SST commented Apr 4, 2026

I added three new test cases:

  • entryWithCompletelyWrongAuthorIsNotClassifiedAsReal() checks a real title and real DOI with completely wrong authors.
  • entryWhoseDOIResolvesToDifferentPaperIsNotClassifiedAsReal() does the same as entryWithCorrectDoiButWrongMetadataIsNotClassifiedAsReal(), so if you feel one of them is not needed, I can remove it.
  • bertPaperWithCorrectDoiIsClassifiedAsReal() is a second real-paper test using a different arXiv DOI.

@wanling0000
Collaborator

Thanks for adding these, this looks good to me

> entryWhoseDOIResolvesToDifferentPaperIsNotClassifiedAsReal() (this does the same as entryWithCorrectDoiButWrongMetadataIsNotClassifiedAsReal(), so if you feel like one of them is not needed I can remove one)

I’m fine with either keeping both for clarity or removing one if you prefer to avoid duplication.

@NishantDG-SST
Contributor Author

My next steps are to

  • Add method to RefChecker to process a whole .bib file
  • Create the refcheck group tree (real paper / unsure / fake paper) using ExplicitGroup and GroupTreeNode like addSuggestedGroups in GroupTreeViewModel.java

Happy to hear your thoughts and will proceed accordingly.

@pluto-han
Contributor

pluto-han commented Apr 5, 2026

@NishantDG-SST Thank you for adding a new duplicate checker! The old isDuplicate method does not work well in #15316; compareEntries looks great!
