feat(verify-refs): v1.2.0 — full-author cross-check + PubMed efetch authoritative #18
Closed
Yoojin-nam wants to merge 1 commit into
…uthoritative

verify-refs v1.1.x only checked the first author family name; #2..#N family hallucinations passed audit (Paper 1 npj DM 2026-05-11 incident: liu2026benchmarking registered with 10 names but 7/10 first names were AI-fabricated: Yishu/Zifeng Ingram/Xue/Linhao/Samiran/Tobias/Zhijian, all hallucinated, and would have shipped to reviewers).

verify-refs/scripts/verify_refs.py:
- RefRecord extended with cited_authors[], actual_authors[], cited_author_count, actual_author_count, audit_truncated. Backward-compatible (first_author_guess populated from cited_authors[0] when available).
- parse_bib_authors() new: splits BibTeX `author` field by ' and ', strips LaTeX accents and braces. parse_bib() now uses balanced-brace parsing so entries like Park2025korean (author contains `Monta\~{n}\`{a}` LaTeX escape) extract all 4 names instead of stopping at the first inner `}`.
- verify_pubmed_efetch() new: authoritative XML full-record source. Reads <LastName> + <ForeName> per <Author ValidYN="Y"> element. Used in preference to CrossRef when PMID present (Paper 1 Aydin 2024 Vlachos incident: CrossRef returned given name `Vasileios`, PubMed/Zotero authoritative `Victoria`; PubMed efetch wins).
- verify_crossref(), verify_pubmed_pmid() now return full family-name list (not first only).
- verify_record() rewritten: PMID→efetch→CrossRef fallback, every cited author compared against actual_authors[i], total counts compared, MISMATCH status raised on any mismatch; note classification distinguishes first-author hallucination vs non-first-author/count mismatch.
- _normalize_surname() rewritten with unicodedata.NFKD decomposition + multi-char fallback table (ł đ ı ø æ œ ß). Eliminates Paper 1 gunes2025textual `Çolakoğlu` vs PubMed `Colakoglu` false-positive.
- `_audit_truncated = true` BibTeX field: opt-in marker for intentional CSL truncation (e.g., Nature first-1+et-al rendering with only 6 of 57 authors in bib). Suppresses count-mismatch MISMATCH while still reporting the gap as a NOTE in evidence.

verify-refs/SKILL.md, skill.yml: version 1.2.0 + changelog section.

lit-sync/SKILL.md: post-BBT-export regression audit hook. Run verify-refs v1.2.0 on the refreshed .bib and block on MISMATCH so AI/BBT-introduced hallucinations are caught before downstream consumers read the file.

search-lit/SKILL.md: anti-hallucination protocol updated. Compare full author list, not first+last only. Authoritative source priority documented: PubMed efetch > esummary > CrossRef (CrossRef given names not authoritative).

manage-refs/scripts/render_pandoc.sh: pre-render verify-refs v1.2.0 audit gate (-S flag to bypass). Blocks pandoc render on MISMATCH so a missed hallucination cannot ship into a journal-formatted docx.

Tests: tests/test_phase1c_hooks.sh 12/12 pass post-change. Paper 1 paper1.bib 69-entry audit: 58 OK + 11 UNVERIFIED (arXiv, known FP) + 0 MISMATCH; submission_safe=true.

Refs:
- ~/.claude/rules/citation-safety.md v1.1.4
- Memory: paper1_npj_dm_r1_v2_circulation_dispatched_2026-05-11
- Memory: feedback_paper1_bib_stale_post_bbt_2026-05-11
- Memory: feedback_codex_review_design_hallucination_2026-05-11

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
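For readers without the diff open, the surname normalization and per-position author comparison described in the commit message can be sketched roughly as follows. This is an illustrative reconstruction, not the code in `verify_refs.py`: the function names `normalize_surname` and `compare_authors`, the ASCII targets in the fallback table, the finding labels, and the placeholder author names are all assumptions. Only the NFKD step, the seven fallback characters, the positional comparison, the count check, and the truncation opt-in come from the description above.

```python
import unicodedata

# Letters that NFKD does not split into "base letter + combining mark".
# The commit lists these seven characters; the ASCII targets are assumed.
_FALLBACK = {"ł": "l", "đ": "d", "ı": "i", "ø": "o", "æ": "ae", "œ": "oe", "ß": "ss"}


def normalize_surname(name: str) -> str:
    """Lowercase, map special letters, then drop combining marks and non-letters."""
    mapped = "".join(_FALLBACK.get(ch, ch) for ch in name.lower())
    decomposed = unicodedata.normalize("NFKD", mapped)
    return "".join(
        ch for ch in decomposed if ch.isalpha() and not unicodedata.combining(ch)
    )


def compare_authors(cited, actual, audit_truncated=False):
    """Return (kind, detail) findings; an empty list means the two lists agree."""
    findings = []
    # Compare family names position by position over the shared prefix.
    for i, (c, a) in enumerate(zip(cited, actual)):
        if normalize_surname(c) != normalize_surname(a):
            kind = "first-author mismatch" if i == 0 else "non-first-author mismatch"
            findings.append((kind, f"#{i + 1}: cited {c!r}, actual {a!r}"))
    # A differing count is a mismatch unless the entry opted in to truncation.
    if len(cited) != len(actual):
        kind = "note: truncated" if audit_truncated else "count mismatch"
        findings.append((kind, f"cited {len(cited)}, actual {len(actual)}"))
    return findings


if __name__ == "__main__":
    # The diacritic case from the commit message should no longer be flagged.
    print(compare_authors(["Çolakoğlu"], ["Colakoglu"]))
    # Placeholder names: the second cited author does not match the record.
    print(compare_authors(["Alpha", "Beta"], ["Alpha", "Gamma"]))
```

Applying the fallback table before NFKD matters because characters such as `ł` and `ø` have no canonical decomposition, so NFKD alone would leave them unchanged and they would still compare unequal to their ASCII spellings.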
Summary
Splits commit `0679c3e` out of PR #10 (`feat/academic-aio-phase-lift-dedupe`), where it was riding under an unrelated academic-aio schema change. This is the verify-refs v1.2.0 work tracked by issue #17.
- Full-author cross-check for the `liu2026benchmarking` failure mode, where 7/10 given names were hallucinated but family names + first-author matched.
- PubMed `efetch.fcgi` (XML) prioritized over CrossRef API for author retrieval. CrossRef has documented given-name drift (Aydin 2024 Vlachos: CrossRef "Vasileios" vs PubMed "Victoria", PubMed correct).
- `verify_refs.py` +338/−68; touches `lit-sync`, `search-lit`, `verify-refs` SKILL.md + `render_pandoc.sh` + `skill.yml`.

Why draft
Issue #17 requires test coverage that this commit does not yet include:
- … (`liu2026benchmarking` 7/10 case)
- … (`skills/manage-refs/tests/fixtures/pre_submission_gate/`): extend with a known-mismatch entry

Marking draft until the above lands. The implementation itself is complete and was cherry-picked verbatim (`0679c3e` → `e322236`, clean, with no conflicts against current `main`).

Provenance
The original session that authored `0679c3e` is closed. This PR was reconstructed by cherry-picking the commit onto a clean branch from `main` (post-#13). PR #10 will be rebased to drop `0679c3e` so that it carries only the academic-aio schema change.

Closes #17 once the test checklist above is complete.
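For reference while reviewing, the efetch-based author extraction summarized above (one `<LastName>`/`<ForeName>` pair per `<Author ValidYN="Y">` element) could look roughly like the sketch below. It parses an inline sample instead of calling the network. The function name `parse_efetch_authors`, the sample record, and its placeholder names are assumptions made for illustration; skipping collective-name entries is also a choice of this sketch, not confirmed behavior of `verify_pubmed_efetch()`.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical record in the shape PubMed efetch returns with
# db=pubmed and retmode=xml. All names here are placeholders.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><Article><AuthorList>
    <Author ValidYN="Y"><LastName>Alpha</LastName><ForeName>Ann</ForeName></Author>
    <Author ValidYN="N"><LastName>Stale</LastName><ForeName>Old</ForeName></Author>
    <Author ValidYN="Y"><CollectiveName>Example Consortium</CollectiveName></Author>
    <Author ValidYN="Y"><LastName>Beta</LastName><ForeName>Bo</ForeName></Author>
  </AuthorList></Article></MedlineCitation></PubmedArticle>
</PubmedArticleSet>"""


def parse_efetch_authors(xml_text):
    """Return [(family, given), ...] for valid personal authors, in record order."""
    root = ET.fromstring(xml_text)
    authors = []
    for author in root.iter("Author"):
        # Entries flagged ValidYN="N" are superseded and should not be compared.
        if author.get("ValidYN", "Y") != "Y":
            continue
        last = author.findtext("LastName")
        if not last:
            # Group authors carry CollectiveName instead of LastName (skipped here).
            continue
        given = (author.findtext("ForeName") or "").strip()
        authors.append((last.strip(), given))
    return authors


if __name__ == "__main__":
    print(parse_efetch_authors(SAMPLE_XML))
```

Returning given names alongside family names is what lets a check catch the given-name drift case described in the summary, where the family name matches but the given name differs between sources.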
🤖 Generated with Claude Code