Commit 63ae163
feat(search-lit): add East Asian name + CollectiveName heuristics to parse_pubmed (#35)
Adds two anti-hallucination heuristics to
skills/search-lit/references/parse_pubmed.py:
1. East Asian name reverse encoding (LastName / ForeName swapped)
PubMed XML occasionally encodes East Asian author names with the
given name in <LastName> and the family name in <ForeName>. Naive
parsers then emit BibTeX entries with the wrong first-author surname,
which the downstream /verify-refs first-author cross-check flags as a
mismatch. The parser now detects this pattern (LastName ≥3 alpha
chars + ForeName 1-2 alpha chars with no period) and prints a
"% [VERIFY] East Asian name order suspected" comment above the
BibTeX entry, plus an inline ⚠ note in the efetch markdown output. The
author order is preserved verbatim: the script never silently
swaps fields it isn't certain about.
2. CollectiveName (corporate / consortium guideline) handling
AuthorList elements may contain <CollectiveName> instead of
LastName / ForeName (KDIGO, AHA/ACC, WHO guideline patterns).
Previously these authors were silently dropped, leaving the BibTeX
entry with an empty author field. The parser now:
- Emits the corporate name as {{Group Name}} (double-brace) so
BibTeX styles do not try to split on commas/spaces.
- Switches the BibTeX entry type from @article to @misc when the
AuthorList contains only CollectiveName entries (matches the
/manuscript-references corporate-author convention).
- Includes the corporate name in the cite-key surname slot.
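The two heuristics above can be sketched roughly as follows. This is a minimal illustration built only from the description in this message, not the code in parse_pubmed.py: the function names looks_reverse_encoded, format_collective, and entry_type are hypothetical, "alpha chars" is assumed to mean str.isalpha(), and the given name "Xiaoming" is an invented placeholder.

```python
def looks_reverse_encoded(last_name: str, fore_name: str) -> bool:
    """Flag the suspected pattern: LastName >= 3 alpha chars, ForeName 1-2 alpha chars.

    Assumption: "alpha" means str.isalpha(), which also rejects a period,
    so an initial such as "J." is never flagged.
    """
    last = last_name.strip()
    fore = fore_name.strip()
    return (
        len(last) >= 3 and last.isalpha()
        and 1 <= len(fore) <= 2 and fore.isalpha()
    )


def format_collective(name: str) -> str:
    """Wrap a corporate name in double braces so BibTeX keeps it as one unit."""
    return "{{" + name.strip() + "}}"


def entry_type(has_collective_only: bool) -> str:
    """Pick @misc only when every author is a CollectiveName, else @article."""
    return "@misc" if has_collective_only else "@article"


if __name__ == "__main__":
    # Hypothetical reverse-encoded author: given name in LastName, family name in ForeName.
    if looks_reverse_encoded("Xiaoming", "Fu"):
        print("% [VERIFY] East Asian name order suspected")
    print(format_collective("KDIGO Working Group"))
    print(entry_type(has_collective_only=True))
```

The detection only flags an entry for review; it does not swap the fields, which matches the stated policy of preserving the author order verbatim.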
Both heuristics share an _extract_authors() helper that returns
bib_authors, display_authors, first_author_last, suspicions, and a
has_collective_only flag, used by both parse_efetch and generate_bibtex.
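As a rough picture of what such a shared helper might look like, the sketch below walks an AuthorList element with xml.etree.ElementTree and collects the five values named above. It is an assumption-laden illustration: the name extract_authors_sketch, the dict return type, and the exact string formats are hypothetical and may differ from the real _extract_authors(); the sample author names are invented.

```python
import xml.etree.ElementTree as ET


def extract_authors_sketch(author_list: ET.Element) -> dict:
    """Hypothetical stand-in for _extract_authors(); returns a dict for readability."""
    bib_authors, display_authors, suspicions = [], [], []
    first_author_last = ""
    n_person = n_collective = 0

    for author in author_list.findall("Author"):
        collective = (author.findtext("CollectiveName") or "").strip()
        last = (author.findtext("LastName") or "").strip()
        fore = (author.findtext("ForeName") or "").strip()

        if collective:
            n_collective += 1
            bib_authors.append("{{" + collective + "}}")  # double-brace group name
            display_authors.append(collective)
            surname = collective
        elif last:
            n_person += 1
            bib_authors.append(f"{last}, {fore}" if fore else last)
            display_authors.append(f"{last} {fore}".strip())
            # Suspicion only: the fields are reported as-is, never swapped.
            if len(last) >= 3 and last.isalpha() and 1 <= len(fore) <= 2 and fore.isalpha():
                suspicions.append(f"{last} {fore}")
            surname = last
        else:
            continue  # no usable name in this Author element

        if not first_author_last:
            first_author_last = surname

    return {
        "bib_authors": bib_authors,
        "display_authors": display_authors,
        "first_author_last": first_author_last,
        "suspicions": suspicions,
        "has_collective_only": n_collective > 0 and n_person == 0,
    }


# Synthetic inputs with invented names, mirroring the two cases described above.
PERSON_XML = """<AuthorList>
  <Author><LastName>Xiaoming</LastName><ForeName>Fu</ForeName></Author>
  <Author><LastName>Smith</LastName><ForeName>John</ForeName></Author>
</AuthorList>"""
GROUP_XML = """<AuthorList>
  <Author><CollectiveName>KDIGO Working Group</CollectiveName></Author>
</AuthorList>"""

person_result = extract_authors_sketch(ET.fromstring(PERSON_XML))
group_result = extract_authors_sketch(ET.fromstring(GROUP_XML))
```

Returning one structure from a single pass lets both callers agree on the first-author surname and on whether the entry is collective-only, instead of each re-deriving it from the XML.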
Smoke-tested against synthetic XML with the Fu 2024 reverse-encoded
case and a KDIGO Working Group CollectiveName case. Both produce
correct output with the appropriate ⚠ / % [VERIFY] notes.
1 parent 2fb4ce6 commit 63ae163
1 file changed
Lines changed: 102 additions & 19 deletions