Commit 6827f2a
* feat: Handle arXiv preprints separately from journal/conference assessment (fixes #72)
This change prevents arXiv preprints from being assessed as journals or
conferences, which was causing inflated 'unknown' statistics and wasting
backend query resources on non-venues.
Changes:
- Add _is_arxiv_entry() method to detect arXiv patterns in BibTeX entries
- Skip arXiv entries during parsing and track them separately
- Update BibtexAssessmentResult model with arxiv_entries_count and
skipped_entries_count fields
- Modify parse_bibtex_file() to return a tuple with separate counts
- Enhance summary output to show arXiv entries separately
- Add comprehensive test coverage for arXiv detection
Detection patterns:
- "arXiv preprint arXiv:XXXX.XXXXX"
- "ArXiv e-prints"
- Bare arXiv identifiers "arXiv:XXXX.XXXXX"
- Old-style arXiv IDs "arXiv:cs.AI/9901001"
- Misc entries containing "arxiv"
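The detection patterns above could be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function name `looks_like_arxiv`, the list of inspected fields, and the exact regular expressions are assumptions, and the real `_is_arxiv_entry()` may differ.

```python
import re

# Hypothetical patterns approximating the list in the commit message.
_ARXIV_PATTERNS = [
    # "arXiv preprint arXiv:XXXX.XXXXX"
    re.compile(r"arxiv\s+preprint", re.IGNORECASE),
    # "ArXiv e-prints"
    re.compile(r"arxiv\s+e-?prints?", re.IGNORECASE),
    # Bare new-style identifier, e.g. "arXiv:2301.12345"
    re.compile(r"arxiv:\s*\d{4}\.\d{4,5}", re.IGNORECASE),
    # Old-style identifier, e.g. "arXiv:cs.AI/9901001"
    re.compile(r"arxiv:\s*[a-z-]+(\.[a-z]{2})?/\d{7}", re.IGNORECASE),
]

# Assumed set of fields that may carry a venue name.
_VENUE_FIELDS = ("journal", "booktitle", "howpublished", "note")


def looks_like_arxiv(entry_type: str, fields: dict[str, str]) -> bool:
    """Return True if a BibTeX entry appears to be an arXiv preprint."""
    # Misc entries: any field mentioning "arxiv" counts.
    if entry_type.lower() == "misc":
        return any("arxiv" in value.lower() for value in fields.values())
    # Other entry types: match the venue-like fields against the patterns.
    venue_text = " ".join(fields.get(name, "") for name in _VENUE_FIELDS)
    return any(p.search(venue_text) for p in _ARXIV_PATTERNS)
```

For example, `looks_like_arxiv("article", {"journal": "arXiv preprint arXiv:2301.12345"})` is true, while an entry whose journal is `"Nature"` is not matched.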
Fixes inflated 'unknown' assessment statistics by properly categorizing
arXiv preprints as a separate class of entries rather than attempting
to assess them as publication venues.
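To illustrate the idea of tracking arXiv entries separately, here is a small sketch of a parser step that returns a tuple with separate counts. The helper name `partition_entries`, the tuple layout, and the simplified substring check are assumptions made for illustration; the actual return shape of `parse_bibtex_file()` is not shown in this commit message.

```python
def partition_entries(entries: list[dict[str, str]]) -> tuple[list[dict[str, str]], int, int]:
    """Split parsed entries into assessable ones and two skip counters.

    Returns (kept_entries, skipped_entries_count, arxiv_entries_count).
    The tuple layout is hypothetical.
    """
    kept: list[dict[str, str]] = []
    skipped_entries_count = 0  # entries with no usable venue
    arxiv_entries_count = 0    # arXiv preprints, counted on their own

    for entry in entries:
        venue = entry.get("journal") or entry.get("booktitle") or ""
        if "arxiv" in venue.lower():
            # Simplified stand-in for the real arXiv detection.
            arxiv_entries_count += 1
        elif not venue.strip():
            skipped_entries_count += 1
        else:
            kept.append(entry)

    return kept, skipped_entries_count, arxiv_entries_count
```

Only the entries in the first element would then be sent to the assessment backends, so preprints no longer show up as 'unknown' venues.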
* refactor: Improve user-facing logging for arXiv preprint filtering
Enhanced the logging output to make it clearer when arXiv preprints
are being skipped during BibTeX parsing.
Changes:
- Split logging into separate, clearer messages
- Use status_logger.info() for arXiv skipping (visible to users)
- Clarify that arXiv entries are "not publication venues"
- Remove confusing "problematic entries" wording
- Keep technical parsing details in detail_logger.debug()
Before:
"Successfully parsed 11 entries [...] skipped 80 problematic entries
(59 arXiv, 21 other)"
After:
"Skipped 59 arXiv preprint(s) - not publication venues"
This addresses user confusion about why entries with journal fields
containing "arXiv preprint arXiv:..." are being excluded from assessment.
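The split between user-facing and technical messages might look roughly like this. It is a sketch under stated assumptions: the logger names passed to `logging.getLogger()` and the helper functions are hypothetical, and only the "After" message text is taken from the commit description.

```python
import logging

# Assumed setup: two standard-library loggers standing in for the
# project's status_logger and detail_logger.
status_logger = logging.getLogger("status")
detail_logger = logging.getLogger("detail")


def arxiv_skip_message(arxiv_count: int) -> str:
    """Build the user-facing message for skipped arXiv preprints."""
    return f"Skipped {arxiv_count} arXiv preprint(s) - not publication venues"


def log_parse_summary(parsed: int, arxiv_count: int, other_skipped: int) -> None:
    """Emit separate messages instead of one combined 'problematic' line."""
    if arxiv_count > 0:
        # Visible to users: explains why these entries are excluded.
        status_logger.info(arxiv_skip_message(arxiv_count))
    # Technical parsing details stay at debug level.
    detail_logger.debug(
        "Parsed %d entries, skipped %d other entries", parsed, other_skipped
    )
```

Keeping the arXiv message on its own line avoids lumping preprints together with entries that failed to parse.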
---------
Co-authored-by: florath-ai-assistant[bot] <Andreas.Florath@telekom.de>
1 parent 6e41a93 commit 6827f2a
4 files changed, +184 -33 lines changed
- src/aletheia_probe
- tests/unit