Commit d74770e
fix: resolve kscien pagination bug - add enum.value and fix pagination detection (#1012)
Critical bug fix for kscien data sync that was only retrieving 1-2 entries per source
instead of hundreds/thousands. Root causes identified and fixed:
1. Enum serialization bug in URL construction
- kscien_generic.py:55 - Added .value to publication_type enum in base URL
- kscien_helpers.py:68 - Added .value to publication_type enum in pagination URL
- Without .value, URLs contained "PublicationType.PUBLISHERS" instead of "publishers"
2. Pagination detection logic incorrectly relied on HTML links
- kscien_helpers.py:345-389 - Rewrote _has_next_page() function
- Old logic: looked for sequential pagination links (page 2→3→4), which kscien.org does not render; it only links to the current page and the last page
- New logic: continue based on the expected count and empty-page detection
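
A minimal sketch of the link-independent pagination check described above. The real `_has_next_page()` signature is not visible in this commit view, so the function name, parameters, and the `max_pages` safety cap below are assumptions for illustration only.

```python
from typing import Callable, List, Optional


def has_next_page(
    current_page: int,
    items_on_page: int,
    items_so_far: int,
    expected_count: Optional[int],
    max_pages: int = 500,  # hypothetical safety cap, not taken from the commit
) -> bool:
    """Decide whether to request another page without inspecting HTML links."""
    if items_on_page == 0:
        return False  # an empty page means we ran past the last page
    if current_page >= max_pages:
        return False  # guard against looping forever on a misbehaving site
    if expected_count is not None and items_so_far >= expected_count:
        return False  # everything the site advertises has been collected
    return True


def collect_all(
    fetch_page: Callable[[int], List[str]],
    expected_count: Optional[int] = None,
    max_pages: int = 500,
) -> List[str]:
    """Fetch pages 1, 2, 3, ... until has_next_page() says stop."""
    entries: List[str] = []
    page = 1
    while True:
        items = fetch_page(page)
        entries.extend(items)
        if not has_next_page(page, len(items), len(entries), expected_count, max_pages):
            break
        page += 1
    return entries
```

Here `fetch_page` stands in for whatever downloads and parses one listing page. Driving the loop from counts and empty pages means a site that only links to the current and last page can still be walked page by page.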
Impact:
- Before: 1-2 entries per source (~99% data loss)
- After: 1,251 publishers, 1,456 journals, 449 conferences, 184 hijacked journals
- Improvement: roughly 100x to 1,000x more entries per source (e.g. 184 vs. 1-2 hijacked journals, 1,456 vs. 1-2 journals)
Tests:
- All existing unit tests pass
- Manual verification: curl confirmed pages 1, 3, 10 contain different valid data
- Real sync test: Retrieved expected counts matching kscien.org website
Files changed:
- src/aletheia_probe/updater/sources/kscien_generic.py (1 line)
- src/aletheia_probe/updater/sources/kscien_helpers.py (pagination logic rewrite)
- tests/unit/test_kscien_refactor.py (2 lines - add .value in test URLs)
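
The enum serialization issue from root cause 1 can be reproduced in isolation. This is a sketch, not the project's code: `PublicationType` below is a one-member stand-in for the real enum, and `BASE` is a placeholder URL rather than the actual kscien.org endpoint.

```python
from enum import Enum


class PublicationType(Enum):
    """Minimal stand-in for the project's publication type enum."""

    PUBLISHERS = "publishers"


BASE = "https://example.org/list"  # placeholder, not the real endpoint


def build_url_buggy(publication_type: PublicationType) -> str:
    # Formatting a plain Enum member uses str(), which yields
    # "PublicationType.PUBLISHERS" rather than the member's value.
    return f"{BASE}/{publication_type}/"


def build_url_fixed(publication_type: PublicationType) -> str:
    # .value returns the underlying string, "publishers".
    return f"{BASE}/{publication_type.value}/"


print(build_url_buggy(PublicationType.PUBLISHERS))
print(build_url_fixed(PublicationType.PUBLISHERS))
```

The first URL embeds `PublicationType.PUBLISHERS` in the path, while the second contains `publishers`, which is the difference the `.value` fix addresses.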
Co-authored-by: florath-ai-assistant[bot] <Andreas.Florath@telekom.de>
1 parent 5f54197
3 files changed, +12 -21 lines changed:
- src/aletheia_probe/updater/sources
- tests/unit
| File | Original lines | New lines | Change |
|---|---|---|---|
| src/aletheia_probe/updater/sources/kscien_generic.py | 55 | 55 | +1 -1 |
| src/aletheia_probe/updater/sources/kscien_helpers.py | 68 | 68 | +1 -1 |
| src/aletheia_probe/updater/sources/kscien_helpers.py | 372-379 | 372 | +1 -8 |
| src/aletheia_probe/updater/sources/kscien_helpers.py | 382-384 | 375-380 | +6 -3 |
| src/aletheia_probe/updater/sources/kscien_helpers.py | 388-389 | 384 | +1 -2 |
| tests/unit/test_kscien_refactor.py | 30-32 | 30 | +1 -3 |
| tests/unit/test_kscien_refactor.py | 72-74 | 70 | +1 -3 |