feat: add UGC-CARE discontinued list backends (#1019)
* feat: add UGC-CARE discontinued list backends [AI-assisted]
Add dedicated sync-backed sources and cached backends for UGC-CARE cloned Group I, cloned Group II, and delisted Group II lists.
Update backend/source registration, config URL defaults, output formatter list presence, and docs (README + API/config references) so the new backends are discoverable and usable.
Include unit tests for source parsing, registration, and backend metadata to reduce regression risk for future list-format changes.
* feat: add included-side UGC clone backends [AI-assisted]
Add legitimate-list backends for left-side included journals from UGC clone correction pages (Group I and Group II) to prevent ISSN-only false impressions.
Fix clone parser side mapping so cloned records use cloned-side ISSN/eISSN instead of original-side identifiers.
Extend tests and documentation to cover both included and cloned sides of UGC clone pages.
* fix: satisfy mypy title typing in UGC parser [AI-assisted]
Guard clone-side title extraction with explicit type narrowing before calling _build_entry so mypy can prove non-optional str arguments.
This preserves runtime behavior while eliminating strict type-check failures in ugc_care source parsing.
---------
Co-authored-by: florath-ai-assistant[bot] <Andreas.Florath@telekom.de>
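The type-narrowing fix in the third commit bullet, together with the cloned-side identifier mapping from the second, can be illustrated with a minimal sketch. Only the name `_build_entry` comes from the commit message; its signature here, the `parse_clone_row` function, and the `cloned_*` row keys are assumptions made for illustration, not the project's actual parser.

```python
from typing import Optional


def _build_entry(
    title: str, issn: Optional[str], eissn: Optional[str]
) -> dict[str, Optional[str]]:
    """Hypothetical stand-in; the real helper's signature is not shown here."""
    return {"title": title, "issn": issn, "eissn": eissn}


def parse_clone_row(
    row: dict[str, Optional[str]],
) -> Optional[dict[str, Optional[str]]]:
    """Build an entry for the cloned side of a row, or None without a title."""
    raw_title = row.get("cloned_title")  # assumed key name
    # Explicit narrowing: past this check mypy treats raw_title as str,
    # so the call below type-checks against a non-optional parameter.
    if raw_title is None:
        return None
    title = raw_title.strip()
    if not title:
        return None
    # Use the cloned-side identifiers, not the original journal's.
    return _build_entry(title, row.get("cloned_issn"), row.get("cloned_eissn"))
```

Returning early on a missing title keeps the narrowing visible to the type checker without a cast or a `# type: ignore` comment.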
**Output**: Combines data from multiple authoritative sources and advanced pattern analysis to provide confidence-scored assessments of journal legitimacy.

-**Note**: The first sync downloads and processes data from multiple sources (DOAJ, Beall's List, etc.), which takes a few minutes. After that, queries typically complete in under 5 seconds.
+**Note**: The first sync downloads and processes data from multiple sources (DOAJ, Beall's List, UGC-CARE discontinued lists, etc.), which takes a few minutes. After that, queries typically complete in under 5 seconds.

## Data Sources

This tool acts as a **data aggregator** - it doesn't provide data itself, but combines information from multiple authoritative sources:
docs/api-reference/backends.md: 5 additions, 0 deletions
@@ -103,6 +103,11 @@ YAML structure:
| Backend | Purpose | Source |
|---------|---------|--------|
| **bealls** | Beall's List archive | `src/aletheia_probe/backends/bealls.py` |
+| **ugc_care_cloned** | UGC-CARE Group I cloned journals | `src/aletheia_probe/backends/ugc_care_cloned.py` |
+| **ugc_care_cloned_group2** | UGC-CARE Group II cloned journals | `src/aletheia_probe/backends/ugc_care_cloned_group2.py` |
+| **ugc_care_delisted_group2** | UGC-CARE Group II delisted journals | `src/aletheia_probe/backends/ugc_care_delisted_group2.py` |
+| **ugc_care_included_from_clone_group1** | UGC-CARE Group I included journals from clone page | `src/aletheia_probe/backends/ugc_care_included_from_clone_group1.py` |
+| **ugc_care_included_from_clone_group2** | UGC-CARE Group II included journals from clone page | `src/aletheia_probe/backends/ugc_care_included_from_clone_group2.py` |
docs/configuration.md: 58 additions, 1 deletion
@@ -204,6 +204,63 @@ backends:
See `src/aletheia_probe/backends/predatoryjournals.py`

+### UGC-CARE Discontinued Lists Backends
+
+```yaml
+backends:
+  ugc_care_cloned:
+    enabled: true
+    weight: 0.9
+    timeout: 5
+    config:
+      cache_ttl_hours: 720  # 30 days - discontinued static list
+
+  ugc_care_cloned_group2:
+    enabled: true
+    weight: 0.9
+    timeout: 5
+    config:
+      cache_ttl_hours: 720  # 30 days - discontinued static list
+
+  ugc_care_delisted_group2:
+    enabled: true
+    weight: 0.9
+    timeout: 5
+    config:
+      cache_ttl_hours: 720  # 30 days - discontinued static list
+
+  ugc_care_included_from_clone_group1:
+    enabled: true
+    weight: 1.0
+    timeout: 5
+    config:
+      cache_ttl_hours: 720  # 30 days - discontinued static list
+
+  ugc_care_included_from_clone_group2:
+    enabled: true
+    weight: 1.0
+    timeout: 5
+    config:
+      cache_ttl_hours: 720  # 30 days - discontinued static list
+```
+
+**Configuration**:
+- `cache_ttl_hours`: How long cached UGC-CARE list data remains valid before requiring re-sync. Default is 720 hours (30 days), appropriate for discontinued/frozen sources.
+
+**Backend Descriptions**:
+- `ugc_care_cloned`: UGC-CARE Group I cloned journals list
+- `ugc_care_cloned_group2`: UGC-CARE Group II cloned journals list
+- `ugc_care_delisted_group2`: UGC-CARE Group II delisted journals list
+- `ugc_care_included_from_clone_group1`: UGC-CARE Group I included journals from clone correction page (left side)
+- `ugc_care_included_from_clone_group2`: UGC-CARE Group II included journals from clone correction page (left side)
The Kscien suite provides curated lists of predatory journals, publishers, hijacked journals, and conferences. All Kscien backends share the same configuration pattern.
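
The `cache_ttl_hours` setting documented for the UGC-CARE backends in this hunk can be read as a staleness check on the last sync time. The sketch below is an illustration of that idea only: `needs_resync` is a hypothetical helper, not a function from `aletheia_probe`, and treating the cache as expired at exactly the TTL boundary is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_resync(
    last_synced: datetime,
    cache_ttl_hours: int = 720,  # documented default: 30 days
    now: Optional[datetime] = None,
) -> bool:
    """Return True once the cached list is at least cache_ttl_hours old."""
    current = now if now is not None else datetime.now(timezone.utc)
    return current - last_synced >= timedelta(hours=cache_ttl_hours)


# Usage: a list synced 29 days ago is still fresh under the 720-hour default.
synced_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
still_fresh = not needs_resync(synced_at, now=synced_at + timedelta(days=29))
```

Passing `now` explicitly keeps the check deterministic in tests; production code would normally omit it and rely on the current UTC time.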