docs: Add comprehensive documentation for retraction checking and acronym expansion features [AI-assisted] (#212)
Document two significant features that were implemented but undocumented:
1. Article retraction checking system:
- Individual article retraction detection by DOI
- Journal-level retraction statistics and risk levels
- Three-tier data sources (local, Crossref API, caching)
- Integration with BibTeX assessment and output formatting
- Risk classification with rate-based and count-based thresholds
2. Conference acronym expansion system:
- Automatic expansion during assessment (ICSE → International Conference...)
- Intelligent acronym detection and database lookup
- Auto-learning from parenthetical text
- Integration with existing CLI management commands
Resolves issue #208 by providing conceptual documentation, usage examples,
and integration details for these advanced features. Users can now understand
and effectively utilize retraction checking and acronym expansion capabilities.
Updated Table of Contents and Key Features section to reflect new content.
All quality checks pass.
Co-authored-by: florath-ai-assistant[bot] <Andreas.Florath@telekom.de>
@@ -24,6 +25,8 @@ The Journal Assessment Tool helps researchers and institutions evaluate whether
 
 - **Multi-source verification**: Combines DOAJ, Beall's List, Retraction Watch, and institutional data
 - **BibTeX batch processing**: Assess entire bibliographies from BibTeX files with automated exit codes
+- **Article retraction checking**: Identifies retracted publications by DOI with detailed statistics
+- **Conference acronym expansion**: Automatically expands abbreviations like "ICSE" to full conference names
 - **Intelligent matching**: Handles name variations, ISSNs, and publisher information
 - **Confidence scoring**: Provides probabilistic assessments with clear reasoning
 - **Fast performance**: Local caching reduces API calls and improves speed
@@ -552,18 +555,47 @@ backends:
 
 ## Conference Acronym Management
 
-The conference acronym management feature helps expand conference abbreviations to their full names. This is particularly useful when processing bibliographic data where conferences may be referenced by common acronyms like "ICSE" (International Conference on Software Engineering) or "NIPS" (Neural Information Processing Systems).
+The conference acronym expansion system automatically expands conference abbreviations to their full names during assessment. This improves matching accuracy when processing bibliographic data where conferences may be referenced by common acronyms like "ICSE" (International Conference on Software Engineering) or "NIPS" (Neural Information Processing Systems).
 
-### Concept
+### How Acronym Expansion Works
 
-Conference acronyms are stored in a local database that maps short forms to their full conference names. The system automatically builds this mapping as it encounters conference data during journal assessments, and also allows manual management of acronym mappings.
+When you query a journal or conference name, the system automatically:
 
-### Purpose
+1. **Detects acronyms**: Identifies when input appears to be an acronym (short, mostly uppercase)
+2. **Database lookup**: Searches the local acronym database for known expansions
+3. **Expands for matching**: Uses the full conference name for better backend matching
+4. **Preserves original**: Tracks the original acronym in `acronym_expanded_from` field
+5. **Auto-learns**: Extracts new acronym mappings from text like "International Conference on Software Engineering (ICSE)"
 
-- **Standardization**: Ensure consistent conference naming across bibliographic data
-- **Expansion**: Convert acronyms to full names for better readability and processing
-- **Data Quality**: Maintain a curated database of conference name mappings
-- **Automation**: Reduce manual effort in processing conference references
+**Example workflow:**
+```bash
+# User queries an acronym
+aletheia-probe conference "ICSE"
+
+# System automatically expands to "International Conference on Software Engineering"
+# Backend searches use the full name for better results
+# Output shows: acronym_expanded_from: "ICSE"
+```
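
The detection, lookup, and "preserve original" steps described above could be approximated as follows. This is a minimal sketch under stated assumptions, not the project's implementation: the names `looks_like_acronym`, `expand_query`, and `KNOWN_ACRONYMS` are hypothetical, the thresholds (at most 10 characters, at least 60% uppercase letters) are illustrative guesses at "short, mostly uppercase", and a plain dictionary stands in for the local acronym database.

```python
from typing import Dict, Optional


def looks_like_acronym(text: str, max_len: int = 10, min_upper_ratio: float = 0.6) -> bool:
    """Heuristic: a single short token whose letters are mostly uppercase."""
    token = text.strip()
    if not token or len(token) > max_len or " " in token:
        return False
    letters = [ch for ch in token if ch.isalpha()]
    if not letters:
        return False
    upper = sum(1 for ch in letters if ch.isupper())
    return upper / len(letters) >= min_upper_ratio


# Hypothetical stand-in for the local acronym database.
KNOWN_ACRONYMS: Dict[str, str] = {
    "ICSE": "International Conference on Software Engineering",
}


def expand_query(query: str) -> Dict[str, Optional[str]]:
    """Return the name to search for, keeping the original acronym if one was expanded."""
    token = query.strip()
    if looks_like_acronym(token) and token in KNOWN_ACRONYMS:
        return {"query": KNOWN_ACRONYMS[token], "acronym_expanded_from": token}
    # Unknown acronyms and ordinary names pass through unchanged.
    return {"query": token, "acronym_expanded_from": None}
```

In this sketch an unknown acronym is searched as typed rather than rejected, so a missing database entry never blocks an assessment.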
+
+### Automatic vs Manual Acronym Management
+
+**Automatic expansion** happens during assessment:
+- Input normalization checks if query looks like an acronym
+- Local database provides expansions for better matching
+- Parenthetical text like "(ICML)" automatically creates new mappings
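
Learning a mapping from parenthetical text can be illustrated with a regular expression. The pattern and the function name `extract_acronym_mapping` below are assumptions made for this sketch; the tool's actual matching rules may differ (for example in how it treats years or mixed-case acronyms).

```python
import re
from typing import Optional, Tuple

# Matches "<full name> (<ACRONYM>)" where the acronym is 2-10 characters,
# starts with an uppercase letter, and contains only uppercase letters or digits.
PAREN_ACRONYM = re.compile(r"^(?P<name>.+?)\s*\((?P<acronym>[A-Z][A-Z0-9]{1,9})\)\s*$")


def extract_acronym_mapping(text: str) -> Optional[Tuple[str, str]]:
    """Return (acronym, full_name) if the text ends with a parenthesized acronym."""
    match = PAREN_ACRONYM.match(text.strip())
    if match is None:
        return None
    return match.group("acronym"), match.group("name").strip()
```

A caller would store the returned pair in the acronym database so that a later query for the bare acronym can be expanded.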
+
+**Manual management** allows database control:
+- Pre-populate common acronyms for your field
+- Correct automatic mappings
+- Add institution-specific abbreviations
+- View and clear stored mappings
+
+### Benefits
+
+- **Better Matching**: Full names improve backend search accuracy
+- **Standardization**: Consistent conference naming across bibliographic data
+- **Automation**: Reduces manual effort in processing conference references
+- **Transparency**: Original acronyms preserved in assessment results
 
 ### Available Commands
 
@@ -672,6 +704,153 @@ The acronym database integrates automatically with journal assessment workflows.
 
 For implementation details, see `src/aletheia_probe/cli.py`.
 
+## Article Retraction Checking
+
+The article retraction checking system identifies retracted academic publications by their DOI and integrates retraction data into assessment results. This helps users identify potentially problematic articles in their bibliographies and provides journal-level retraction statistics as quality indicators.
+
+### What Retraction Checking Does
+
+The system performs two levels of retraction analysis:
+
+1. **Article-level checking**: Identifies individual retracted articles by DOI
+2. **Journal-level statistics**: Calculates retraction rates and risk levels for journals
+
+When processing BibTeX files, any articles with DOIs are automatically checked against retraction databases. Retracted articles are flagged with clear warnings and detailed retraction information.
+
+### How It Works
+
+#### Data Sources (Three-Tier Lookup)
+
+The system checks multiple sources in priority order:
+
+1. **Local Retraction Watch Database** (fastest)
+   - Pre-populated from Retraction Watch CSV dataset during sync
+   - Contains ~50,000+ retracted articles indexed by DOI
+   - Checked first for immediate results
+
+2. **Crossref API** (real-time)
+   - Queries Crossref metadata for retraction notices
+   - Looks for 'update-by' and 'update-to' fields
+   - Used as fallback when local database doesn't have the article
+
+3. **Intelligent Caching**
+   - 30-day cache for all retraction checks
+   - Negative results (non-retracted) also cached
+   - Prevents redundant API calls
+
+#### Retraction Detection Process
+
+```bash
+# For each article with a DOI:
+1. Check cache (30-day TTL) → Return if found
+2. Check local database → Cache result if retracted
+3. Query Crossref API → Cache result if retracted
+4. Cache negative result → Mark as not retracted
+```
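
The four steps above amount to a tiered lookup behind a time-limited cache. The sketch below shows one way to express that flow; it is an illustration only. The class name `RetractionChecker`, the in-memory cache, and the two injected lookup callables are assumptions standing in for the tool's real database and Crossref client, whose interfaces are not shown in this diff.

```python
import time
from typing import Callable, Dict, Optional, Tuple

CACHE_TTL_SECONDS = 30 * 24 * 60 * 60  # 30 days, matching the TTL described above

# A lookup takes a DOI and returns retraction details, or None if it finds no retraction.
Lookup = Callable[[str], Optional[dict]]


class RetractionChecker:
    def __init__(self, local_lookup: Lookup, crossref_lookup: Lookup,
                 clock: Callable[[], float] = time.time) -> None:
        self._local = local_lookup
        self._crossref = crossref_lookup
        self._clock = clock
        # DOI -> (timestamp, result); a result of None records "not retracted".
        self._cache: Dict[str, Tuple[float, Optional[dict]]] = {}

    def check(self, doi: str) -> Optional[dict]:
        key = doi.strip().lower()  # DOIs are case-insensitive
        now = self._clock()

        # 1. Return a cached answer (positive or negative) if it has not expired.
        cached = self._cache.get(key)
        if cached is not None and now - cached[0] < CACHE_TTL_SECONDS:
            return cached[1]

        # 2. Try the local database first, then 3. fall back to Crossref.
        result = self._local(key)
        if result is None:
            result = self._crossref(key)

        # 4. Cache the outcome; None is stored too, so clean DOIs are not re-queried.
        self._cache[key] = (now, result)
        return result
```

Passing the lookups and the clock in as parameters keeps the sketch testable without network access; a real deployment would persist the cache rather than hold it in memory.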
+
+### Retraction Information Provided
+
+For retracted articles, the system provides:
+
+- **Retraction status**: Whether the article has been retracted
+- **Retraction type**: Misconduct, error, plagiarism, etc.
+- **Retraction date**: When the retraction was issued
+- **Retraction DOI**: DOI of the retraction notice (if available)