
Commit 82089d8

feat: prepare release 0.8.0 with comprehensive changelog (#997)
Major features in this release:
- Weekly dead code detection with automated GitHub reports
- Parallel BibTeX processing (2-4x performance improvement)
- Protocol-based architecture overhaul (removed ~350 lines)
- Custom list persistence with database storage
- Cross-validation generalization to single-backend scenarios
- High-level architecture documentation

Breaking changes:
- Algerian Ministry: RAR → ZIP migration
- Custom list commands: add-list → custom-list subcommands
- Backend architecture: removed UpdateSourceRegistry/DataUpdater

This release represents 378 commits of comprehensive improvements since v0.7.0, focusing on code quality, performance, and architectural enhancements.

Co-authored-by: florath-ai-assistant[bot] <Andreas.Florath@telekom.de>
1 parent a1eeb19 commit 82089d8


5 files changed (+149, -7 lines)


CITATION.cff

Lines changed: 2 additions & 2 deletions
@@ -10,8 +10,8 @@ authors:
 orcid: https://orcid.org/0009-0001-6471-7372
 repository-code: "https://github.com/sustainet-guardian/aletheia-probe"
 license: MIT
-version: 0.7.0
-date-released: 2025-12-11
+version: 0.8.0
+date-released: 2026-01-08
 keywords:
 - predatory-journals
 - academic-integrity

docs/CHANGELOG.md

Lines changed: 142 additions & 0 deletions
@@ -7,6 +7,148 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+## [0.8.0] - 2026-01-08
+
+### Added
+
+#### 🚀 Major Features
+
+- **Dead Code Detection & Quality Improvements**: Implemented weekly automated dead code detection with proper exit codes (#996)
+  - GitHub Action integration creates detailed dead code reports every Saturday
+  - Revealed and fixed numerous code quality issues through automated detection
+  - Enhanced @code_is_used decorator system for flexible dead code management
+- **Parallel BibTeX Processing**: Implemented parallel execution for BibTeX files (#792)
+  - 2-4x performance improvement for large BibTeX file processing
+  - ThreadPoolExecutor-based architecture with configurable workers (default: 12)
+  - Smart thresholding with race condition prevention using WAL-mode SQLite
+- **Protocol-Based Architecture Overhaul**: Complete architectural refactoring implementing protocol-based backend design (#859)
+  - Eliminated dual-registry pattern (UpdateSourceRegistry + DataUpdater singletons)
+  - Reduced code complexity (removed ~350 lines of redundant code)
+  - Introduced DataSyncCapable protocol for flexible backend sync capability
+  - Improved maintainability and testability with a single source of truth: BackendRegistry
+- **Cross-Validation Generalization**: Extended cross-validation support to single-backend scenarios (#944)
+  - Moved cross-validation logic to a generic orchestration framework for broader applicability
+  - Improved assessment accuracy across all backend types
+  - Enhanced publisher validation consistency
+- **High-Level Architecture Documentation**: Added comprehensive `ARCHITECTURE_OVERVIEW.md` document (#950)
+  - Documents core system components, data flow patterns, and design decisions
+  - Covers backend categorization and performance characteristics
+  - Provides an onboarding resource for new developers
+- **Configuration Parameter Enhancement**: Added `--config` parameter support for flexible configuration management
+  - Enhanced command-line interface for better user experience
+  - Improved configuration flexibility and deployment options
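The parallel BibTeX entry above combines a ThreadPoolExecutor (12 workers by default), a size threshold, and WAL-mode SQLite. The following is a minimal sketch of that combination under stated assumptions, not the project's implementation: `assess_entry`, `init_db`, the `assessments` table, and `PARALLEL_THRESHOLD` (including its value) are hypothetical.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

DEFAULT_WORKERS = 12  # default worker count stated in the changelog
PARALLEL_THRESHOLD = 8  # hypothetical: smaller inputs are processed sequentially


def init_db(db_path: str) -> None:
    """Create the (hypothetical) results table and switch the file to WAL mode."""
    conn = sqlite3.connect(db_path)
    try:
        # WAL mode is persistent in the database file, so it is set once here.
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS assessments ("
            "journal TEXT PRIMARY KEY, verdict TEXT NOT NULL)"
        )
        conn.commit()
    finally:
        conn.close()


def assess_entry(db_path: str, journal: str) -> tuple[str, str]:
    """Hypothetical per-entry task; each call opens its own connection."""
    verdict = "unknown"  # placeholder for the real backend assessment
    # timeout makes a writer wait for the lock instead of failing immediately
    conn = sqlite3.connect(db_path, timeout=30.0)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT OR REPLACE INTO assessments (journal, verdict) VALUES (?, ?)",
                (journal, verdict),
            )
    finally:
        conn.close()
    return journal, verdict


def assess_all(
    db_path: str, journals: list[str], max_workers: int = DEFAULT_WORKERS
) -> dict[str, str]:
    """Assess all journals, in parallel only when the input is large enough."""
    if len(journals) < PARALLEL_THRESHOLD:
        return dict(assess_entry(db_path, j) for j in journals)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(lambda j: assess_entry(db_path, j), journals))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "cache.db")
        init_db(db)
        print(assess_all(db, [f"Journal {i}" for i in range(20)]))
```

WAL lets readers proceed while one writer commits; the busy timeout serializes the concurrent writers rather than raising "database is locked".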
+
+#### 🔄 Major Refactorings
+
+- **Custom List Complete Refactoring**: **BREAKING CHANGE** - Replaced `add-list` with `custom-list` command group (#978)
+  - Persistent custom list management with database storage
+  - Auto-registration system maintains lists between sessions
+  - New subcommands: `add`, `list`, `remove` for comprehensive management
+  - Fixed critical design flaw where registrations were lost between sessions
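The custom-list entry above says registrations now live in a database so they survive between sessions. As a minimal sketch of that idea, the class below keeps registrations in SQLite and mirrors the `add`, `list`, and `remove` subcommands. `CustomListStore`, the `custom_lists` table, its columns, and the `"predatory"` list type are hypothetical, not the project's schema.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

SCHEMA = (
    "CREATE TABLE IF NOT EXISTS custom_lists ("
    "name TEXT PRIMARY KEY, file_path TEXT NOT NULL, list_type TEXT NOT NULL)"
)


class CustomListStore:
    """Hypothetical store: registrations persist in SQLite, not in process memory."""

    def __init__(self, db_path: str) -> None:
        self.db_path = db_path
        # closing() closes the connection; the inner "conn" context commits.
        with closing(sqlite3.connect(db_path)) as conn, conn:
            conn.execute(SCHEMA)

    def add(self, name: str, file_path: str, list_type: str) -> None:
        """Register a list; raises sqlite3.IntegrityError if the name exists."""
        with closing(sqlite3.connect(self.db_path)) as conn, conn:
            conn.execute(
                "INSERT INTO custom_lists (name, file_path, list_type) VALUES (?, ?, ?)",
                (name, file_path, list_type),
            )

    def list_all(self) -> list[tuple[str, str, str]]:
        """Return every registration, ordered by name."""
        with closing(sqlite3.connect(self.db_path)) as conn:
            rows = conn.execute(
                "SELECT name, file_path, list_type FROM custom_lists ORDER BY name"
            ).fetchall()
        return rows

    def remove(self, name: str) -> bool:
        """Delete a registration; return True if one was removed."""
        with closing(sqlite3.connect(self.db_path)) as conn, conn:
            cursor = conn.execute("DELETE FROM custom_lists WHERE name = ?", (name,))
            return cursor.rowcount > 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "lists.db")
        CustomListStore(db).add("my-list", "/data/my-list.csv", "predatory")
        # A fresh instance stands in for a new CLI session.
        print(CustomListStore(db).list_all())
```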
+
+### Changed
+
+#### ⚠️ Breaking Changes
+
+- **Algerian Ministry: RAR → ZIP Migration**: **BREAKING CHANGE** - Transitioned from RAR to ZIP archive format (#931)
+  - Improved compatibility and reduced external dependencies
+  - Enhanced reliability of the data processing pipeline
+  - RAR archive support completely removed
+- **Backend Architecture Changes**: **BREAKING CHANGE** - Removed UpdateSourceRegistry and DataUpdater classes
+  - Custom backends must now implement the new DataSyncCapable protocol
+  - Eliminated singleton patterns and global state
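The entries above replace the UpdateSourceRegistry/DataUpdater singletons with a DataSyncCapable protocol and a single BackendRegistry. The sketch below shows the general pattern with `typing.Protocol`; the method names (`get_name`, `should_sync`, `sync`), their synchronous signatures, the registry methods, and both example backends are assumptions, not the project's actual interface.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class DataSyncCapable(Protocol):
    """Hypothetical sync interface: any class with these methods qualifies."""

    def get_name(self) -> str: ...

    def should_sync(self) -> bool: ...

    def sync(self) -> int: ...  # returns the number of records synced


class BackendRegistry:
    """Single source of truth for backends; no separate updater registry."""

    def __init__(self) -> None:
        self._backends: dict[str, object] = {}

    def register(self, name: str, backend: object) -> None:
        self._backends[name] = backend

    def sync_capable(self) -> list[DataSyncCapable]:
        # isinstance() works because the protocol is runtime_checkable;
        # it only checks that the methods exist, not their signatures.
        return [b for b in self._backends.values() if isinstance(b, DataSyncCapable)]

    def sync_all(self) -> dict[str, int]:
        return {b.get_name(): b.sync() for b in self.sync_capable() if b.should_sync()}


class LocalListBackend:
    """Example backend with local data: satisfies the protocol structurally."""

    def get_name(self) -> str:
        return "local_list"

    def should_sync(self) -> bool:
        return True

    def sync(self) -> int:
        return 3  # toy value standing in for records loaded


class ApiOnlyBackend:
    """Example backend that queries a live API and has nothing to sync."""

    def get_name(self) -> str:
        return "api_only"


if __name__ == "__main__":
    registry = BackendRegistry()
    registry.register("local_list", LocalListBackend())
    registry.register("api_only", ApiOnlyBackend())
    print(registry.sync_all())
```

Because the check is structural, a custom backend opts into syncing by implementing the methods, with no second registry to keep in step.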
+
+#### 🛠️ Improvements & Optimizations
+
+- **Backend Standardization**: Systematic 4-phase backend pattern standardization
+  - Phase 2: standardized backend error handling patterns (#986)
+  - Phase 3: migrated backends to use confidence_utils (#987)
+  - Phase 4: standardized backend patterns (#988)
+  - Enhanced error handling consistency across all backends
+- **Performance Optimizations**:
+  - Optimized ClientSession usage in Crossref backend (#891)
+  - Reduced the concurrent backend limit from 999 to 15 for improved stability
+  - Fixed SQLite database lock conflicts during concurrent sync operations
+  - Enhanced caching strategies and timeout configurations
+- **Automatic Fallback Chain Infrastructure** (#989, #990, #991):
+  - Crossref backend: 60+ lines → 15 lines
+  - OpenAlex backend: 70+ lines → 25 lines
+  - RetractionWatch backend: 45+ lines → 20 lines
+  - Added async/sync method detection and enhanced exception handling
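The fallback-chain entry above mentions async/sync method detection. A minimal sketch of such a chain, assuming each strategy takes a query string and returns a result or `None`: `run_fallback_chain`, `EXPECTED_ERRORS`, and the three example strategies are hypothetical, and catching only a fixed tuple of expected errors is an assumption in line with the move away from broad exception handling.

```python
import asyncio
import inspect
from collections.abc import Callable
from typing import Any, Optional

# Errors treated as "this strategy found nothing"; anything else propagates.
EXPECTED_ERRORS: tuple[type[Exception], ...] = (LookupError, ValueError, OSError)


async def run_fallback_chain(
    query: str, strategies: list[Callable[[str], Any]]
) -> tuple[Optional[str], Any]:
    """Try each strategy in order; return (strategy name, result) for the first hit."""
    for strategy in strategies:
        try:
            if inspect.iscoroutinefunction(strategy):
                result = await strategy(query)  # async strategy: await it
            else:
                result = strategy(query)  # sync strategy: call directly
        except EXPECTED_ERRORS:
            continue  # expected failure: fall through to the next strategy
        if result is not None:
            return strategy.__name__, result
    return None, None


# Hypothetical strategies, e.g. an ISSN lookup followed by a title search.
def lookup_by_issn(query: str) -> Optional[dict]:
    return None  # pretend the ISSN was not found


async def search_by_title(query: str) -> Optional[dict]:
    await asyncio.sleep(0)  # stands in for a network call
    return {"title": query}


def unavailable_source(query: str) -> Optional[dict]:
    raise LookupError("source unavailable")


if __name__ == "__main__":
    chain = [unavailable_source, lookup_by_issn, search_by_title]
    print(asyncio.run(run_fallback_chain("Journal of Examples", chain)))
```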
+
+### Fixed
+
+#### 🐛 Bug Fixes & Code Quality
+
+- **Deprecation Warning Resolution**: Fixed Python compatibility deprecation warnings (#763)
+  - Replaced deprecated `datetime.utcnow()` with timezone-aware `datetime.now(UTC)` (#761)
+  - Replaced deprecated SQLite3 datetime adapter with explicit ISO format conversion (#768)
+  - Improved compatibility with future Python versions
+- **Database & Performance Issues**:
+  - Ensured SQLite database connections are properly closed (#758)
+  - Resolved database lock conflicts during concurrent operations
+  - Enhanced cache schema with improved retraction statistics handling
+  - Added input validation to KeyValueCache methods
+- **Code Quality Enhancements** (118 refactoring commits):
+  - Replaced magic strings/numbers with named constants across multiple modules
+  - Replaced magic strings in AlgerianMinistrySource (#966)
+  - Replaced magic numbers with EntryType enums in PDF parser (#961)
+  - Replaced magic numbers with RiskLevel enum in retraction_watch.py (#846)
+  - Enhanced type safety and documentation coverage
+- **Backend Error Handling**:
+  - Removed broad exception handling in multiple sources (#982, #946, #912)
+  - Improved exception specificity across CustomListSource, BeallsListSource, and PredatoryJournalsSource
+  - Standardized error handling patterns and improved logging
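The deprecation and database fixes above (#761, #768, #758) fit one small pattern: create timezone-aware timestamps, store them as explicit ISO 8601 strings instead of relying on the deprecated default sqlite3 datetime adapter, and close every connection. A minimal sketch; the `sync_log` table and both helper functions are hypothetical. Note that a `sqlite3.Connection` used as a context manager only commits or rolls back and does not close the connection, which is why `contextlib.closing` is used.

```python
import os
import sqlite3
import tempfile
from contextlib import closing
from datetime import datetime, timezone
from typing import Optional


def utc_now() -> datetime:
    # Replacement for deprecated datetime.utcnow(), which returns a naive value.
    # datetime.UTC (Python 3.11+) is an alias for timezone.utc.
    return datetime.now(timezone.utc)


def save_sync_time(db_path: str, source: str, when: datetime) -> None:
    """Store a timestamp as an ISO 8601 string (explicit conversion, no adapter)."""
    with closing(sqlite3.connect(db_path)) as conn, conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sync_log ("
            "source TEXT PRIMARY KEY, synced_at TEXT NOT NULL)"
        )
        conn.execute(
            "INSERT OR REPLACE INTO sync_log (source, synced_at) VALUES (?, ?)",
            (source, when.isoformat()),
        )


def load_sync_time(db_path: str, source: str) -> Optional[datetime]:
    """Read a timestamp back; returns None if the source was never synced."""
    with closing(sqlite3.connect(db_path)) as conn:
        row = conn.execute(
            "SELECT synced_at FROM sync_log WHERE source = ?", (source,)
        ).fetchone()
    return datetime.fromisoformat(row[0]) if row else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "cache.db")
        save_sync_time(db, "doaj", utc_now())
        print(load_sync_time(db, "doaj"))
```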
+
+#### 🎯 Assessment & Query Logic Improvements
+
+- **Cross-Validation & Publisher Logic**:
+  - Enabled cross-validation for single-backend scenarios (#944)
+  - Moved cross-validation logic to a generic orchestration framework (#939)
+  - Improved publisher validation consistency (#938)
+  - Closed a journal size classification gap for journals with 50-99 DOIs (#889)
+- **Data Source Improvements**:
+  - Replaced blocking file I/O with async operations in Algerian downloader (#949)
+  - Improved parsing accuracy for Kscien UI elements (#774)
+  - Improved sync logging and skip reasons for backends (#985)
+  - Forced config reload after backend registration in the add-list command (#972)
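The Algerian downloader entry (#949) replaces blocking file I/O with async operations. The changelog does not say which mechanism is used; one standard-library option, shown here as a minimal sketch, is `asyncio.to_thread` (Python 3.9+), which runs the blocking call in a worker thread so the event loop keeps serving other downloads. `save_download`, `read_archive`, and `save_all` are hypothetical helpers.

```python
import asyncio
import tempfile
from pathlib import Path


async def save_download(path: Path, payload: bytes) -> None:
    """Write downloaded bytes without blocking the event loop."""
    await asyncio.to_thread(path.write_bytes, payload)


async def read_archive(path: Path) -> bytes:
    """Read a downloaded archive without blocking the event loop."""
    return await asyncio.to_thread(path.read_bytes)


async def save_all(target_dir: Path, files: dict[str, bytes]) -> list[Path]:
    """Write several downloads concurrently and return their paths."""
    paths = [target_dir / name for name in files]
    await asyncio.gather(
        *(save_download(p, data) for p, data in zip(paths, files.values()))
    )
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        written = asyncio.run(save_all(Path(tmp), {"part1.bin": b"abc", "part2.bin": b"xyz"}))
        print([p.name for p in written])
```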
+
+### Documentation & Testing
+
+#### 📝 Documentation Improvements (47 commits)
+
+- **Comprehensive Docstring Updates**:
+  - Added Google-style docstrings to example functions (#958)
+  - Added comprehensive docstrings for complex DOAJ backend methods (#867)
+  - Enhanced docstrings in the CLI module (#898), cross-validator module (#929), and Kscien helpers (#930)
+  - Enhanced docstrings for updater utility functions (#910) and BeallsListSource (#942)
+- **Architecture & Standards**:
+  - Enhanced updater module documentation (#905) and updater/core.py (#914)
+  - Added testing scope standard to CODING_STANDARDS.md (#965)
+  - Documented regex pattern in JournalNameCleaner (#922)
+  - Updated database schema documentation
+
+#### 🧪 Testing Infrastructure Expansion (65 commits)
+
+- **Comprehensive Test Coverage Additions**:
+  - CustomListSource class (#980)
+  - KscienHijackedJournalsSource (#960) and KscienStandaloneJournalsSource (#956)
+  - PDF parser (#953) and ArchiveExtractor (#962)
+  - Crossref assessment logic (#894) and KscienPublishersBackend (#843)
+  - AlgerianMinistryBackend (#839) and PredatoryJournals backend (#831)
+- **Test Quality Improvements**:
+  - Removed tests for private methods to focus on public APIs
+  - Improved test assertion quality with exact value checks
+  - Enhanced test assertion quality across multiple test suites
+
+### Performance Metrics
+
+- **2-4x speedup** for large BibTeX file processing through parallel execution
+- **~350 lines** of redundant code eliminated through architectural improvements
+- **378 total commits** with comprehensive quality improvements since v0.7.0
+- **Enhanced stability** through reduced concurrent backend limits and improved error handling
+- **Significant complexity reduction** in core backend implementations
+
 ## [0.7.0] - 2025-12-11

 ### Added

docs/JOSS/README.md

Lines changed: 1 addition & 1 deletion
@@ -92,7 +92,7 @@ Once all requirements are met:
 1. Go to https://joss.theoj.org/papers/new
 2. Fill in the submission form:
    - **Repository URL**: https://github.com/sustainet-guardian/aletheia-probe
-   - **Version**: v0.7.0 (or latest release)
+   - **Version**: v0.8.0 (or latest release)
    - **Archive DOI**: Your Zenodo DOI
    - **Editor suggestions**: (optional) Editors with expertise in research software, scientific publishing, or bibliometrics
 3. Submit the form

docs/research-applications.md

Lines changed: 3 additions & 3 deletions
@@ -174,7 +174,7 @@ In flowchart:
 > "Records excluded: Predatory journal (n=10)"

 In methods:
-> "Source quality validation: All references were validated using Aletheia-Probe v0.7.0 to identify potential predatory journals. The tool cross-references journals against DOAJ, Beall's List historical archives, Kscien databases, and publication pattern analysis from OpenAlex and Crossref. Journals flagged as predatory with confidence ≥0.75 were excluded (n=7). Journals with confidence 0.50-0.75 underwent independent review by two authors (n=12), resulting in 3 additional exclusions. This process ensures the review dataset comprises publications from legitimate scholarly venues."
+> "Source quality validation: All references were validated using Aletheia-Probe v0.8.0 to identify potential predatory journals. The tool cross-references journals against DOAJ, Beall's List historical archives, Kscien databases, and publication pattern analysis from OpenAlex and Crossref. Journals flagged as predatory with confidence ≥0.75 were excluded (n=7). Journals with confidence 0.50-0.75 underwent independent review by two authors (n=12), resulting in 3 additional exclusions. This process ensures the review dataset comprises publications from legitimate scholarly venues."

 ### Scientometric Studies

@@ -323,7 +323,7 @@ research-project/

 Include in methods or supplementary materials:

-1. **Tool version**: Aletheia-Probe version (e.g., v0.7.0)
+1. **Tool version**: Aletheia-Probe version (e.g., v0.8.0)
 2. **Assessment date**: When assessments were performed
 3. **Data sync date**: When local caches were last synchronized
 4. **Enabled backends**: Which data sources were queried
@@ -332,7 +332,7 @@ Include in methods or supplementary materials:

 **Example methodology text:**

-> "Journal classification employed Aletheia-Probe v0.7.0 (DOI: 10.5281/zenodo.17788487) executed on 2025-12-15. Local data sources (DOAJ, Beall's List, Kscien databases) were synchronized on 2025-12-10. External APIs (OpenAlex, Crossref) were queried on 2025-12-15, with all responses cached and preserved in supplementary data files. The complete assessment output, including all backend responses and confidence scores, is archived in the supplementary materials (validation-results.json). The configuration file specifying enabled backends and thresholds is also provided for full methodological transparency."
+> "Journal classification employed Aletheia-Probe v0.8.0 (DOI: 10.5281/zenodo.17788487) executed on 2025-12-15. Local data sources (DOAJ, Beall's List, Kscien databases) were synchronized on 2025-12-10. External APIs (OpenAlex, Crossref) were queried on 2025-12-15, with all responses cached and preserved in supplementary data files. The complete assessment output, including all backend responses and confidence scores, is archived in the supplementary materials (validation-results.json). The configuration file specifying enabled backends and thresholds is also provided for full methodological transparency."

 **Trade-offs**

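The example methods text in the research-applications diff describes a two-threshold rule: journals flagged as predatory with confidence ≥0.75 are excluded, and those with confidence 0.50-0.75 go to independent review. A minimal sketch of that rule for readers reproducing it; `triage` and `Decision` are hypothetical names, and two details are assumptions the quoted text leaves open: 0.50 is treated as the inclusive lower bound of the review band, and flagged journals below 0.50 are kept.

```python
from enum import Enum

EXCLUDE_THRESHOLD = 0.75  # "confidence ≥0.75 were excluded"
REVIEW_THRESHOLD = 0.50  # "confidence 0.50-0.75 underwent independent review"


class Decision(Enum):
    EXCLUDE = "exclude"
    REVIEW = "review"
    KEEP = "keep"


def triage(flagged_predatory: bool, confidence: float) -> Decision:
    """Map one assessment to an inclusion decision using the two thresholds."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if not flagged_predatory:
        return Decision.KEEP
    if confidence >= EXCLUDE_THRESHOLD:
        return Decision.EXCLUDE
    if confidence >= REVIEW_THRESHOLD:
        return Decision.REVIEW
    return Decision.KEEP  # assumption: low-confidence flags are kept


if __name__ == "__main__":
    for flagged, conf in [(True, 0.9), (True, 0.6), (True, 0.3), (False, 0.9)]:
        print(flagged, conf, triage(flagged, conf).value)
```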

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "aletheia-probe"
-version = "0.7.0"
+version = "0.8.0"
 description = "Automated tool for assessing predatory journals using multiple backend sources"
 readme = "README.md"
 license = {file = "LICENSE"}
