| 1 | +# Future Integrations and Data Sources |
| 2 | + |
| 3 | +This directory contains assessments of potential data sources and integrations that were considered for aletheia-probe but are currently deferred or not implemented. |
| 4 | + |
| 5 | +## Purpose |
| 6 | + |
| 7 | +Each file in this directory documents: |
| 8 | +- **What** the data source or integration is |
| 9 | +- **Why** it was considered |
| 10 | +- **Pros and cons** of integration |
| 11 | +- **Technical feasibility** and effort estimates |
| 12 | +- **Recommendation** (defer, low priority, or conditions for implementation) |
| 13 | + |
| 14 | +## Current Assessments |
| 15 | + |
| 16 | +- **[openapc.md](openapc.md)** - OpenAPC (Article Processing Charges) integration assessment |
| 17 | + - **Status**: Deferred / Low Priority |
| 18 | + - **Reason**: Limited coverage (5-15% of queries), weak signal for predatory journal detection, and cost data is not a quality indicator |
| 19 | + - **Reconsider if**: Scope expands to include cost transparency, users request it, or research use cases emerge |
| 20 | + |
| 21 | +## Adding New Assessments |
| 22 | + |
| 23 | +When evaluating a new potential integration: |
| 24 | + |
| 25 | +1. Create a new markdown file named after the data source (e.g., `scopus-alternative.md`, `pubpeer.md`) |
| 26 | +2. Use the OpenAPC assessment (`openapc.md`) as a structural template |
| 27 | +3. Include at minimum: |
| 28 | + - Context and overview of the data source |
| 29 | + - Pros and cons analysis |
| 30 | + - Integration effort estimate |
| 31 | + - Coverage and benefit analysis |
| 32 | + - Alignment with aletheia-probe's mission |
| 33 | + - Clear recommendation with reasoning |
| 34 | + - Sources and references |
| 35 | + |
| 36 | +## Philosophy |
| 37 | + |
| 38 | +Not every available data source should be integrated. Consider: |
| 39 | + |
| 40 | +- **Mission alignment**: Does it directly support predatory journal detection? |
| 41 | +- **Coverage**: What percentage of queries benefit? |
| 42 | +- **Signal strength**: Is it direct evidence, or only a weak or ambiguous signal? |
| 43 | +- **Maintenance burden**: Is the effort justified by the benefit? |
| 44 | +- **Alternatives**: Could existing backends be improved instead? |
| 45 | + |
| 46 | +Focus should remain on **high-impact integrations** that meaningfully improve detection accuracy for the majority of queries. |