OJS Open Access Scanner

A tool to automatically detect open access status of journals using Open Journal Systems (OJS) by analyzing their current issue pages.

Overview

This project scans OJS journal websites to determine if they provide open access to their articles. It analyzes the current issue page of each journal, looking for PDF download links and toll-access indicators to classify journals as open access, toll access, or unknown.

Files

Scripts

ojs_access_scan.py - Main scanning script that processes journal URLs and determines open access status

Data Files

journals.csv - Input file containing ISSN and homepage URL for each journal (52K+ journals)
ojs_oa.csv - Main results file with ISSN, current issue URL, and open access status
ojs_oa_diagnostics.csv - Detailed diagnostic information including response times, errors, and detection details

Results Data Structure

ojs_oa.csv contains:

issn - Journal ISSN
current_issue_url - URL of the journal's current issue page
is_oa - Open access status (true, false, or unknown)

ojs_oa_diagnostics.csv contains additional diagnostic data:

status - HTTP response status or error type
final_url - Final URL after redirects
elapsed_ms - Response time in milliseconds
bytes_checked - Number of bytes analyzed
matched - Detection method used
on_current_page - Whether the analysis was performed on the current issue page
error - Error details if request failed
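As a rough illustration of working with the results file, the sketch below tallies journals by their is_oa value. It assumes ojs_oa.csv has a header row with the column names listed above; the sample rows and the tally_oa_status helper are invented for the example and are not real scan output.

```python
import csv
import io
from collections import Counter


def tally_oa_status(rows):
    """Count rows per is_oa value (expected: true, false, unknown)."""
    return Counter(row["is_oa"].strip().lower() for row in rows)


# Made-up rows in the assumed ojs_oa.csv layout (not real results).
SAMPLE = """issn,current_issue_url,is_oa
1234-5678,https://example.org/index.php/j1/issue/current,true
2345-6789,https://example.org/index.php/j2/issue/current,false
3456-7890,https://example.org/index.php/j3/issue/current,unknown
4567-8901,https://example.org/index.php/j4/issue/current,true
"""

counts = tally_oa_status(csv.DictReader(io.StringIO(SAMPLE)))
for status, n in sorted(counts.items()):
    print(f"{status}: {n}")
```

To run this against the real file, replace io.StringIO(SAMPLE) with an open file handle, for example open("ojs_oa.csv", newline="", encoding="utf-8").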

Detection Method

The scanner classifies each journal in three steps:

1. Fetch Current Issue Page: Requests the /issue/current endpoint for each journal
2. Find Article Links: Searches the page for links matching the /article/view/ pattern
3. Analyze Access: For each article, checks for:
- PDF galley links (indicates open access)
- Toll access indicators (subscription/restricted keywords)
- Access badges and icons
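The steps above could be sketched roughly as follows. This is not the code in ojs_access_scan.py: it uses only the standard library, classifies a single HTML page rather than each article, and the regular expressions, the keyword list, the "pdf" class check, and the rule that a toll keyword outweighs a PDF link are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical patterns and keywords, chosen for illustration only.
ARTICLE_RE = re.compile(r"/article/view/\d+/?$")
GALLEY_RE = re.compile(r"/article/(?:view|download)/\d+/\d+")
TOLL_KEYWORDS = ("subscription", "restricted", "purchase")


def current_issue_url(homepage):
    """Step 1: build the /issue/current URL from a journal homepage."""
    return homepage.rstrip("/") + "/issue/current"


class LinkCollector(HTMLParser):
    """Collects (href, class) pairs for <a> tags plus all visible text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attr = dict(attrs)
            if attr.get("href"):
                self.links.append((attr["href"], attr.get("class") or ""))

    def handle_data(self, data):
        self.text.append(data)


def classify(html):
    """Steps 2 and 3: return 'true', 'false', or 'unknown' for one page."""
    parser = LinkCollector()
    parser.feed(html)
    has_articles = any(ARTICLE_RE.search(href) for href, _ in parser.links)
    has_pdf = any(
        GALLEY_RE.search(href) or "pdf" in cls.lower()
        for href, cls in parser.links
    )
    text = " ".join(parser.text).lower()
    has_toll = any(word in text for word in TOLL_KEYWORDS)
    if has_toll:
        return "false"   # assumed precedence: toll wording wins
    if has_articles and has_pdf:
        return "true"
    return "unknown"


OA_PAGE = (
    '<a href="https://j.example/index.php/j/article/view/101">Title</a>'
    '<a class="obj_galley_link pdf" '
    'href="https://j.example/index.php/j/article/view/101/55">PDF</a>'
)
TOLL_PAGE = (
    '<a href="/index.php/j/article/view/7">Title</a>'
    "<span>Subscription required</span>"
)

print(current_issue_url("https://j.example/index.php/j/"))
print(classify(OA_PAGE), classify(TOLL_PAGE), classify("<p>Offline</p>"))
```

The three return values mirror the is_oa column, so a page with no article links and no indicators falls through to unknown rather than being guessed.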

Usage

# Install dependencies
pip install aiohttp beautifulsoup4

# Run the scanner
python ojs_access_scan.py journals.csv

Configuration

Key parameters in the script:

GLOBAL_CONCURRENCY - Maximum concurrent requests (default: 10)
PER_HOST_CONCURRENCY - Requests per host (default: 1)
MAX_BYTES - Maximum response size to analyze (default: 256KB)
RETRIES - Number of retry attempts for failed requests (default: 3)
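One way the two concurrency limits could interact is sketched below with asyncio semaphores. The constant names and default values come from the list above; the run_all helper, the fake_fetch stand-in, and the choice to acquire the per-host slot before the global one are assumptions, and the actual script may enforce its limits differently (for instance through aiohttp's connector settings).

```python
import asyncio
from collections import defaultdict
from urllib.parse import urlsplit

GLOBAL_CONCURRENCY = 10   # default listed above
PER_HOST_CONCURRENCY = 1  # default listed above


async def run_all(urls, fetch):
    """Run fetch(url) for every URL under a global and a per-host cap."""
    global_sem = asyncio.Semaphore(GLOBAL_CONCURRENCY)
    host_sems = defaultdict(lambda: asyncio.Semaphore(PER_HOST_CONCURRENCY))

    async def guarded(url):
        host = urlsplit(url).netloc
        # Take the host slot first so a busy host does not tie up global slots.
        async with host_sems[host], global_sem:
            return await fetch(url)

    return await asyncio.gather(*(guarded(u) for u in urls))


async def demo():
    active = defaultdict(int)  # requests currently in flight per host
    peak = defaultdict(int)    # highest in-flight count seen per host

    async def fake_fetch(url):
        # Stand-in for a real HTTP request; only tracks concurrency.
        host = urlsplit(url).netloc
        active[host] += 1
        peak[host] = max(peak[host], active[host])
        await asyncio.sleep(0.01)
        active[host] -= 1
        return url

    urls = [f"https://a.example/{i}" for i in range(3)] + ["https://b.example/0"]
    results = await run_all(urls, fake_fetch)
    return results, dict(peak)


results, peak = asyncio.run(demo())
print(peak)
```

With PER_HOST_CONCURRENCY set to 1, the recorded peak for each host never exceeds one in-flight request, while requests to different hosts can still overlap.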

Results Summary

The scan of 52K+ OJS journals places each journal in one of three categories:

Open Access: Journals providing free PDF access to articles
Toll Access: Journals requiring subscriptions or payments
Unknown: Journals where access status couldn't be determined (offline, errors, etc.)

The diagnostic file provides detailed information about scan performance, response times, and reasons for classification decisions.

Network Considerations

The scanner is designed to be respectful of server resources:

Conservative concurrency limits
Retry logic with exponential backoff
Per-host connection limiting
User-agent headers for identification
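The retry behavior might look something like the sketch below. It is an assumption-laden illustration: the base delay of one second, the 30-second cap, the doubling schedule, and the reading of RETRIES as "retries after the first attempt" are not taken from the script, and with_retries and backoff_delay are hypothetical names.

```python
import asyncio

RETRIES = 3  # value from the configuration list


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


async def with_retries(operation, retries=RETRIES, base=1.0, sleep=asyncio.sleep):
    """Await operation(); on failure, wait with doubling delays and try again."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return await operation()
        except Exception as exc:
            last_error = exc
            if attempt == retries:
                break  # out of retries, give up
            await sleep(backoff_delay(attempt, base))
    raise last_error


async def demo():
    calls = {"n": 0}
    waits = []

    async def flaky():
        # Fails twice, then succeeds, to exercise the retry path.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("simulated failure")
        return "ok"

    async def record_sleep(seconds):
        waits.append(seconds)  # record the delay instead of sleeping

    result = await with_retries(flaky, sleep=record_sleep)
    return result, waits


result, waits = asyncio.run(demo())
print(result, waits)
```

Passing the sleep function as a parameter keeps the example fast to run; a real scan would use the default asyncio.sleep so that failing hosts get progressively longer pauses.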

Requirements

Python 3.7+
aiohttp
beautifulsoup4

License

Data and code are provided for research purposes.
