A Python tool to batch download research output files from the Elsevier Pure API using a CSV list of Pure IDs.
- ✅ Batch download files from Pure API using CSV input
- ✅ Supports both numeric Pure IDs and UUIDs
- ✅ Automatic file type detection (PDF, DOCX, etc.)
- ✅ Configurable download limits for testing
- ✅ Comprehensive error handling and logging
- ✅ CSV encoding auto-detection (UTF-8, Windows cp1252, Latin-1)
- ✅ Secure configuration management

```bash
# First-time setup: configure your API credentials
python setup_config.py
```

This will prompt you for:
- Your Pure API key
- Your institution's Pure API URL
- CSV file path
- Download settings

Then run the downloader:

```bash
python download_pure_file.py
```

The script will:
- Test API connection
- Load Pure IDs from your CSV
- Download all attached files to the `downloads/` directory

```
pure_downloader/
├── config.py              # Your configuration (gitignored - contains API key)
├── config.template.py     # Template for creating config.py
├── download_pure_file.py  # Main downloader script
├── setup_config.py        # Interactive configuration utility
├── .gitignore             # Protects sensitive config files
├── README.md              # This file
└── downloads/             # Downloaded files go here (created automatically)
```

To configure interactively:

```bash
python setup_config.py
```

To configure manually:

- Copy the template:

  ```
  copy config.template.py config.py
  ```

- Edit `config.py` with your settings:

  ```python
  PURE_API_KEY = "your-api-key-here"
  BASE_API_URL = "https://yourinstitution.elsevierpure.com/ws/api"
  CSV_FILE_PATH = "example.csv"
  MAX_DOWNLOADS = None  # Or set to a number for testing
  ```

| Setting | Description | Default |
|---|---|---|
| `PURE_API_KEY` | Your Pure API key | Required |
| `BASE_API_URL` | Your Pure API endpoint | Required |
| `CSV_FILE_PATH` | Path to CSV with Pure IDs | `"your_file.csv"` |
| `ID_COLUMN` | CSV column with IDs | `"Pure ID"` |
| `OUTPUT_DIRECTORY` | Where to save files | `"downloads"` |
| `MAX_DOWNLOADS` | Limit for testing | `None` (all entries; set to a number for testing) |
| `DOWNLOAD_FILE_TYPES` | Filter file types | `['.pdf', '.docx', '.doc']` |
| `REQUEST_TIMEOUT` | API timeout in seconds | `300` |
| `DOWNLOAD_CHUNK_SIZE` | Streaming chunk size | `8192` |
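
A check over these settings could look like the sketch below. The function name `check_settings` and its rules are assumptions made for illustration; they are not taken from the project's own validation code.

```python
from urllib.parse import urlparse


def check_settings(settings: dict) -> list:
    """Return a list of problems found in a settings dict (empty if none).

    Hypothetical helper: the keys mirror the settings table, but the
    rules are illustrative assumptions.
    """
    problems = []

    # The two required settings must be non-empty strings.
    for key in ("PURE_API_KEY", "BASE_API_URL"):
        value = settings.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{key} is required")

    # The base URL should at least look like an http(s) URL.
    url = settings.get("BASE_API_URL")
    if isinstance(url, str) and url.strip():
        if urlparse(url).scheme not in ("http", "https"):
            problems.append("BASE_API_URL must start with http:// or https://")

    # MAX_DOWNLOADS is either None (no limit) or a positive integer.
    limit = settings.get("MAX_DOWNLOADS")
    if limit is not None and (not isinstance(limit, int) or limit <= 0):
        problems.append("MAX_DOWNLOADS must be None or a positive integer")

    return problems
```

An empty list means the settings passed every check in this sketch.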

Your CSV file needs a column containing Pure IDs. By default the script reads the column named "Pure ID" (the `ID_COLUMN` setting); it can be any column in the file:

```csv
Pure ID,Title,Year
27139086,"Forest Protection Research",2023
46773789,"Cypress Stakes Study",2022
14344978,"Genetic Resources",2021
```

Supported ID formats:
- Numeric Pure IDs: `27139086`, `46773789`
- UUIDs: `12345678-1234-5678-1234-567812345678`
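
Reading the ID column and telling the two supported formats apart could be sketched as follows. The helper names `classify_pure_id` and `read_ids` are hypothetical, and skipping cells that match neither format is an assumption rather than documented behaviour of the script.

```python
import csv
import io
import re

# Canonical 8-4-4-4-12 hexadecimal UUID layout.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def classify_pure_id(value: str) -> str:
    """Return "numeric", "uuid", or "invalid" for one CSV cell."""
    text = value.strip()
    if text.isascii() and text.isdigit():
        return "numeric"
    if UUID_RE.fullmatch(text):
        return "uuid"
    return "invalid"


def read_ids(csv_text: str, id_column: str = "Pure ID") -> list:
    """Collect the usable IDs from the given column, in file order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    ids = []
    for row in reader:
        value = (row.get(id_column) or "").strip()
        # Assumption: cells that are neither numeric nor a UUID are skipped.
        if classify_pure_id(value) != "invalid":
            ids.append(value)
    return ids
```

Numeric IDs found this way would still need to be resolved to UUIDs through the API before files can be downloaded.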

For testing, set `MAX_DOWNLOADS` to a small number:

```python
# In config.py
MAX_DOWNLOADS = 3  # Download only first 3 entries
```

To download everything, set it to `None`:

```python
# In config.py
MAX_DOWNLOADS = None  # Download everything
```

To only download specific file types:

```python
# In config.py
DOWNLOAD_FILE_TYPES = [".pdf", ".docx"]  # Only PDFs and Word docs
```
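
How these two settings might be applied is sketched below. The helpers `is_wanted` and `limit_entries` are hypothetical names, and treating an empty filter as "accept everything" is an assumption, not confirmed behaviour.

```python
from itertools import islice
from pathlib import PurePosixPath


def is_wanted(filename: str, allowed_types) -> bool:
    """True if the file's extension is in allowed_types (case-insensitive)."""
    if not allowed_types:
        # Assumption: no filter configured means every file type is accepted.
        return True
    allowed = {ext.lower() for ext in allowed_types}
    return PurePosixPath(filename).suffix.lower() in allowed


def limit_entries(ids, max_downloads):
    """Keep the first max_downloads entries, or all of them when it is None."""
    if max_downloads is None:
        return list(ids)
    return list(islice(ids, max_downloads))
```

For example, `limit_entries(ids, 3)` keeps only the first three IDs, which matches the testing setup described above.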

- **ID Resolution**: The script accepts numeric Pure IDs or UUIDs
  - Numeric IDs are automatically converted to UUIDs via the API
- **File Discovery**: Files are extracted from the `electronicVersions` field
  - Not from the `/files` endpoint (Pure API quirk)
- **Download**: Files are streamed in chunks to handle large files efficiently
- **Naming**: Files are saved with sanitized titles and original extensions
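
The naming and chunked-write steps can be illustrated with a minimal sketch. The function names and the exact set of replaced characters are assumptions; the script's own sanitization rules may differ.

```python
import re
from pathlib import Path
from typing import Iterable


def sanitize_title(title: str, max_length: int = 100) -> str:
    """Make a title safe to use as a file name (illustrative rules only)."""
    # Collapse runs of whitespace, including tabs and newlines, to one space.
    collapsed = re.sub(r"\s+", " ", title)
    # Replace characters that are invalid in file names on common systems.
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", collapsed)
    # Trim to length, drop leading/trailing spaces and dots, and fall back
    # to a placeholder if nothing is left.
    return cleaned[:max_length].strip(" .") or "untitled"


def build_filename(title: str, original_name: str) -> str:
    """Combine the sanitized title with the original file extension."""
    return sanitize_title(title) + Path(original_name).suffix


def write_chunks(chunks: Iterable[bytes], destination: Path) -> int:
    """Write chunks to disk one at a time and return the byte count."""
    written = 0
    with open(destination, "wb") as handle:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                handle.write(chunk)
                written += len(chunk)
    return written
```

If the HTTP client is the `requests` library (an assumption), the chunks could come from `response.iter_content(chunk_size=8192)` on a response opened with `stream=True`, so the whole file never has to sit in memory.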

- **API Key Protection**: `config.py` is automatically gitignored
- **Template Provided**: `config.template.py` shows structure without sensitive data
- Never commit your actual `config.py` to version control

```bash
# Validate current configuration
python -c "import config; print(config.validate_config())"

# Reconfigure interactively
python setup_config.py
```

If the API connection fails:
- Check `PURE_API_KEY` is correct in `config.py`
- Verify `BASE_API_URL` format: `https://[institution].elsevierpure.com/ws/api`
- Test network connectivity
- Contact your Pure administrator for API access

The script auto-detects the CSV encoding (UTF-8, cp1252, Latin-1/ISO-8859-1). If issues persist:
- Try re-exporting CSV from Pure with UTF-8 encoding
- Check for special characters in titles
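
Encoding auto-detection of this kind is often done by trying candidate encodings in order. The sketch below shows that approach; it is an assumption about the technique, not a copy of the script's code, and `decode_with_fallback` is a hypothetical name.

```python
def decode_with_fallback(raw: bytes, encodings=("utf-8", "cp1252", "latin-1")):
    """Try each encoding in order; return (text, encoding_used)."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Only reachable if the candidate list omits latin-1, which can decode
    # any byte sequence.
    raise ValueError("none of the candidate encodings could decode the data")
```

Because Latin-1 accepts every byte, it works as a last resort but can silently produce wrong characters, which is why re-exporting as UTF-8 is the more reliable fix.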

If no files are downloaded for a Pure ID:
- Verify the Pure ID exists and has attached files
- Check you have permission to access the research output
- Use `search_by_id.py` to inspect the full API response

- `GET /research-outputs/{id}` - Get research output by numeric ID
- `GET /research-outputs/{uuid}` - Get research output by UUID
- `GET /research-outputs/{uuid}/files/{fileId}/{filename}` - Download file

The API uses API key authentication via the `api_key` query parameter.
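
Building request URLs for these endpoints could look like the sketch below. `build_url` is a hypothetical helper; the endpoint paths and the `api_key` parameter name are taken from the description above, and everything else is an assumption.

```python
from urllib.parse import quote, urlencode


def build_url(base_url: str, path_parts, api_key: str) -> str:
    """Join path segments onto the base URL and append the API key."""
    # Percent-encode each segment so file names with spaces stay valid.
    path = "/".join(quote(str(part), safe="") for part in path_parts)
    query = urlencode({"api_key": api_key})
    return f"{base_url.rstrip('/')}/{path}?{query}"
```

Since the key travels in the query string, it is worth keeping full URLs out of log output.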

Test files are located in `tests/`:

```bash
cd tests
python run_tests.py
```