Pure API File Downloader

A Python tool to batch download research output files from the Elsevier Pure API using a CSV list of Pure IDs.

📋 Features

✅ Batch download files from Pure API using CSV input
✅ Supports both numeric Pure IDs and UUIDs
✅ Automatic file type detection (PDF, DOCX, etc.)
✅ Configurable download limits for testing
✅ Comprehensive error handling and logging
✅ CSV encoding auto-detection (UTF-8, Windows cp1252, Latin-1)
✅ Secure configuration management

🚀 Quick Start

1. Initial Setup

# First time setup - configure your API credentials
python setup_config.py

This will prompt you for:

Your Pure API key
Your institution's Pure API URL
CSV file path
Download settings

2. Run the Downloader

python download_pure_file.py

The script will:

Test API connection
Load Pure IDs from your CSV
Download all attached files to the downloads/ directory

📁 File Structure

pure_downloader/
├── config.py              # Your configuration (gitignored - contains API key)
├── config.template.py     # Template for creating config.py
├── download_pure_file.py  # Main downloader script
├── setup_config.py        # Interactive configuration utility
├── .gitignore            # Protects sensitive config files
├── README.md             # This file
└── downloads/            # Downloaded files go here (created automatically)

⚙️ Configuration

Option 1: Interactive Setup (Recommended)

python setup_config.py

Option 2: Manual Configuration

Copy the template (on macOS or Linux, use `cp` instead of `copy`):
```
copy config.template.py config.py
```

Edit config.py with your settings:

PURE_API_KEY = "your-api-key-here"
BASE_API_URL = "https://yourinstitution.elsevierpure.com/ws/api"
CSV_FILE_PATH = "example.csv"
MAX_DOWNLOADS = None  # Or set to a number for testing

Configuration Options

| Setting | Description | Default |
| --- | --- | --- |
| `PURE_API_KEY` | Your Pure API key | Required |
| `BASE_API_URL` | Your Pure API endpoint | Required |
| `CSV_FILE_PATH` | Path to CSV with Pure IDs | `"your_file.csv"` |
| `ID_COLUMN` | CSV column with IDs | `"Pure ID"` |
| `OUTPUT_DIRECTORY` | Where to save files | `"downloads"` |
| `MAX_DOWNLOADS` | Limit for testing | `None` (all entries, or set to number for testing) |
| `DOWNLOAD_FILE_TYPES` | Filter file types | `['.pdf', '.docx', '.doc']` |
| `REQUEST_TIMEOUT` | API timeout seconds | `300` |
| `DOWNLOAD_CHUNK_SIZE` | Streaming chunk size | `8192` |

📊 CSV Format

By default, the script reads Pure IDs from a column named "Pure ID". The column can be in any position in the file; to use a differently named column, set ID_COLUMN in config.py:

Pure ID,Title,Year
27139086,"Forest Protection Research",2023
46773789,"Cypress Stakes Study",2022
14344978,"Genetic Resources",2021

Supported ID formats:

Numeric Pure IDs: 27139086, 46773789
UUIDs: 12345678-1234-5678-1234-567812345678
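
As a rough illustration of telling the two ID formats apart, here is a minimal sketch. The helper name `classify_pure_id` is hypothetical and is not taken from download_pure_file.py; the actual script may validate IDs differently.

```python
import uuid


def classify_pure_id(raw: str) -> str:
    """Return 'numeric', 'uuid', or 'invalid' for one CSV cell (hypothetical helper)."""
    value = raw.strip()
    # Plain ASCII digits are treated as a numeric Pure ID.
    if value.isascii() and value.isdigit():
        return "numeric"
    try:
        # uuid.UUID raises ValueError for malformed input.
        parsed = uuid.UUID(value)
    except ValueError:
        return "invalid"
    # Accept only the canonical hyphenated form shown above.
    return "uuid" if str(parsed) == value.lower() else "invalid"
```

Cells classified as `invalid` could be logged and skipped rather than sent to the API.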

🔧 Advanced Usage

Test with Limited Downloads

For testing, set MAX_DOWNLOADS to a small number:

# In config.py
MAX_DOWNLOADS = 3  # Download only first 3 entries

Download All Entries

# In config.py
MAX_DOWNLOADS = None  # Download everything
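
One way such a limit could be applied to the list of IDs is sketched below. This is an assumption about the behavior, not the script's actual code; `limit_ids` is a hypothetical name.

```python
from itertools import islice


def limit_ids(pure_ids, max_downloads=None):
    """Return the first max_downloads IDs, or all of them when the limit is None."""
    # islice treats a stop value of None as "no limit".
    return list(islice(pure_ids, max_downloads))
```

With this approach, `None` and a number share a single code path, so no separate "download everything" branch is needed.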

Filter File Types

To only download specific file types:

# In config.py
DOWNLOAD_FILE_TYPES = [".pdf", ".docx"]  # Only PDFs and Word docs
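
A filter of this kind might be checked per file as in the sketch below. The function name `is_wanted` and the case-insensitive comparison are assumptions made for illustration; the script's own matching rules may differ.

```python
from pathlib import Path

# Mirrors the example setting above.
DOWNLOAD_FILE_TYPES = [".pdf", ".docx"]


def is_wanted(filename, allowed=DOWNLOAD_FILE_TYPES):
    """True when the filename's extension is in the allowed list (case-insensitive)."""
    suffix = Path(filename).suffix.lower()
    return suffix in {ext.lower() for ext in allowed}
```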

🔍 How It Works

1. ID Resolution: The script accepts numeric Pure IDs or UUIDs
   - Numeric IDs are automatically converted to UUIDs via the API
2. File Discovery: Files are extracted from the electronicVersions field
   - Not from the /files endpoint (a Pure API quirk)
3. Download: Files are streamed in chunks to handle large files efficiently
4. Naming: Files are saved with sanitized titles and original extensions

🛡️ Security

API Key Protection: config.py is automatically gitignored
Template Provided: config.template.py shows structure without sensitive data
Never commit your actual config.py to version control

🐛 Troubleshooting

Configuration Issues

# Validate current configuration
python -c "import config; print(config.validate_config())"

# Reconfigure interactively
python setup_config.py

API Connection Failed

Check PURE_API_KEY is correct in config.py
Verify BASE_API_URL format: https://[institution].elsevierpure.com/ws/api
Test network connectivity
Contact Pure administrator for API access
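
A quick sanity check of the URL format could look like the sketch below. It assumes the hosted `elsevierpure.com` pattern shown above; a Pure instance served from a different domain would fail this check even if it is valid. `looks_like_pure_api_url` is a hypothetical helper.

```python
import re

# Matches https://<institution>.elsevierpure.com/ws/api with an optional trailing slash.
PURE_URL_PATTERN = re.compile(
    r"^https://[a-z0-9-]+\.elsevierpure\.com/ws/api/?$", re.IGNORECASE
)


def looks_like_pure_api_url(url: str) -> bool:
    """True when the URL matches the hosted Pure API pattern."""
    return bool(PURE_URL_PATTERN.match(url.strip()))
```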

CSV Encoding Errors

The script auto-detects the encoding (UTF-8, Windows cp1252, Latin-1, which is also known as ISO-8859-1). If issues persist:

Try re-exporting CSV from Pure with UTF-8 encoding
Check for special characters in titles
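
Encoding fallback of this kind is commonly implemented by trying each candidate in order. The sketch below shows that general technique; it is not the script's actual detection code, and `read_csv_rows` is a hypothetical name.

```python
import csv
import io

# Latin-1 can decode any byte sequence, so it must come last as the catch-all.
ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")


def read_csv_rows(raw_bytes, encodings=ENCODINGS):
    """Decode CSV bytes with the first encoding that works; return (rows, encoding)."""
    for encoding in encodings:
        try:
            text = raw_bytes.decode(encoding)
        except UnicodeDecodeError:
            continue
        return list(csv.DictReader(io.StringIO(text))), encoding
    raise ValueError("could not decode CSV with any candidate encoding")
```

Because the last candidate always succeeds, a wrong guess shows up as garbled characters in titles rather than as an error, which is why re-exporting as UTF-8 is the more reliable fix.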

No Files Found (404 Errors)

Verify the Pure ID exists and has attached files
Check you have permission to access the research output
Use search_by_id.py to inspect the full API response

📝 API Reference

Pure API Endpoints Used

GET /research-outputs/{id} - Get research output by numeric ID
GET /research-outputs/{uuid} - Get research output by UUID
GET /research-outputs/{uuid}/files/{fileId}/{filename} - Download file

Authentication

The API uses API key authentication via the api_key query parameter.
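
As a minimal sketch of how a request URL with the key attached might be assembled (the helper `build_url` is hypothetical, and the script may build its requests differently):

```python
from urllib.parse import urlencode


def build_url(base_api_url: str, path: str, api_key: str) -> str:
    """Join the base URL and path, appending the key as the api_key query parameter."""
    base = base_api_url.rstrip("/")
    query = urlencode({"api_key": api_key})  # percent-encodes unsafe characters
    return f"{base}/{path.lstrip('/')}?{query}"
```

For example, `build_url(BASE_API_URL, "/research-outputs/27139086", PURE_API_KEY)` would produce the URL for the first endpoint listed above.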

🧪 Testing

Test files are located in tests/:

cd tests
python run_tests.py

ScionResearch/pure-api-downloader
