A configurable AI-powered tool to automatically score research papers based on custom criteria.
- Multi-Model AI Support - Gemini, Claude, and Groq with automatic fallbacks
- Hybrid File Management - Upload files or browse local folders
- Configurable Scoring - Customize prompts and output formats
- Real-time Dashboard - Streamlit web interface with progress tracking
- Resume Capability - Process can be stopped and resumed
- Robust Output - CSV exports with comprehensive metadata
- Citation Lookup - Automatic citation count retrieval
# Clone the repository
git clone https://github.com/your-username/research-paper-scorer.git
cd research-paper-scorer
# Install dependencies
pip install -r requirements.txt

# Copy environment template
cp .env.example .env
# Edit .env with your API keys
nano .env

Add your API keys to the .env file:
# Get these from respective providers
GEMINI_API_KEY=your_gemini_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here
GROQ_API_KEY=your_groq_api_key_here
# Processing settings
BATCH_SIZE=3
MAX_WORKERS=2
DEFAULT_MODEL=gemini

- Gemini API: Google AI Studio
- Anthropic API: Anthropic Console
- Groq API: Groq Console
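At startup the application needs these settings available as environment variables. A minimal sketch of how they might be read, using only the standard library (the `load_env` helper is hypothetical, not part of this project's code; real projects often use python-dotenv for this):

```python
import os

def load_env(path=".env"):
    """Minimal .env parser: KEY=value lines; blank lines and '#' comments skipped."""
    settings = {}
    if not os.path.exists(path):
        return settings
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    return settings

# Real environment variables take precedence over values from the file
config = {**load_env(), **os.environ}
BATCH_SIZE = int(config.get("BATCH_SIZE", "3"))
MAX_WORKERS = int(config.get("MAX_WORKERS", "2"))
DEFAULT_MODEL = config.get("DEFAULT_MODEL", "gemini")
```

Missing keys fall back to the defaults shown in the .env template above, so the tool can start even with a partial configuration.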
Web Interface (Recommended):
streamlit run app.py

Command Line Interface:
# Setup directory structure
python main.py --setup
# Process papers
python main.py --process --max-files 5
# Check status
python main.py --status

- Upload PDFs: Use the sidebar to upload PDFs or browse local folders
- Configure Scoring: Customize scoring criteria in the Configuration tab
- Start Processing: Monitor real-time progress in the Processing tab
- View Results: Analyze results with visualizations in the Results tab
The application organizes files into folders:
data/
├── pending/      # PDFs ready for processing
├── processing/   # Currently being processed
├── completed/    # Successfully processed
├── failed/       # Failed processing
└── outputs/      # CSV results and logs
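Moving files between these stage folders is what makes the stop-and-resume workflow possible: a paper's location records its state. A sketch of that idea with `pathlib` (the helper names here are illustrative, not the project's actual API):

```python
import shutil
from pathlib import Path

DATA = Path("data")
STAGES = ["pending", "processing", "completed", "failed", "outputs"]

def setup_dirs(base=DATA):
    """Create the directory layout shown above (like `python main.py --setup`)."""
    for stage in STAGES:
        (base / stage).mkdir(parents=True, exist_ok=True)

def move_to_stage(pdf: Path, stage: str, base=DATA) -> Path:
    """Move a paper into the folder for its next processing stage."""
    target = base / stage / pdf.name
    shutil.move(str(pdf), target)
    return target
```

On restart, anything still in `pending/` simply gets picked up again, and anything stranded in `processing/` can be swept back by the cleanup command.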
Choose from pre-built templates or create custom ones:
- General Research - Standard academic paper scoring
- Medical Research - Clinical relevance, study design, ethics
- Engineering - Technical innovation, validation, scalability
- Social Sciences - Theory, methodology, social relevance
research-paper-scorer/
├── app.py            # Streamlit web interface
├── main.py           # Command line interface
├── config.py         # Configuration settings
├── src/              # Source code modules
│   ├── processors/   # File and paper processing
│   ├── models/       # AI model handlers
│   ├── extractors/   # PDF and citation extraction
│   ├── outputs/      # CSV handling
│   └── dashboard/    # Streamlit UI components
├── data/             # Processing directories
├── templates/        # Scoring prompts and formats
├── logs/             # Error logs
└── tests/            # Unit tests
Customize in templates/scoring_prompts/:
- Modify existing templates
- Create domain-specific scoring criteria
- Use JSON format for structured outputs
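Asking the model to reply in JSON is what makes the output machine-parseable. A sketch of what handling such a reply might look like (the dimension names below are illustrative examples, not the tool's fixed schema):

```python
import json

# Example of the kind of reply a JSON-format scoring prompt might request.
# The dimension names are illustrative, not the project's actual schema.
raw_response = """
{
  "novelty": {"score": 8, "justification": "Introduces a new benchmark."},
  "methodology": {"score": 7, "justification": "Sound design, small sample."},
  "overall": 7.5
}
"""

def parse_scores(text: str) -> dict:
    """Parse the model's JSON reply, failing loudly on malformed output."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model did not return valid JSON: {exc}") from exc

scores = parse_scores(raw_response)
```

Failing loudly here lets a bad response route the paper to `failed/` instead of silently writing garbage to the CSV.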
Customize CSV columns in templates/output_formats/:
- Basic: Essential paper information
- Detailed: Complete scoring breakdown
- Custom: Your own column structure
Adjust in .env:
- BATCH_SIZE: Papers processed simultaneously
- MAX_WORKERS: Parallel processing threads
- DEFAULT_MODEL: Preferred AI model
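The interplay of these two settings can be sketched as batched parallel processing: papers are consumed one batch at a time, with a small thread pool inside each batch (the `score_paper` function is a stand-in for the real scoring call):

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 3   # papers per batch (BATCH_SIZE in .env)
MAX_WORKERS = 2  # parallel scoring threads (MAX_WORKERS in .env)

def score_paper(path: str) -> str:
    """Hypothetical stand-in for the real scoring call."""
    return f"{path}: scored"

def process(papers):
    results = []
    # One batch at a time, so an interruption loses at most one batch of work
    for i in range(0, len(papers), BATCH_SIZE):
        batch = papers[i:i + BATCH_SIZE]
        with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
            results.extend(pool.map(score_paper, batch))
    return results
```

Raising MAX_WORKERS speeds things up but multiplies your request rate against the provider's RPM limit, which is why the tuning advice below says to watch API limits.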
# Process all papers
python main.py --process
# Process with limits
python main.py --process --max-files 10 --batch-size 2
# Use custom scoring prompt
python main.py --process --prompt-file my_prompt.txt
# Check file status
python main.py --status
# Clean up processing folder
python main.py --cleanup
# Verbose logging
python main.py --process --verbose

Gemini:
- Model: gemini-1.5-flash
- Rate Limits: 15 RPM, 1M TPM (free tier)
- Best For: General research papers

Claude:
- Model: claude-3-sonnet-20240229
- Rate Limits: 5 RPM (free tier)
- Best For: Complex analysis, nuanced scoring

Groq:
- Model: llama3-8b-8192
- Speed: Very fast inference
- Best For: High-volume processing
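The automatic-fallback behavior across these providers can be sketched as a preference-ordered chain: try the first model, and on failure move to the next (the `callers` mapping and `ModelError` class are hypothetical stand-ins for the real provider clients):

```python
MODEL_ORDER = ["gemini", "claude", "groq"]  # preference order

class ModelError(Exception):
    """Raised when a provider call fails (rate limit, auth error, timeout)."""

def score_with_fallback(paper_text, callers):
    """Try each provider in order and return (provider_name, result) from the
    first that succeeds. `callers` maps a provider name to a scoring function —
    hypothetical stand-ins for the real model handlers in src/models/."""
    errors = {}
    for name in MODEL_ORDER:
        try:
            return name, callers[name](paper_text)
        except ModelError as exc:
            errors[name] = str(exc)  # record the failure and try the next one
    raise RuntimeError(f"All providers failed: {errors}")
```

Only after every provider has failed does the paper land in `failed/`, which keeps a single rate-limited key from halting a run.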
Generated CSV includes:
- Paper Metadata: Title, authors, DOI, journal, year
- Citation Data: Citation count, journal impact
- Scoring Results: Individual dimension scores and justifications
- Processing Info: Date, model used, processing status
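Appending one row per scored paper can be sketched with `csv.DictWriter`; the column names below are illustrative, chosen to mirror the four groups above, not the tool's exact header:

```python
import csv
import os

# Illustrative columns mirroring the groups above (not the tool's exact header)
FIELDNAMES = [
    "title", "authors", "doi", "journal", "year",        # paper metadata
    "citation_count", "journal_impact",                  # citation data
    "overall_score", "justification",                    # scoring results
    "processed_at", "model", "status",                   # processing info
]

def append_result(path: str, row: dict) -> None:
    """Append one scored paper to the CSV, writing the header on first use."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES, extrasaction="ignore")
        if new_file:
            writer.writeheader()
        writer.writerow(row)
```

Appending row by row (rather than writing the whole file at the end) is what lets an interrupted run keep every result completed so far.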
No API keys configured:
- Check that your .env file exists and contains valid keys
- Verify your API keys have sufficient quota
PDF extraction fails:
- Ensure PDFs are text-based (not scanned images)
- Check file permissions and size limits
Processing stuck:
- Use python main.py --cleanup to reset
- Check logs/error.log for detailed errors
Citation lookup fails:
- External APIs may have rate limits
- Some papers may not be indexed in citation databases
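When a lookup fails transiently, retrying with exponential backoff usually recovers it; a minimal sketch (the `fetch` callable and its use of `RuntimeError` as a rate-limit signal are hypothetical stand-ins for the real citation client):

```python
import time

def with_backoff(fetch, retries=3, base_delay=1.0):
    """Retry a citation lookup with exponential backoff on rate-limit errors.

    `fetch` is a zero-argument callable standing in for the real API call;
    here RuntimeError plays the role of an HTTP 429 from the citation service.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except RuntimeError:
            if attempt == retries - 1:
                raise  # out of retries: let the caller mark the lookup failed
            time.sleep(base_delay * (2 ** attempt))  # wait 1s, 2s, 4s, ...
```

A paper that still fails after the last retry simply gets an empty citation count rather than blocking the scoring run.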
- Start Small: Test with 2-3 papers first
- Batch Size: Reduce if hitting rate limits
- Worker Threads: Increase for faster processing (watch API limits)
- Model Selection: Use Groq for speed, Claude for quality
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: GitHub Issues
- Documentation: Project Wiki
- Email: your-email@example.com
- OCR support for scanned PDFs
- More citation databases (PubMed, arXiv)
- Batch export formats (Excel, JSON)
- API endpoint for integration
- Docker containerization
- Cloud deployment options
- Streamlit for the amazing web framework
- Semantic Scholar for citation data
- OpenAlex for paper metadata
- AI model providers: Google, Anthropic, Groq