SegunAdewola/research-paper-scorer

Research Paper Scorer

A configurable AI-powered tool to automatically score research papers based on custom criteria.

Features

  • 🤖 Multi-Model AI Support - Gemini, Claude, and Groq with automatic fallbacks
  • 📁 Hybrid File Management - Upload files or browse local folders
  • 🎯 Configurable Scoring - Customize prompts and output formats
  • 📊 Real-time Dashboard - Streamlit web interface with progress tracking
  • 🔄 Resume Capability - Process can be stopped and resumed
  • 💾 Robust Output - CSV exports with comprehensive metadata
  • 🔍 Citation Lookup - Automatic citation count retrieval
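The automatic-fallback behaviour can be pictured as a try-in-order loop. This is an illustrative sketch only; `score_with_fallback` and the provider tuples are hypothetical names, not the project's actual API.

```python
# Illustrative sketch of multi-model fallback (hypothetical names, not the
# project's real API): try each provider in order, return the first success.
def score_with_fallback(paper_text, providers):
    """providers: list of (name, callable) pairs tried in order."""
    errors = []
    for name, score_fn in providers:
        try:
            return name, score_fn(paper_text)
        except Exception as exc:  # rate limit, auth failure, timeout, ...
            errors.append((name, exc))
    raise RuntimeError(f"all providers failed: {errors}")
```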

Quick Start

1. Installation

# Clone the repository
git clone https://github.com/SegunAdewola/research-paper-scorer.git
cd research-paper-scorer

# Install dependencies
pip install -r requirements.txt

2. Configuration

# Copy environment template
cp .env.example .env

# Edit .env with your API keys
nano .env

Add your API keys to the .env file:

# Get these from respective providers
GEMINI_API_KEY=your_gemini_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here
GROQ_API_KEY=your_groq_api_key_here

# Processing settings
BATCH_SIZE=3
MAX_WORKERS=2
DEFAULT_MODEL=gemini
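The settings above might be consumed in Python roughly like this. This is a hedged sketch with a hypothetical helper name; the project's config.py may read them differently (e.g. via python-dotenv):

```python
import os

# Hypothetical helper showing how the .env values above could be read with
# the documented defaults; the project's actual config.py may differ.
def load_processing_settings():
    return {
        "batch_size": int(os.getenv("BATCH_SIZE", "3")),
        "max_workers": int(os.getenv("MAX_WORKERS", "2")),
        "default_model": os.getenv("DEFAULT_MODEL", "gemini"),
    }
```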

3. Get API Keys

  • Gemini: create a key in Google AI Studio
  • Claude: create a key in the Anthropic Console
  • Groq: create a key in the GroqCloud Console

4. Run the Application

Web Interface (Recommended):

streamlit run app.py

Command Line Interface:

# Setup directory structure
python main.py --setup

# Process papers
python main.py --process --max-files 5

# Check status
python main.py --status

Usage

Web Interface

  1. Upload PDFs: Use the sidebar to upload PDFs or browse local folders
  2. Configure Scoring: Customize scoring criteria in the Configuration tab
  3. Start Processing: Monitor real-time progress in the Processing tab
  4. View Results: Analyze results with visualizations in the Results tab

File Management

The application organizes files into folders:

data/
├── pending/     # PDFs ready for processing
├── processing/  # Currently being processed
├── completed/   # Successfully processed
├── failed/      # Failed processing
└── outputs/     # CSV results and logs
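Moving a paper between these stage folders amounts to a filesystem move. A minimal sketch, assuming a hypothetical helper name (not the project's actual code):

```python
import shutil
from pathlib import Path

# Minimal sketch of the folder pipeline above: move a PDF from one stage
# directory to the next (e.g. pending -> processing -> completed).
def move_to_stage(pdf_path: Path, stage_dir: Path) -> Path:
    stage_dir.mkdir(parents=True, exist_ok=True)
    target = stage_dir / pdf_path.name
    shutil.move(str(pdf_path), str(target))
    return target
```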

Scoring Templates

Choose from pre-built templates or create custom ones:

  • General Research - Standard academic paper scoring
  • Medical Research - Clinical relevance, study design, ethics
  • Engineering - Technical innovation, validation, scalability
  • Social Sciences - Theory, methodology, social relevance

Project Structure

research-paper-scorer/
├── app.py                # Streamlit web interface
├── main.py               # Command line interface
├── config.py             # Configuration settings
├── src/                  # Source code modules
│   ├── processors/       # File and paper processing
│   ├── models/           # AI model handlers
│   ├── extractors/       # PDF and citation extraction
│   ├── outputs/          # CSV handling
│   └── dashboard/        # Streamlit UI components
├── data/                 # Processing directories
├── templates/            # Scoring prompts and formats
├── logs/                 # Error logs
└── tests/                # Unit tests

Configuration Options

Scoring Prompts

Customize in templates/scoring_prompts/:

  • Modify existing templates
  • Create domain-specific scoring criteria
  • Use JSON format for structured outputs
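For example, a structured scoring output in JSON might look like the following; the field names and dimensions here are purely illustrative, not the project's actual schema:

```json
{
  "scores": {
    "novelty": 4,
    "methodology": 5,
    "clarity": 3
  },
  "justifications": {
    "novelty": "One-sentence rationale per dimension."
  },
  "overall": 4.0
}
```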

Output Formats

Customize CSV columns in templates/output_formats/:

  • Basic: Essential paper information
  • Detailed: Complete scoring breakdown
  • Custom: Your own column structure

Processing Settings

Adjust in .env:

  • BATCH_SIZE: Papers processed simultaneously
  • MAX_WORKERS: Parallel processing threads
  • DEFAULT_MODEL: Preferred AI model
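As a rough sketch of how MAX_WORKERS bounds parallelism (hypothetical function name; the project's actual scheduler may differ):

```python
from concurrent.futures import ThreadPoolExecutor

# Rough sketch only: score a batch of papers with at most max_workers
# concurrent calls, mirroring the MAX_WORKERS setting described above.
def score_batch(papers, score_fn, max_workers=2):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(score_fn, papers))
```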

Command Line Usage

# Process all papers
python main.py --process

# Process with limits
python main.py --process --max-files 10 --batch-size 2

# Use custom scoring prompt
python main.py --process --prompt-file my_prompt.txt

# Check file status
python main.py --status

# Clean up processing folder
python main.py --cleanup

# Verbose logging
python main.py --process --verbose

API Models

Gemini (Google)

  • Model: gemini-1.5-flash
  • Rate Limits: 15 RPM, 1M TPM (free tier)
  • Best For: General research papers

Claude (Anthropic)

  • Model: claude-3-sonnet-20240229
  • Rate Limits: 5 RPM (free tier)
  • Best For: Complex analysis, nuanced scoring

Groq

  • Model: llama3-8b-8192
  • Strengths: Very fast inference
  • Best For: High-volume processing

Output Format

Generated CSV includes:

  • Paper Metadata: Title, authors, DOI, journal, year
  • Citation Data: Citation count, journal impact
  • Scoring Results: Individual dimension scores and justifications
  • Processing Info: Date, model used, processing status
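Appending one result row to such a CSV can be sketched with the standard library. The column names here are assumptions for illustration, not the project's exact schema:

```python
import csv

# Illustrative sketch: append a scored-paper row to a CSV file, writing the
# header only when the file is empty. Column names are hypothetical.
def append_result(csv_path, row, fieldnames):
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if f.tell() == 0:
            writer.writeheader()
        writer.writerow(row)
```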

Troubleshooting

Common Issues

No API keys configured:

  • Check your .env file exists and contains valid keys
  • Verify API keys have sufficient quota

PDF extraction fails:

  • Ensure PDFs are text-based (not scanned images)
  • Check file permissions and size limits

Processing stuck:

  • Use python main.py --cleanup to reset
  • Check logs/error.log for detailed errors

Citation lookup fails:

  • External APIs may have rate limits
  • Some papers may not be indexed in citation databases

Performance Tips

  1. Start Small: Test with 2-3 papers first
  2. Batch Size: Reduce if hitting rate limits
  3. Worker Threads: Increase for faster processing (watch API limits)
  4. Model Selection: Use Groq for speed, Claude for quality

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Roadmap

  • OCR support for scanned PDFs
  • More citation databases (PubMed, arXiv)
  • Batch export formats (Excel, JSON)
  • API endpoint for integration
  • Docker containerization
  • Cloud deployment options
