Research-Paper-Summarizer

RAG + AI + PDF Processing

🔍 Problem Statement

Research papers are often long and complex, making it difficult for researchers, students, and professionals to extract key insights efficiently. This tool automates the summarization process using Retrieval-Augmented Generation (RAG) and Generative AI models, providing quick and accurate summaries along with essential keyword extraction.

🔗 How It Works (Completed ✅)

Input Options:

✅ Users can upload a PDF or provide a URL to a research paper.

Text Extraction:

✅ For PDFs: The system extracts text using PyPDF2.
✅ For URLs: The system scrapes text from web pages using BeautifulSoup.
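
The two extraction paths above could look roughly like the sketch below. It is an illustration, not the project's actual code: the function names `clean_text`, `pdf_to_text`, and `html_to_text` are hypothetical, and it assumes a recent PyPDF2 release that provides `PdfReader`, plus the `beautifulsoup4` package.

```python
import re


def clean_text(raw: str) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def pdf_to_text(path: str) -> str:
    """Extract text from every page of a PDF (hypothetical helper)."""
    # Imported here so the URL path still works if PyPDF2 is not installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    # extract_text() can yield nothing for scanned or image-only pages.
    pages = [page.extract_text() or "" for page in reader.pages]
    return clean_text(" ".join(pages))


def html_to_text(html: str) -> str:
    """Strip markup from an HTML page and return its visible text."""
    # Imported here so the PDF path still works if bs4 is not installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    # Drop script and style elements so their contents are not read as prose.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return clean_text(soup.get_text(separator=" "))
```

Downloading the page behind a URL is left out on purpose; `html_to_text` only needs the HTML string, however it was fetched.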

Processing:

✅ Extracted text is sent to the FastAPI backend.
✅ Keywords are extracted using TF-IDF.
✅ FAISS/ChromaDB stores embeddings for efficient retrieval.
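
To show what the TF-IDF keyword step measures, here is a dependency-free sketch. The project lists Scikit-learn, so its real implementation most likely relies on that library instead; the helper names `tokenize` and `top_keywords` are hypothetical. The IDF term uses the smoothed form ln((1 + n) / (1 + df)) + 1, and the term frequency is a simple relative count, which is an assumption made for readability.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic tokens of three or more letters."""
    return re.findall(r"[a-z]{3,}", text.lower())


def top_keywords(docs: list[str], doc_index: int, k: int = 5) -> list[str]:
    """Return the k highest-scoring TF-IDF terms for docs[doc_index]."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    tf = Counter(tokenized[doc_index])
    total = sum(tf.values()) or 1  # avoid dividing by zero for empty text

    scores = {
        # Relative term frequency times smoothed inverse document frequency.
        term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
        for term, count in tf.items()
    }
    # Sort by score (descending), breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

Terms that are frequent in one paper but rare across the collection rank highest, which is why a word like "graph" would outrank "networks" in a corpus where every paper mentions networks.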

Output:

✅ Summarized text and extracted keywords are displayed in the frontend.

🛠 Tech Stack

Frontend: Streamlit
Backend: FastAPI
ML Models (Planned): Llama-2 / BART / Pegasus for text summarization
Database: FAISS / ChromaDB (Vector-based retrieval)
Libraries: PyPDF2, Hugging Face Transformers, Scikit-learn, BeautifulSoup, gdown

🚀 Features (Completed Work)

✅ Keyword extraction using TF-IDF
✅ Supports direct PDF uploads & Web links
✅ FAISS Vector Search for fast retrieval of stored summaries
✅ Text Extraction & Processing (FastAPI, PyPDF2, BeautifulSoup, FAISS/ChromaDB)

⚡ Scalability & Complexity

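Keyword Extraction: roughly linear in the total number of tokens to compute TF-IDF scores, plus O(V log V) to rank a vocabulary of V terms
FAISS Query Retrieval: O(n · d) per query for an exact (flat) index over n vectors of dimension d; approximate indexes such as IVF or HNSW are sublinear in n, at some cost in recall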
API-Based Processing: Allows easy scaling to handle large research datasets
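
As a reference point for the retrieval cost, the sketch below shows what an exact nearest-neighbour search computes: the query is compared with every stored vector. FAISS's `IndexFlatL2` performs this kind of exhaustive comparison (using L2 distance rather than cosine similarity), while its approximate index types avoid scanning everything. The function names and the plain-list "index" are hypothetical and only meant to make the cost visible.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index: list[list[float]], query: list[float], k: int = 3) -> list[int]:
    """Return positions of the k stored vectors most similar to the query.

    Every stored vector is compared with the query, so the cost grows
    linearly with the number of vectors: O(n * d) for dimension d.
    """
    scored = ((cosine(vec, query), i) for i, vec in enumerate(index))
    # nlargest keeps only the k best (score, position) pairs.
    return [i for _, i in heapq.nlargest(k, scored)]
```

For a few thousand papers an exhaustive scan like this is usually fast enough; approximate indexes start to pay off as the collection grows much larger.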

🏗️ Upcoming Work

🔹 Fix Frontend-Backend Connection: Debug Streamlit & FastAPI communication issues
🔹 Enhance Frontend UI: Improve responsiveness, add visual elements for keyword highlights
🔹 Implement Summarization Models: Integrate Llama-2, BART, or Pegasus for text summarization
🔹 Implement Query-based Retrieval: Allow users to search for related papers in the FAISS database
🔹 Support for More File Formats: Extend support beyond PDFs (e.g., DOCX, TXT)
🔹 Optimize Batch Processing: Implement parallelization for faster summarization of large datasets
🔹 Deploy the Application: Deploy the project on Hugging Face Spaces or Render for public access
🔹 Make Local Processing Feasible: Optimize LLM inference locally while allowing scalable API-based processing
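
One possible shape for the batch-processing item above is sketched here with the standard library's `concurrent.futures`. This is an assumption about how parallelization might be done, not a description of existing code: `summarize_batch` is a hypothetical helper, and `first_sentence` is only a stand-in for a real summarization model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def first_sentence(text: str) -> str:
    """Placeholder summarizer: return the text up to the first period."""
    head, sep, _ = text.partition(".")
    return (head + sep).strip()


def summarize_batch(
    texts: list[str],
    summarize: Callable[[str], str] = first_sentence,
    max_workers: int = 4,
) -> list[str]:
    """Summarize many documents concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which worker finishes first.
        return list(pool.map(summarize, texts))
```

Threads suit I/O-bound work such as calls to a hosted summarization API. For CPU-bound local inference, a `ProcessPoolExecutor` or batching inside the model itself would likely be a better fit.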

🌍 Impact

🔹 Makes research papers more accessible through RAG-based summarization
🔹 Saves hours of reading time for researchers & students
🔹 Useful for academics, journalists, legal analysts & enterprise research platforms
🔹 Designed to scale to large paper collections once optimized retrieval and batch processing are in place
🔹 Improves productivity by enabling faster literature reviews & knowledge discovery

📂 Project Structure

📂 Research-Paper-Summarizer  
│── 📂 backend  
│   │── summarizer.py         # Handles text summarization  
│   │── keyword_extractor.py  # Extracts keywords from the paper  
│   │── fetch_paper.py        # Fetches research paper from URL/Drive  
│   │── main.py               # FastAPI backend  
│  
│── 📂 frontend  
│   │── app.py                # Streamlit UI (main app)  
│   │── ui_components.py      # UI components (sidebar, upload, results)  
│  
│── 📂 models  
│   │── faiss_index           # FAISS Vector Database  
│   │── model.pth             # Trained ML Model (optional)  
│  
│── 📂 data  
│   │── example_papers/       # Store local PDF research papers  
│  
│── requirements.txt          # Dependencies  
│── README.md                 # Documentation  


🚀🌍 Conclusion

🔹 Saves hours of reading time for researchers & students.
🔹 Can be used by academic institutions, journalists, and legal analysts.
🔹 Intended to scale to millions of papers for enterprise research platforms.

### **🌍 Final Note:**  
**Work is in progress... 🚧**
