A Flask-based web application that detects peer-to-peer plagiarism between student submissions using NLP techniques.
The system compares uploaded content and computes plagiarism scores with TF-IDF vectorization and cosine similarity, helping educators identify suspiciously similar submissions efficiently.
- Live Demo
- Project Overview
- Features
- Tech Stack
- Project Structure
- Installation
- Usage
- Deployment on Render
- Google Classroom Integration (Optional)
- Working Principle
- Future Improvements
- Contributing
- License
Application Link:
https://peertopeerplagrismdetector.onrender.com
This project focuses on peer-to-peer similarity detection in classroom submissions.
Unlike internet-wide plagiarism tools, it is designed to detect copying between students in the same course workflow. It supports PDF text extraction and can be extended with Google Classroom for automated assignment retrieval.
- Upload and compare multiple documents
- Automatic text extraction from PDF files
- Text cleaning and preprocessing pipeline
- TF-IDF based feature extraction
- Cosine similarity score calculation
- Plagiarism percentage display
- Clean, responsive teacher dashboard
- Optional Google Classroom integration
- Cloud deployment support via Render
- Python
- Flask
- HTML
- CSS
- JavaScript
- scikit-learn (TF-IDF, cosine similarity)
- NumPy
- PyPDF2
- Google OAuth 2.0
- Google Classroom API
- Google Drive API
- Gunicorn
- Render
PeerToPeerPlagrismDetector/
├── app.py
├── requirements.txt
├── Procfile
├── templates/
├── static/
└── README.md
- Python 3.x
- pip
git clone https://github.com/Utkarsh-rwt/PeerToPeerPlagrismDetector.git
cd PeerToPeerPlagrismDetector
python -m venv venv
Windows:
venv\Scripts\activate
macOS/Linux:
source venv/bin/activate
pip install -r requirements.txt
python app.py
Open in browser:
http://127.0.0.1:5000
- Open the app home page.
- Sign in through the Google login flow (required only for the Classroom-based workflow).
- Select course and assignment (when using Classroom integration).
- Fetch submissions and run similarity analysis.
- Review generated plagiarism percentages and matched student pairs.
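The review step above can be pictured as filtering a pairwise similarity matrix. The following is a minimal sketch, not the code in `app.py`: the function name `flag_pairs`, the 0.8 threshold, and the input shapes (a list of names plus a square matrix of scores in the range 0 to 1) are all assumptions made for illustration.

```python
from itertools import combinations


def flag_pairs(names, scores, threshold=0.8):
    """Return (name_a, name_b, percent) for each pair at or above threshold.

    names: list of student identifiers (hypothetical input shape).
    scores: square list-of-lists of similarities in [0, 1], where
            scores[i][j] compares names[i] with names[j].
    """
    flagged = []
    # combinations() visits each unordered pair once and skips the diagonal
    for i, j in combinations(range(len(names)), 2):
        if scores[i][j] >= threshold:
            flagged.append((names[i], names[j], round(scores[i][j] * 100, 1)))
    # Most similar pairs first, so the most suspicious ones lead the list
    flagged.sort(key=lambda item: item[2], reverse=True)
    return flagged


names = ["alice", "bob", "carol"]
scores = [
    [1.00, 0.92, 0.10],
    [0.92, 1.00, 0.85],
    [0.10, 0.85, 1.00],
]
print(flag_pairs(names, scores))
```

The threshold is a judgment call: a lower value surfaces more pairs for manual review, while a higher value reports only near-identical submissions.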
Ensure these files exist in the repository root:
- requirements.txt
- Procfile, with:
web: gunicorn app:app
git add .
git commit -m "deploy app"
git push
- Go to Render and sign in with GitHub.
- Create a New Web Service.
- Select this repository.
- Set build command:
pip install -r requirements.txt
- Set start command:
gunicorn app:app
- Click Deploy.
For OAuth integration, update your redirect URI in Google Cloud Console:
https://your-app-name.onrender.com/oauth2callback
Also ensure client_secret.json is configured correctly for your Google Cloud project.
The plagiarism detection workflow:
- Extract text from uploaded documents
- Clean and normalize text
- Convert text into TF-IDF vectors
- Compute cosine similarity
- Generate plagiarism percentage scores
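The steps above, minus PDF extraction, can be sketched with scikit-learn as follows. This is an illustrative sketch and may differ from what `app.py` does: the helper names `clean_text` and `similarity_report`, the cleaning rules, and the default `TfidfVectorizer` settings are assumptions.

```python
import re
from itertools import combinations

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def clean_text(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def similarity_report(documents):
    """Score every pair of documents.

    documents: dict mapping a label (e.g. a student name) to extracted text.
    Returns a list of (label_a, label_b, percent) tuples.
    """
    labels = list(documents)
    cleaned = [clean_text(documents[label]) for label in labels]
    # One TF-IDF row per document, fitted on a vocabulary shared by all of them
    tfidf = TfidfVectorizer().fit_transform(cleaned)
    # matrix[i, j] is the cosine similarity between documents i and j
    matrix = cosine_similarity(tfidf)
    return [
        (labels[i], labels[j], round(float(matrix[i, j]) * 100, 2))
        for i, j in combinations(range(len(labels)), 2)
    ]


docs = {
    "a": "The quick brown fox jumps over the lazy dog.",
    "b": "the quick brown fox jumps over the lazy dog",
    "c": "Photosynthesis converts sunlight into chemical energy.",
}
for pair in similarity_report(docs):
    print(pair)
```

Fitting the vectorizer on all submissions at once matters: the IDF weights then reflect how common a term is across the whole class, so phrases shared by every student (such as the assignment prompt) contribute less to the score.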
Similarity formula:
cos(θ) = (A · B) / (||A|| ||B||)
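Here A and B are the TF-IDF vectors of two documents. As a small worked example of the formula, the sketch below computes it with the standard library only; the function name `cosine` and the sample vectors are illustrative, and returning 0.0 for an all-zero vector is an assumption about how empty documents should be treated.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an empty document matches nothing
        return 0.0
    return dot / (norm_a * norm_b)


a = [1.0, 2.0, 0.0]
b = [2.0, 4.0, 0.0]  # same direction as a, twice the length
c = [0.0, 0.0, 3.0]  # shares no non-zero component with a

print(cosine(a, b))
print(cosine(a, c))
```

Because the score depends only on the angle between the vectors, a document and a longer copy of it with the same term proportions score close to 1, while documents with no shared terms score 0.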
- AI-based semantic similarity detection
- Teacher analytics dashboard
- Report export system
- Classroom-wide live sync
- Historical plagiarism database
- Student submission insights
Pull requests are welcome.
For major changes, please open an issue first to discuss your proposed improvements.
This project is intended for educational and academic use.
A dedicated open-source license file is not currently included in the repository.