GitHub - Darsh-14/DePlag--A-Web-Based-Explainable-NLP-System-for-Detecting-Paraphrased-Plagiarism: DePlag is a next-generation hybrid academic integrity framework that moves beyond simple string matching. By fusing the semantic depth of Sentence-T5 Transformers with traditional lexical overlap metrics, DePlag identifies sophisticated "Adversarial Paraphrasing" and provides users with counterfactual reasoning.

DePlag: Hybrid & Explainable Paraphrase Detection

DePlag is a next-generation academic integrity framework that moves beyond simple string matching. By fusing the semantic depth of Sentence-T5 Transformers with traditional lexical overlap metrics, DePlag identifies sophisticated "Adversarial Paraphrasing" while providing students and educators with transparent, actionable feedback.

🚀 The Core Philosophy

Traditional plagiarism detectors are often "Black Boxes" that provide a percentage score without context. This leads to a Trust Gap between students and institutions. DePlag solves this by:

Detecting Meaning, Not Just Words: Using sentence-t5-base to capture conceptual similarity even when every word has been changed.

Explainable AI (XAI): Highlighting specific shared entities and linguistic patterns using spaCy NER.

Pedagogical Feedback: An "Ethical Advisory" system that coaches users on whether they have committed "Patchwriting," "Structural Plagiarism," or "Conceptual Overlap."

🛠️ Technical Architecture

The system uses a dual-stream architecture to ensure both high precision and forensic transparency:

The Semantic Brain: Utilizes a Sentence-T5 model to map sentences into a 768-dimensional vector space. Similarity is calculated using a custom NumPy-optimized Cosine Similarity implementation.

The Lexical Stream: A high-speed Jaccard Similarity engine that tracks word-for-word overlap to identify "copy-paste" or "near-verbatim" theft.

The Neural Classifier: A Logistic Regression model trained on 15,000 samples (from MSRP and PAWS benchmarks) that learns to weight the semantic and lexical signals to make a final "Plagiarism" determination.

Forensic NER: Integrated spaCy pipelines to extract and compare Named Entities, ensuring that critical technical terms and proper nouns are tracked across versions.
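The two similarity signals described above can be sketched as follows. This is a minimal illustration, not the code in app.py: the function names, the regex-based word tokenization, and the two-element feature layout are assumptions made for this example. The embeddings are taken as plain NumPy arrays, standing in for the 768-dimensional Sentence-T5 vectors.

```python
import re

import numpy as np


def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    # Guard against zero vectors, which have no defined direction.
    return float(np.dot(u, v) / denom) if denom > 0 else 0.0


def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Shared distinct word tokens divided by all distinct word tokens."""
    # Assumption: lowercase word tokens; the real tokenizer may differ.
    tokens_a = set(re.findall(r"\w+", text_a.lower()))
    tokens_b = set(re.findall(r"\w+", text_b.lower()))
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def build_features(emb_a, emb_b, text_a: str, text_b: str) -> np.ndarray:
    """Hypothetical feature vector: [semantic score, lexical score]."""
    return np.array([
        cosine_similarity(emb_a, emb_b),
        jaccard_similarity(text_a, text_b),
    ])
```

A vector of this shape is the kind of input a Logistic Regression classifier could learn to weight. A paraphrase with little shared vocabulary would be expected to show a high first value and a low second value, which is the case pure string matching misses.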

📦 Installation & Setup

1. Clone and Install

    git clone https://github.com/yourusername/DePlag.git
    cd DePlag
    pip install -r requirements.txt

2. Download Linguistic Models

    python -m spacy download en_core_web_sm

3. Initialize the Engine

Before running the app, you must train the classifier head using the provided dataset:

    python train_semantic.py
    python app.py

🧩 Key Features (As seen in app.py)

Real-time Inference: Powered by a Flask microservice with pre-loaded models for low-latency scoring.

Entity Evidence: Extracts and displays shared technical terms to prove similarity.

Ethical Advisory Logic:

* Score > 0.85 (Semantic): Flags "Structural Plagiarism."

* Overlap > 0.3 (Lexical): Flags "Patchwriting."

* Lower Scores: Distinguishes between "Conceptual Similarity" and "Original Content."
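The advisory thresholds listed above could be expressed as a small decision function. This is a hedged sketch rather than the logic in app.py: the function name `advise`, the order in which the checks run, and the `conceptual_threshold` value of 0.60 are assumptions, since the README does not state the cut-off that separates "Conceptual Similarity" from "Original Content."

```python
SEMANTIC_THRESHOLD = 0.85  # stated in the README
LEXICAL_THRESHOLD = 0.30   # stated in the README


def advise(semantic_score: float, lexical_overlap: float,
           conceptual_threshold: float = 0.60) -> str:
    """Map the two similarity scores to an advisory label.

    conceptual_threshold is a hypothetical value chosen for illustration.
    """
    # Assumption: checks run in the order the README lists them.
    if semantic_score > SEMANTIC_THRESHOLD:
        return "Structural Plagiarism"
    if lexical_overlap > LEXICAL_THRESHOLD:
        return "Patchwriting"
    if semantic_score > conceptual_threshold:
        return "Conceptual Similarity"
    return "Original Content"


if __name__ == "__main__":
    for scores in [(0.90, 0.10), (0.70, 0.40), (0.70, 0.10), (0.20, 0.10)]:
        print(scores, "->", advise(*scores))
```

Keeping the thresholds as named constants makes it easy to tune them against labeled examples without touching the branching logic.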

🛡️ Privacy & Ethics

Security: This repository includes a .gitignore to protect sensitive files like kaggle.json and local .pkl models.

Goal: DePlag is intended as a writing assistant to help students learn proper attribution, rather than a purely punitive tool.

Repository files (10 commits):

models
templates
README.md
app.py
benchmark_models.py
fine_tune_brain.py
merge_all_final.py
requirements.txt
setup_data.py
train_model.py
train_semantic.py
