AmirhosseinHonardoust/Fake-Review-Detector

Fake Review Detector (NLP + Streamlit)

A machine learning project that classifies product reviews as fake or real using TF-IDF vectorization, Logistic Regression, and behavioral text features such as exclamation count, sentiment, and repeated promotional phrases.
It also includes a Streamlit app for interactive, real-time predictions.


Features

  • Text cleaning and normalization pipeline
  • Hybrid feature extraction:
    • TF-IDF (1–2 grams)
    • Numeric sentiment & behavioral features
  • Interpretable Logistic Regression model
  • Evaluation metrics: Confusion Matrix, ROC, and PR curves
  • Interactive Streamlit app with adjustable decision threshold
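As a rough illustration of the cleaning and behavioral-feature ideas listed above, here is a minimal sketch using only the standard library. The function names `clean_text` and `behavioral_features`, the `CLICHES` list, and the exact regular expressions are assumptions made for this example; the repository's `src/clean_text.py` and `src/features.py` may work differently. Sentiment scoring is left out because the README does not say which scorer is used.

```python
import html
import re

# Hypothetical cliché list for illustration; not taken from the repository.
CLICHES = ("best product ever", "highly recommend", "must buy")


def clean_text(text: str) -> str:
    """Strip HTML tags, URLs and punctuation, then lowercase."""
    text = html.unescape(text)
    text = re.sub(r"<[^>]+>", " ", text)                 # HTML tags first
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # then URLs
    text = re.sub(r"[^\w\s]", " ", text)                 # then punctuation
    return re.sub(r"\s+", " ", text).strip().lower()


def behavioral_features(raw: str) -> dict:
    """Count signals on the raw text, before punctuation and case are lost."""
    words = re.findall(r"[A-Za-z]+", raw)
    lowered = raw.lower()
    return {
        "exclamations": raw.count("!"),
        "all_caps_words": sum(1 for w in words if len(w) > 1 and w.isupper()),
        "cliche_hits": sum(lowered.count(c) for c in CLICHES),
    }


review = "BEST product ever!!! Visit <b>http://example.com</b> NOW"
print(clean_text(review))
print(behavioral_features(review))
```

One design point this sketch makes explicit: the behavioral counts are taken from the raw text, since cleaning removes the exclamation marks and capitalization they depend on.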

Folder Structure

fake-review-detector/
├── app/
│   └── streamlit_app.py
├── src/
│   ├── clean_text.py
│   ├── features.py
│   ├── train.py
│   └── predict.py
├── data/
│   └── reviews_sample.csv
├── outputs/
│   ├── pipeline.joblib
│   ├── confusion_matrix.png
│   ├── roc_curve.png
│   └── pr_curve.png
├── requirements.txt
└── README.md

How It Works

  1. Data Input: a CSV file containing text and label columns.
  2. Preprocessing: remove URLs, HTML, and punctuation, then lowercase the text.
  3. Feature Engineering:
    • Sentiment score
    • Exclamation & ALL-CAPS detection
    • Fake-review clichés (e.g., “best product ever”)
  4. Modeling: Logistic Regression trained on combined features.
  5. Prediction: Threshold-tunable classification for FAKE vs REAL.
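The modeling step above (TF-IDF plus numeric features feeding one Logistic Regression) could be wired together in scikit-learn roughly as follows. This is a sketch under assumptions, not the code in `src/train.py`: the `numeric_features` helper, the use of `FeatureUnion`, the hyperparameters, and the six toy reviews are all made up for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer


def numeric_features(texts):
    """One row per review: [exclamation count, ALL-CAPS word count]."""
    rows = []
    for t in texts:
        caps = sum(1 for w in t.split() if len(w) > 1 and w.isupper())
        rows.append([t.count("!"), caps])
    return np.array(rows, dtype=float)


pipeline = Pipeline([
    ("features", FeatureUnion([
        # Unigrams and bigrams, matching the "1-2 grams" setting in the README.
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        # Hypothetical behavioral features computed from the raw strings.
        ("numeric", FunctionTransformer(numeric_features)),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Tiny made-up dataset purely for illustration (1 = fake, 0 = real).
texts = [
    "BEST product ever!!! Buy NOW!!!",
    "AMAZING!!! I got this for free, love it!!!",
    "Five stars!!! MUST buy, best product ever!!!",
    "Battery lasts about two days with normal use.",
    "The strap broke after a month, but support replaced it.",
    "Fits well, though the color is darker than pictured.",
]
labels = [1, 1, 1, 0, 0, 0]

pipeline.fit(texts, labels)

# Column 1 of predict_proba is the probability of class 1 (fake).
proba_fake = pipeline.predict_proba(["WOW!!! Best product ever!!!"])[0, 1]
print(f"P(fake) = {proba_fake:.2f}")
```

Keeping both feature branches inside one `Pipeline` means a single fitted object can be saved and reloaded, which would fit with the `outputs/pipeline.joblib` artifact in the folder structure.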

Streamlit Interface

Below is a preview of the web app UI built with Streamlit:

[Screenshot: Fake Review Detector Streamlit app]

Highlights

  • Paste or type any review text.
  • Adjust decision threshold for sensitivity.
  • Get immediate prediction with fake probability.
  • Built-in tips to help interpret the model.
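To make the role of the adjustable threshold concrete, here is a minimal sketch of how a fake probability might be turned into a label. The `classify` function and the `>=` comparison are assumptions for illustration; the actual app may apply the threshold differently.

```python
def classify(prob_fake: float, threshold: float = 0.5) -> str:
    """Label a review FAKE when its fake probability reaches the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return "FAKE" if prob_fake >= threshold else "REAL"


# The same probability can flip label as the threshold moves.
print(classify(0.62, threshold=0.5))
print(classify(0.62, threshold=0.7))
```

Raising the threshold makes the detector more conservative (fewer reviews flagged as fake), while lowering it makes it more sensitive.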

Model Evaluation

Confusion Matrix

[Figure: confusion matrix (outputs/confusion_matrix.png)]

Precision-Recall Curve

[Figure: precision-recall curve (outputs/pr_curve.png)]

ROC Curve

[Figure: ROC curve (outputs/roc_curve.png)]

On the bundled sample data, which is balanced and synthetic, the model reaches AUC ≈ 1.00 and AP ≈ 1.00.
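For readers who want to reproduce this kind of evaluation, the sketch below shows how ROC AUC, average precision (AP), and a confusion matrix can be computed with scikit-learn. The labels and scores are invented for illustration and are not the project's results; how `src/train.py` computes its metrics is not shown in this README.

```python
from sklearn.metrics import (
    average_precision_score,
    confusion_matrix,
    roc_auc_score,
)

# Made-up ground truth (1 = fake) and predicted fake probabilities.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]

auc = roc_auc_score(y_true, y_score)            # area under the ROC curve
ap = average_precision_score(y_true, y_score)   # summary of the PR curve

# The confusion matrix needs hard labels, so apply a 0.5 threshold first.
y_pred = [int(s >= 0.5) for s in y_score]
cm = confusion_matrix(y_true, y_pred)           # rows: true, columns: predicted

print(f"AUC = {auc:.3f}, AP = {ap:.3f}")
print(cm)
```

AUC and AP are computed from the ranking of scores and do not depend on a threshold, whereas the confusion matrix changes whenever the decision threshold does.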


Setup & Usage

python -m venv .venv
# Activate (Windows)
.venv\Scripts\activate
# Activate (macOS/Linux)
# source .venv/bin/activate

pip install -r requirements.txt

# Train the model
python src/train.py --csv data/reviews_sample.csv --outdir outputs

# Predict a single review
python src/predict.py --pipeline outputs/pipeline.joblib --text "I got this for free, best product ever!!!"

# Launch the app
streamlit run app/streamlit_app.py

Insights

  • In the sample data, excessive punctuation, emotional exaggeration, and ALL-CAPS usage correlate strongly with fake reviews.
  • Real reviews tend to use a neutral tone and include product-specific feedback.
  • Combining linguistic and behavioral features improves reliability over text-only models.

Future Improvements

  • Integrate a larger, real-world labeled dataset.
  • Replace TF-IDF with contextual embeddings (BERT/SentenceTransformer).
  • Deploy via Streamlit Cloud or Hugging Face Spaces.
  • Add explainability (SHAP/LIME) for feature-level insights.

About

An AI-powered Fake Review Detector built with Python, Streamlit, and Scikit-learn. Uses TF-IDF vectorization, Logistic Regression, and behavioral text analytics (sentiment, exclamations, clichés) to identify synthetic or spammy product reviews. Includes training scripts and a full interactive dashboard.
