- Kavana H – SRN: PES1UG23CS292
- Keshav Singhal – SRN: PES1UG23CS299

This project classifies Chinese Weibo microblog posts as real or fake news using machine learning and deep learning models.
We implemented and compared several models:
- Naive Bayes
- Logistic Regression
- Support Vector Machine (SVM)
- Artificial Neural Network (ANN) (using TensorFlow)

Preprocessing and feature extraction:
- Tokenization using jieba
- Stopword removal
- TF-IDF vectorization (for classical ML models)
- Sequence embedding (for ANN model)
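
The first two preprocessing steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `jieba.lcut` is jieba's list-returning segmentation call, while the `STOPWORDS` set and the `remove_stopwords` and `preprocess` helpers are hypothetical names. The character-level fallback exists only so the sketch still runs when jieba is not installed.

```python
try:
    import jieba  # Chinese word segmentation library named above

    def tokenize(text):
        # jieba.lcut returns the segmented words as a list of strings
        return jieba.lcut(text)
except ImportError:
    def tokenize(text):
        # Assumption for illustration only: split into single characters
        return list(text)

# Hypothetical stopword set; a real project would load a full list from a file
STOPWORDS = {"的", "了", "是", "在", "和"}


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop stopwords and whitespace-only tokens, keeping the original order."""
    return [t for t in tokens if t.strip() and t not in stopwords]


def preprocess(text):
    """Tokenize a post and return its remaining tokens joined by spaces.

    Space-joined tokens are a common input format for TF-IDF vectorizers.
    """
    return " ".join(remove_stopwords(tokenize(text)))


print(remove_stopwords(["今天", "的", "新闻", "是", "假", "的"]))
# ['今天', '新闻', '假']
```

Whatever preprocessing is used at training time has to be repeated unchanged at prediction time, otherwise the vectorizer sees tokens it was never fitted on.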

Dataset:
- Weibo Microblog Posts Dataset
- Dataset Link: Weibo Dataset on GitHub
| Model | Accuracy |
|---|---|
| Naive Bayes | 0.862 |
| Logistic Regression | 0.853 |
| SVM | 0.857 |
| ANN | 0.8213 |

1️⃣ Clone the Repository
git clone https://github.com/Kavana-coder/Weibo-Fake-News-Detection
cd Weibo-Fake-News-Detection
2️⃣ Set Up Python Environment
Ensure you have Python 3.8+ installed, then install the dependencies:
pip install -r requirements.txt
Running the Project
Step 1: Data Preprocessing & Model Training
Run the main notebook:
jupyter notebook main.ipynb
This notebook will:
- Load and clean the Weibo dataset
- Perform text preprocessing (tokenization, stopword removal)
- Train the Naive Bayes, Logistic Regression, SVM, and ANN models
- Automatically save the trained models and supporting files:
  - svm_model.pkl
  - tfidf_vectorizer.pkl
  - ann_model.h5
  - tokenizer.json
  - results.csv
Step 2: Generate Results
After training, the notebook generates results.csv containing predictions from all models.
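
As a rough sketch of how such a predictions file could be scored, the snippet below computes per-model accuracy from CSV rows. The layout of results.csv is not documented here, so the column names `label`, `svm_pred`, and `ann_pred` are hypothetical, and the in-memory sample stands in for the real file.

```python
import csv
import io


def accuracy_by_model(rows, label_col, pred_cols):
    """Return {prediction column: fraction of rows matching the label}."""
    correct = {col: 0 for col in pred_cols}
    total = 0
    for row in rows:
        total += 1
        for col in pred_cols:
            if row[col] == row[label_col]:
                correct[col] += 1
    if total == 0:
        raise ValueError("no rows to score")
    return {col: correct[col] / total for col in pred_cols}


# Tiny in-memory stand-in for results.csv; the column names are hypothetical
sample = io.StringIO(
    "label,svm_pred,ann_pred\n"
    "1,1,1\n"
    "0,0,1\n"
    "1,0,1\n"
    "0,0,1\n"
)
scores = accuracy_by_model(csv.DictReader(sample), "label", ["svm_pred", "ann_pred"])
print(scores)  # {'svm_pred': 0.75, 'ann_pred': 0.5}
```

To score the real file, pass `csv.DictReader(open("results.csv", newline="", encoding="utf-8"))` together with the column names the notebook actually writes.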
Step 3: Run the GUI Application
Launch the Tkinter GUI for real-time predictions:
python gui_app.py
GUI Features:
- Enter the text of a Weibo post
- Classify it as Real or Fake using both the SVM and ANN models
- Display predictions instantly
Notes
The following files are required for GUI predictions:
- svm_model.pkl
- tfidf_vectorizer.pkl
- ann_model.h5
- tokenizer.json
- results.csv

These files are generated automatically by running main.ipynb.
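
A small pre-flight check along these lines could confirm that the files exist before the GUI is started. The file names come from the list above; the `missing_artifacts` helper and the assumption that the files sit in the current working directory are illustrative only.

```python
from pathlib import Path

# File names taken from the list above
REQUIRED_FILES = [
    "svm_model.pkl",
    "tfidf_vectorizer.pkl",
    "ann_model.h5",
    "tokenizer.json",
    "results.csv",
]


def missing_artifacts(directory="."):
    """Return the required file names that are absent from `directory`."""
    base = Path(directory)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        print("Run main.ipynb first; missing:", ", ".join(missing))
    else:
        print("All required files found; the GUI can be started.")
```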