GitHub - VishalDhariwal/Fakeddit-Multimodal-FakeNews: Using CLIP ViT-B/32 embeddings and a lightweight classifier (XGBoost), this project achieved 93.92% accuracy on the Fakeddit 2-way classification task — a result that is competitive with or slightly ahead of several state-of-the-art benchmarks using more complex multimodal fusion techniques.

📰 Fakeddit Multimodal Fake News Detection (fakeddit_models)

This project develops and compares multiple deep learning models for multimodal fake news detection using the Fakeddit dataset, which contains both text and image modalities.

The work explores a range of architectures, from traditional CNN-LSTM fusion to OpenAI's CLIP, and evaluates different fusion strategies to find the configuration with the highest accuracy.

The best configuration achieved 93.9% accuracy, using CLIP (ViT-B/32) embeddings combined with author and other metadata features and an XGBoost classifier.

📚 Overview

The project was developed entirely in Google Colab using .ipynb notebooks. Each notebook represents a key stage in the research and experimentation process.

📂 Notebooks Summary

| Notebook | Description |
| --- | --- |
| 01_resnet_img_processing.ipynb | Extracted image embeddings using ResNet18 for each Fakeddit post. |
| 02_roberta.ipynb | Generated text embeddings using BERT (later replaced by RoBERTa for improved context understanding). |
| 03_resnet.ipynb | Experimented with additional ResNet architectures and with fine-tuning the image feature extractor. |
| 04_DeepFusionNet_resNet_train.ipynb | Trained various fusion-based models (CNN-LSTM, MLP fusion) on pre-extracted embeddings and compared their performance metrics. |
| 05_clip.ipynb | Used CLIP ViT-B/32 for joint image–text feature extraction and evaluated multimodal representations. |
| 06_fusion_experiments.ipynb | Explored different fusion techniques (weighted averaging, concatenation, attention-based) with various weight combinations. |
| 07_xgboost_author_features.ipynb | Added author and subreddit metadata as input features. Subreddit caused overfitting (accuracy = 1.0), so the final experiments used author features, other metadata, and CLIP embeddings with XGBoost, reaching 93.9% accuracy. |
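Two of the fusion techniques named for 06_fusion_experiments.ipynb, concatenation and weighted averaging, can be sketched in a few lines of NumPy. This is a minimal illustration and not the notebook's actual code: the function names, the weight `w_img=0.7`, and the random stand-in embeddings are assumptions, and the 512-dimensional shape simply mirrors the embedding size of CLIP ViT-B/32. Attention-based fusion is not covered here.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so both modalities contribute comparably."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def concat_fusion(img, txt):
    """Concatenation: place both embeddings side by side, shape (n, d_img + d_txt)."""
    return np.concatenate([img, txt], axis=1)


def weighted_average_fusion(img, txt, w_img=0.5):
    """Weighted averaging: blend two embeddings of identical shape."""
    if img.shape != txt.shape:
        raise ValueError("weighted averaging needs embeddings of the same shape")
    if not 0.0 <= w_img <= 1.0:
        raise ValueError("w_img must lie in [0, 1]")
    return w_img * img + (1.0 - w_img) * txt


# Random stand-ins for real embeddings (assumption: 4 posts, 512 dimensions each).
rng = np.random.default_rng(0)
img_emb = l2_normalize(rng.normal(size=(4, 512)))
txt_emb = l2_normalize(rng.normal(size=(4, 512)))

fused_cat = concat_fusion(img_emb, txt_emb)
fused_avg = weighted_average_fusion(img_emb, txt_emb, w_img=0.7)  # illustrative weight
print(fused_cat.shape, fused_avg.shape)
```

Concatenation keeps every dimension and lets the downstream classifier decide how to weigh them, while weighted averaging keeps the feature size fixed at the cost of introducing a weight that has to be tuned.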

🧠 Key Insights

- Initial models used BERT + ResNet18, giving strong baseline results.
- CLIP-based embeddings provided superior joint understanding of visual and textual modalities.
- Metadata experiments showed that:
  - Adding subreddit led to severe overfitting (accuracy, precision, F1 = 1.0).
  - Adding author improved model generalization.
- XGBoost performed best as a classifier on top of CLIP embeddings.
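A score of exactly 1.0 after adding a single categorical feature is often a sign that the feature nearly determines the label on its own. One quick way to check is to measure how well the majority label within each feature value predicts the target. The sketch below is a hypothetical diagnostic, not code from the notebooks; the helper name and the toy rows are made up for illustration.

```python
from collections import Counter, defaultdict


def majority_label_accuracy(feature_values, labels):
    """In-sample accuracy of predicting, for each feature value, its most common label."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = defaultdict(Counter)
    for value, label in zip(feature_values, labels):
        counts[value][label] += 1
    # For each feature value, count the rows that carry its majority label.
    correct = sum(c.most_common(1)[0][1] for c in counts.values())
    return correct / len(labels)


# Toy rows (made up): every subreddit maps to exactly one label, authors do not.
subreddits = ["sub_a", "sub_a", "sub_b", "sub_b", "sub_c", "sub_c"]
authors = ["u1", "u2", "u1", "u3", "u2", "u3"]
labels = [0, 0, 1, 1, 0, 0]

print(majority_label_accuracy(subreddits, labels))  # 1.0
print(majority_label_accuracy(authors, labels))
```

A value at or near 1.0 suggests the feature acts as a proxy for the label and should be excluded, which matches the decision described above for subreddit. Because the check is computed on the same rows it scores, it is only a rough screen; high-cardinality features with few rows per value will also look pure.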

📊 Results Summary

| Model / Configuration | Description | Accuracy |
| --- | --- | --- |
| BERT + ResNet18 Fusion | Early fusion of text & image embeddings | 0.89 |
| DeepFusionNet (ResNet + Text Encoder) | Custom MLP fusion | 0.91 |
| CLIP (ViT-B/32) | Vision-language embeddings | 0.939 |
| CLIP + Author + XGBoost | Final configuration | 0.939 (best) |
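The final configuration (CLIP embeddings plus author metadata, classified with XGBoost) can be sketched as follows. This is a minimal outline under stated assumptions rather than the notebook's implementation: the README does not say how author features were encoded, so the post-count encoding in `author_frequency` is hypothetical, as are the helper names, the random stand-in embeddings, and the hyperparameters passed to `XGBClassifier`.

```python
from collections import Counter

import numpy as np


def author_frequency(authors):
    """Hypothetical author feature: number of posts by the same author in this set."""
    counts = Counter(authors)
    return np.array([counts[a] for a in authors], dtype=np.float32).reshape(-1, 1)


def build_features(clip_img, clip_txt, metadata):
    """Stack CLIP image embeddings, CLIP text embeddings, and metadata column-wise."""
    return np.hstack([clip_img, clip_txt, metadata]).astype(np.float32)


def train_xgboost(X_train, y_train):
    """Fit a binary XGBoost classifier; the hyperparameters are illustrative only."""
    # Imported lazily so the helpers above can be used without xgboost installed.
    from xgboost import XGBClassifier

    model = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1)
    model.fit(X_train, y_train)
    return model


# Random stand-ins for real CLIP ViT-B/32 embeddings (512 dimensions each).
rng = np.random.default_rng(0)
n_posts = 6
clip_img = rng.normal(size=(n_posts, 512)).astype(np.float32)
clip_txt = rng.normal(size=(n_posts, 512)).astype(np.float32)
authors = ["u1", "u2", "u1", "u3", "u1", "u2"]

X = build_features(clip_img, clip_txt, author_frequency(authors))
print(X.shape)  # (6, 1025)
```

With real features and labels, `train_xgboost(X_train, y_train)` returns a fitted model whose `predict` method can be scored on a held-out split. Any count-based author feature should be computed from the training split only, so that information from test posts does not reach the classifier.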

⚙️ Technologies Used

- Python, Google Colab
- PyTorch, Transformers (Hugging Face)
- OpenAI CLIP
- ResNet, BERT, RoBERTa
- XGBoost
- Pandas, NumPy, Matplotlib, tqdm

🚀 How to Run

1. Clone this repository and enter the directory that git creates:

       git clone https://github.com/VishalDhariwal/Fakeddit-Multimodal-FakeNews.git
       cd Fakeddit-Multimodal-FakeNews

2. Install dependencies:

       pip install -r requirements.txt

3. Open the notebooks in Google Colab or Jupyter Notebook and run them sequentially.

🧩 Future Work

- Experiment with cross-attention fusion for deeper image–text alignment
- Try fine-tuned CLIP models on Fakeddit
- Try the larger CLIP ViT-L/14 backbone for training
- Add explainability visualizations (e.g., Grad-CAM for image saliency)
- Deploy the best-performing model via Gradio or Streamlit
