📰 Fake News Detection & NLP Analysis

This project applies natural language processing (NLP) and machine learning classification to a dataset of news articles to distinguish Fake News from Factual News. The pipeline covers preprocessing, tokenization, Named Entity Recognition (NER), sentiment analysis, topic modeling, and classification.

📁 Dataset

fake_news_data.csv: Contains 198 news articles labeled as either "Fake News" or "Factual News".

Dataset Columns:

title: Headline of the news article.
text: Body of the news article.
date: Publication date.
fake_or_factual: Label ("Fake News" or "Factual News").
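
A minimal sketch of loading the file and checking the label balance, assuming the column names listed above. The two sample rows (including their date format) are invented for illustration so the snippet runs without the real CSV; in the project, `pd.read_csv("fake_news_data.csv")` would take the place of the in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical two-row sample mirroring the documented columns;
# the real file has 198 rows.
sample_csv = io.StringIO(
    "title,text,date,fake_or_factual\n"
    "Example headline A,Example body A,2017-01-01,Fake News\n"
    "Example headline B,Example body B,2017-01-02,Factual News\n"
)

# In the project this would be: pd.read_csv("fake_news_data.csv")
data = pd.read_csv(sample_csv)

# Count how many articles carry each label.
label_counts = data["fake_or_factual"].value_counts()
print(label_counts)
```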

🔧 Libraries Used

pandas, matplotlib, seaborn
spacy, nltk, re
vaderSentiment
gensim
sklearn

🧹 Preprocessing & Feature Engineering

Lowercasing, punctuation removal, stopword filtering
Tokenization using nltk
Lemmatization using WordNetLemmatizer
Named Entity Recognition with spaCy
Sentiment scoring with VADER
Bag of Words and TF-IDF features
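
The cleaning steps above can be sketched in plain Python. This is an illustration only: the project uses nltk for tokenization and lemmatization, whereas this sketch substitutes a regex-based cleanup, whitespace tokenization, and a tiny hand-written stopword set, and it omits lemmatization. `STOPWORDS` and `preprocess` are hypothetical names.

```python
import re

# Tiny illustrative stopword set; a real pipeline would use a full list.
STOPWORDS = {"the", "a", "an", "is", "on", "of", "and", "to", "in"}


def preprocess(text):
    """Lowercase, strip punctuation, tokenize on whitespace, drop stopwords."""
    text = text.lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = text.split()
    return [tok for tok in tokens if tok not in STOPWORDS]


tokens = preprocess("The Senate voted on the bill, and the President signed it!")
print(tokens)
```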

📊 Exploratory Data Analysis

Distribution of fake vs factual news
Part-of-speech tagging frequency
Common named entities in each category
Sentiment analysis across news types
Top unigrams after preprocessing
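
As an illustration of the unigram count, the sketch below tallies token frequencies with `collections.Counter`. The token lists are invented stand-ins for preprocessed articles, and `top_unigrams` is a hypothetical helper name, not a function from the project.

```python
from collections import Counter


def top_unigrams(token_lists, n=3):
    """Return the n most frequent tokens across all documents."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts.most_common(n)


# Invented example documents, already tokenized and cleaned.
docs = [
    ["senate", "vote", "bill"],
    ["senate", "bill"],
    ["senate", "tax"],
]
top_two = top_unigrams(docs, n=2)
print(top_two)
```

Running the same helper separately on the fake and factual subsets would give a per-category comparison.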

🧠 Topic Modeling

LDA (Latent Dirichlet Allocation)
LSA (Latent Semantic Analysis)
Visualization of coherence scores for optimal topic number

🤖 Machine Learning Models

Two models were trained using Bag of Words features:

Logistic Regression

Accuracy: 90%
Precision/Recall:
- Fake News: 93% / 86%
- Factual News: 88% / 94%

SGDClassifier (Linear SVM)

Accuracy: 83%
Precision/Recall:
- Fake News: 91% / 72%
- Factual News: 78% / 94%
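
A minimal sketch of the training setup described above, assuming scikit-learn's `CountVectorizer` for the Bag of Words features. The toy texts, labels, and hyperparameters are invented for illustration and will not reproduce the reported scores; the project's actual train/test split and settings are not stated here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import classification_report
from sklearn.pipeline import make_pipeline

# Invented toy data; the real project uses the 198 labeled articles.
texts = [
    "shocking secret they do not want you to know",
    "you will not believe this miracle cure",
    "celebrity exposed in shocking secret scandal",
    "miracle diet doctors do not want you to try",
    "senate passes budget bill after long debate",
    "central bank holds interest rates steady",
    "city council approves new transit budget",
    "court rules on interest rates dispute",
]
labels = ["Fake News"] * 4 + ["Factual News"] * 4

# Each pipeline turns raw text into word counts, then fits a linear classifier.
models = {
    "Logistic Regression": make_pipeline(
        CountVectorizer(), LogisticRegression(max_iter=1000)
    ),
    # hinge loss makes SGDClassifier behave as a linear SVM
    "SGDClassifier (hinge loss)": make_pipeline(
        CountVectorizer(), SGDClassifier(loss="hinge", random_state=0)
    ),
}

predictions = {}
for name, model in models.items():
    model.fit(texts, labels)
    # Scored on the training texts only to keep the sketch short;
    # a real evaluation needs a held-out test set.
    predictions[name] = model.predict(texts)
    print(name)
    print(classification_report(labels, predictions[name]))
```

The per-class precision and recall printed by `classification_report` correspond to the kind of figures listed above.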

📈 Visualizations

Count plots
POS and NER distribution bars
Sentiment bar charts
LDA/LSA topic charts

🚀 How to Run

1. Clone the repository.
2. Make sure fake_news_data.csv is in the root directory.
3. Install the dependencies:

   pip install -r requirements.txt

4. Run the analysis in a Jupyter Notebook or as a Python script.

🧾 Author

Vishnu M
LinkedIn: linkedin.com/in/vishnu-m737
