This project performs natural language processing (NLP) tasks and classification on a dataset of news articles to distinguish between Fake News and Factual News. It involves preprocessing, tokenization, Named Entity Recognition (NER), sentiment analysis, topic modeling, and machine learning classification.
fake_news_data.csv: Contains 198 news articles labeled as either "Fake News" or "Factual News".
- `title`: Headline of the news article.
- `text`: Body of the news article.
- `date`: Publication date.
- `fake_or_factual`: Label ("Fake News" or "Factual News").
- `pandas`, `matplotlib`, `seaborn`
- `spacy`, `nltk`, `re`
- `vaderSentiment`
- `gensim`
- `sklearn`
- Lowercasing, punctuation removal, stopword filtering
- Tokenization using `nltk`
- Lemmatization using `WordNetLemmatizer`
- Named Entity Recognition with `spaCy`
- Sentiment scoring with `VADER`
- Bag of Words and TF-IDF features
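
The first steps of this pipeline can be illustrated with a minimal, dependency-free sketch. It is not the project's code: the project uses `nltk` for tokenization and `WordNetLemmatizer` for lemmatization, while this stand-in splits on whitespace, skips lemmatization, and uses a tiny hypothetical stopword set so that it runs without any downloads. The function names `preprocess` and `bag_of_words` are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword set for illustration only; a real run would use
# a full stopword list (for example the one shipped with nltk).
STOPWORDS = {"the", "a", "an", "is", "are", "was", "of", "in", "on", "to", "and", "that"}


def preprocess(text):
    """Lowercase, strip punctuation, split on whitespace, drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)  # remove punctuation characters
    tokens = text.split()                # stand-in for an nltk tokenizer
    return [t for t in tokens if t not in STOPWORDS]


def bag_of_words(tokens):
    """Count how often each token occurs in a single document."""
    return Counter(tokens)


tokens = preprocess("The Senate passed the bill, and the President signed the bill.")
bow = bag_of_words(tokens)
print(tokens)
print(bow)
```

A Bag of Words feature matrix is this per-document count repeated over every article, with one column per vocabulary term; TF-IDF then down-weights terms that appear in many articles.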
- Distribution of fake vs factual news
- Part-of-speech tagging frequency
- Common named entities in each category
- Sentiment analysis across news types
- Top unigrams after preprocessing
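
As a rough sketch of two of the checks above (class distribution and top unigrams per category), the snippet below counts labels and tokens with `collections.Counter`. The records are hypothetical stand-ins for preprocessed rows of `fake_news_data.csv`; the `tokens` key and the `top_unigrams` helper are assumptions, not names from the project.

```python
from collections import Counter

# Hypothetical records standing in for preprocessed dataset rows.
# "tokens" is an assumed field; only "fake_or_factual" comes from the dataset.
records = [
    {"fake_or_factual": "Fake News", "tokens": ["shocking", "claim", "senator"]},
    {"fake_or_factual": "Factual News", "tokens": ["senator", "vote", "bill"]},
    {"fake_or_factual": "Factual News", "tokens": ["bill", "vote", "senate"]},
]

# Distribution of fake vs factual articles.
label_counts = Counter(r["fake_or_factual"] for r in records)


def top_unigrams(rows, label, n=2):
    """Return the n most frequent tokens among rows carrying the given label."""
    counts = Counter()
    for row in rows:
        if row["fake_or_factual"] == label:
            counts.update(row["tokens"])
    return counts.most_common(n)


factual_top = top_unigrams(records, "Factual News")
print(label_counts)
print(factual_top)
```

The same counts can feed a count plot or bar chart directly, for example through `seaborn` or `matplotlib`.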
- LDA (Latent Dirichlet Allocation)
- LSA (Latent Semantic Analysis)
- Visualization of coherence scores for optimal topic number
Two models were trained using Bag of Words features:
- Accuracy: 90%
- Precision/Recall:
  - Fake News: 93% / 86%
  - Factual News: 88% / 94%

- Accuracy: 83%
- Precision/Recall:
  - Fake News: 91% / 72%
  - Factual News: 78% / 94%
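
To make the per-class figures above easier to read, here is a minimal sketch of how precision and recall are computed for one class from true and predicted labels. The label lists are hypothetical and do not reproduce the project's results; `precision_recall` is an illustrative helper, and in practice a library routine such as scikit-learn's classification report would likely be used instead.

```python
def precision_recall(y_true, y_pred, positive):
    """Return (precision, recall) treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels for illustration only.
y_true = ["Fake News", "Fake News", "Fake News", "Factual News", "Factual News"]
y_pred = ["Fake News", "Fake News", "Factual News", "Factual News", "Fake News"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
fake_precision, fake_recall = precision_recall(y_true, y_pred, "Fake News")
factual_precision, factual_recall = precision_recall(y_true, y_pred, "Factual News")
print(accuracy, fake_precision, fake_recall, factual_precision, factual_recall)
```

Read this way, a Fake News precision of 93% means that 93% of the articles predicted as fake were actually fake, and a recall of 86% means that 86% of the truly fake articles were caught.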
- Count plots
- POS and NER distribution bars
- Sentiment bar charts
- LDA/LSA topic charts
- Clone the repository.
- Make sure `fake_news_data.csv` is in the root directory.
- Install the dependencies: `pip install -r requirements.txt`
- Run the analysis in a Jupyter Notebook or Python script.
Vishnu M
LinkedIn: linkedin.com/in/vishnu-m737