Personalised News Digest

A machine learning tool that builds personalized news digests from web-scraped articles. The project combines a Random Forest model with rule-based keyword filtering to classify news articles, then tailors the digest to the categories each user selects.

Features

Web Scraping: Automated collection of news articles from Google News RSS feeds
Advanced Classification: Machine learning model reporting 99.97% average accuracy across 41 news categories
Rule-Based Filtering: Multi-category keyword-based classification for improved accuracy
Multi-Label Support: Articles can be assigned multiple categories (e.g., TECH,SCIENCE)
Personalized Selection: User-driven category selection for customized news digests
Real-Time Processing: Live article classification and categorization
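
As a rough illustration of how rule-based, multi-label assignment can work, the sketch below matches article text against per-category keyword sets and joins every matching category with commas (for example TECH,SCIENCE). The keyword lists and the function name `rule_based_labels` are hypothetical and are not taken from the project's code.

```python
import re

# Hypothetical keyword lists for illustration; the project's real rules may differ.
CATEGORY_KEYWORDS = {
    "TECH": {"software", "ai", "smartphone", "chip"},
    "SCIENCE": {"research", "study", "nasa", "physics"},
    "SPORTS": {"match", "tournament", "league", "goal"},
}


def rule_based_labels(text: str) -> str:
    """Return a comma-separated string of every category whose keywords appear in text."""
    # Lowercase and split into alphanumeric tokens so matching is case-insensitive.
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    matched = [cat for cat, words in CATEGORY_KEYWORDS.items() if tokens & words]
    return ",".join(matched)


print(rule_based_labels("NASA study tests new AI software"))  # TECH,SCIENCE
```

An article that matches no keyword set gets an empty string here; in a hybrid setup that case would typically fall through to the ML model.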

Project Structure

Personalise_News_Digest_Project/
├── webscrapper.py              # Web scraping and classification pipeline
├── bert_model.py               # Machine learning model training and evaluation
├── personalised_digest.py      # User interaction and personalized digest generation
├── text_preprocessing.py       # Text preprocessing utilities
├── requirements_bert           # Python dependencies
├── model_development_summary.txt # Project progress and achievements
└── README.md                   # This file

Setup

Clone the repository:

git clone https://github.com/Amaan247788/personalised-news-digest.git
cd personalised-news-digest

Create and activate a virtual environment:

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements_bert

Usage

Web Scraping and Classification

To scrape news articles and classify them:

python webscrapper.py

This will:

Scrape articles from Google News RSS feeds
Preprocess and classify articles using rule-based filtering and the ML model
Save results to timestamped CSV files with predicted categories
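
The steps above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the contents of webscrapper.py: the helper names, the CSV column names (`title`, `link`, `predicted_category`), the file name pattern, and the inline sample feed are all made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET
from datetime import datetime

# Tiny stand-in for an RSS 2.0 response; a real run would download the feed instead.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item><title>Chip maker unveils new AI software</title><link>https://example.com/a</link></item>
<item><title>League final decided by late goal</title><link>https://example.com/b</link></item>
</channel></rss>"""


def parse_rss_items(xml_text: str) -> list[dict]:
    """Extract the title and link from each <item> of an RSS document."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        }
        for item in root.iter("item")
    ]


def timestamped_name(prefix: str, now: datetime) -> str:
    """Build a file name such as news_20240102_030405.csv (assumed pattern)."""
    return f"{prefix}_{now.strftime('%Y%m%d_%H%M%S')}.csv"


def write_rows(rows: list[dict], handle) -> None:
    """Write classified rows as CSV to any text file-like object."""
    writer = csv.DictWriter(handle, fieldnames=["title", "link", "predicted_category"])
    writer.writeheader()
    writer.writerows(rows)


items = parse_rss_items(SAMPLE_RSS)
for row in items:
    row["predicted_category"] = "UNLABELLED"  # placeholder for the classifier's output

buffer = io.StringIO()  # swap for open(timestamped_name("news", datetime.now()), "w", newline="")
write_rows(items, buffer)
print(buffer.getvalue())
```

Writing to a file-like object keeps the sketch easy to test; passing an opened file instead produces the timestamped CSV on disk.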

Personalized News Digest

To generate a personalized news digest:

python personalised_digest.py

This will:

Load the latest classified news data
Present available categories to the user
Allow the user to select categories of interest
Generate personalized news summaries (coming soon)
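
The loading and selection steps might look roughly like the following. This is a hedged sketch, not the code in personalised_digest.py: it assumes the classified data sits in CSV files with a `predicted_category` column holding comma-separated labels, and the function names and file pattern are hypothetical.

```python
import csv
import glob
import os


def latest_csv(pattern: str):
    """Return the most recently modified file matching pattern, or None if there is none."""
    paths = glob.glob(pattern)
    return max(paths, key=os.path.getmtime) if paths else None


def load_rows(path: str) -> list[dict]:
    """Read every CSV row into a dict keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def available_categories(rows: list[dict]) -> list[str]:
    """Collect the distinct labels, splitting multi-label values such as TECH,SCIENCE."""
    labels = set()
    for row in rows:
        labels.update(c.strip().upper() for c in row["predicted_category"].split(",") if c.strip())
    return sorted(labels)


def filter_by_categories(rows: list[dict], selected: list[str]) -> list[dict]:
    """Keep rows that carry at least one of the selected categories."""
    wanted = {c.strip().upper() for c in selected}
    kept = []
    for row in rows:
        labels = {c.strip().upper() for c in row["predicted_category"].split(",") if c.strip()}
        if labels & wanted:
            kept.append(row)
    return kept


# In-memory example rows, so the sketch runs without any CSV file present.
demo_rows = [
    {"title": "New telescope software released", "predicted_category": "TECH,SCIENCE"},
    {"title": "Cup final tonight", "predicted_category": "SPORTS"},
]
print(available_categories(demo_rows))
print([r["title"] for r in filter_by_categories(demo_rows, ["science"])])
```

In an interactive run, `latest_csv("news_*.csv")` followed by `load_rows` would replace `demo_rows`, and the selection would come from user input.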

Model Performance

The current model achieves:

99.97% average accuracy across all 41 categories
100% accuracy for 40 out of 41 categories
Rule-based filtering for TECH, SPORTS, POLITICS, BUSINESS, ENTERTAINMENT, SCIENCE
Multi-label classification support for articles matching multiple categories
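
To make the averaged figure easier to interpret, here is one way a per-category accuracy and its unweighted average can be computed. This is only a sketch: it treats per-category accuracy as the share of articles of each true category that were predicted correctly, which may not be the definition used for the numbers above, and the labels are toy data rather than project results.

```python
from collections import defaultdict


def per_category_accuracy(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """For each true category, return the fraction of its articles predicted correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    return {cat: correct[cat] / total[cat] for cat in total}


def macro_average(scores: dict[str, float]) -> float:
    """Unweighted mean over categories, so each category counts equally."""
    return sum(scores.values()) / len(scores)


# Toy labels for illustration only.
y_true = ["TECH", "TECH", "SPORTS", "SPORTS", "SCIENCE", "SCIENCE", "SCIENCE", "SCIENCE"]
y_pred = ["TECH", "TECH", "SPORTS", "SPORTS", "SCIENCE", "SCIENCE", "SCIENCE", "TECH"]

scores = per_category_accuracy(y_true, y_pred)
print(scores)
print(round(macro_average(scores), 4))
```

With an unweighted average, a single imperfect category pulls the overall figure down by its shortfall divided by the number of categories, which is how a model can be perfect on all but one category and still report an average just under 100%.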

Technical Stack

Python 3.x
scikit-learn: Machine learning implementation
NLTK: Natural language processing
pandas: Data manipulation
numpy: Numerical operations
imbalanced-learn: Handling class imbalance

Recent Enhancements

Integrated web scraping pipeline using Google News RSS feeds
Developed rule-based keyword filtering system for improved accuracy
Enabled multi-label category assignment
Enhanced model robustness with hybrid rule-based + ML approach
Automated CSV output with predicted categories
Maintained robust version control and collaborative workflow
