This project automatically scrapes the "Most Read" stories from the BBC News homepage (https://www.bbc.co.uk/news) every hour and logs them.
A Python script (`BBC_News_Most_Read_Scraper.py`) uses `requests` and `BeautifulSoup4` to fetch and parse the news homepage. The top 10 most read stories (title and link) are extracted.
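A rough sketch of that fetch-and-parse step is below. The `data-testid` selector is a guess, not the script's actual logic; the real extraction lives in `BBC_News_Most_Read_Scraper.py` and targets whatever markup the BBC homepage currently uses.

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

HOMEPAGE = "https://www.bbc.co.uk/news"

def fetch_most_read() -> list[tuple[int, str, str]]:
    """Return (rank, title, link) for the top 10 "Most Read" stories."""
    response = requests.get(HOMEPAGE, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # Hypothetical container; adjust to the current BBC homepage markup.
    section = soup.find(attrs={"data-testid": "mostRead"})
    stories = []
    for rank, anchor in enumerate(section.find_all("a", href=True)[:10], start=1):
        title = anchor.get_text(strip=True)
        link = urljoin(HOMEPAGE, anchor["href"])  # resolve relative hrefs
        stories.append((rank, title, link))
    return stories
```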
A GitHub Actions workflow (`.github/workflows/scrape_bbc.yml`) runs this script every hour.
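The workflow looks roughly like the sketch below. This is an illustration only; the actual `.github/workflows/scrape_bbc.yml` may differ in action versions, dependency installation, and commit handling.

```yaml
name: Scrape BBC Most Read

on:
  schedule:
    - cron: "0 * * * *"   # hourly, on the hour (UTC)
  workflow_dispatch:       # allow manual runs

jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - run: pip install requests beautifulsoup4
      - run: python BBC_News_Most_Read_Scraper.py
      - name: Commit updated data
        run: |
          git config user.name "github-actions"
          git config user.email "github-actions@users.noreply.github.com"
          git add data/
          git commit -m "Update scraped data" || echo "No changes to commit"
          git push
```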
The scraped data is stored in CSV files within the `data/` directory.
- A new file is created each day, named `bbc_most_read_YYYY-MM-DD.csv`.
- Each file contains entries for all scrapes performed on that UTC date.
- Columns: `timestamp` (UTC time of scrape), `rank` (1-10), `title`, `link` (see the example after this list).
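For example, a single scrape appends ten rows shaped like this (the headlines, links, and timestamp format are illustrative, not real data):

```
timestamp,rank,title,link
2025-01-15T09:00:02+00:00,1,Example headline one,https://www.bbc.co.uk/news/articles/...
2025-01-15T09:00:02+00:00,2,Example headline two,https://www.bbc.co.uk/news/articles/...
```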
- Clone the repository.
- Ensure Python 3.x is installed.
- Install dependencies with uv: `uv pip install -e .`
You can run the scraper manually with `python BBC_News_Most_Read_Scraper.py`. This appends data to the current day's CSV file in the `data/` directory.
To fetch the full text of articles for yesterday's URLs, run `python article_content_scraper.py`.
The process is automated via GitHub Actions, running hourly and committing updated data files back to the repository. Each day at 02:00 UTC, the union of all URLs seen in the "Most Read" and front page promo logs from the previous day is fetched, and the full article HTML plus a plain-text version are written to `data/article-content/{YYYY-MM-DD}.parquet`.
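A minimal sketch of that daily job, assuming pandas with a parquet engine (e.g. pyarrow) installed and hypothetical column names `url`, `html`, and `text`:

```python
from datetime import datetime, timedelta, timezone

import pandas as pd
import requests
from bs4 import BeautifulSoup

def scrape_article(url: str) -> dict:
    """Fetch one article and return its raw HTML plus a plain-text rendering."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    html = response.text
    text = BeautifulSoup(html, "html.parser").get_text(separator="\n", strip=True)
    return {"url": url, "html": html, "text": text}

def write_daily_parquet(urls: set[str]) -> None:
    """Write yesterday's articles to data/article-content/{YYYY-MM-DD}.parquet."""
    yesterday = (datetime.now(timezone.utc) - timedelta(days=1)).date()
    rows = [scrape_article(url) for url in sorted(urls)]
    pd.DataFrame(rows).to_parquet(
        f"data/article-content/{yesterday}.parquet", index=False
    )
```

The resulting file can then be loaded back with `pd.read_parquet(...)` for analysis.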