This repository contains a data analytics project analyzing regional economic inequality in the United Kingdom. It implements an end-to-end ETL pipeline to process 26 years of geospatial and economic data from the Office for National Statistics (ONS).
The project then uses advanced geospatial statistics, time-series analysis, and interactive visualization to identify economic "hot spots," "cold spots," and long-term growth trends.
The goal of this project is to explore the complex relationship between economic productivity (measured by Gross Value Added, GVA) and socioeconomic factors (such as population and the Index of Multiple Deprivation, IMD).
It moves beyond a simple static analysis by building a rich, time-series dataset spanning from 1998 to 2023. This allows us to ask deeper questions:
- Where are the statistically significant clusters of wealth and deprivation?
- Have these "hot spots" and "cold spots" changed over the last 26 years?
- What are the underlying growth trajectories of different regions?
- Which features (deprivation, location) are the best predictors of economic output?
This project showcases a full-stack data science workflow, from data engineering to visualization.
The repository is organized into a modular pipeline, separating data engineering from analysis and the final application.
UK-Regional-Insights/
├─ data_pipeline/
│ └─ transformers.py # Python script to clean and merge all raw data
├─ notebooks/
│ └─ data_exploration.ipynb # Jupyter notebook with all analyses
├─ app/
│ └─ streamlit_dashboard.py # (Future) Code for the interactive web app
├─ models/
│ ├─ inequality_predictor.py # (Future) Baseline ML models
│ └─ gnn_regional_networks.py # (Future) GNN model
├─ assets/
│ ├─ animated-map-demo.gif # Demo GIFs for this README
│ └─ lisa-hotspot-map.png # Saved plots
├─ data/
│ ├─ input/ # (Ignored by git) Raw .xlsx and .geojson files
│ └─ processed/ # (Ignored by git) The final master GeoPackage
├─ outputs/ # (Ignored by git) Saved .html and .png visualizations
├─ .gitignore # Ignores all data, output, and cache files
├─ environment.yml # Reproducible Conda environment
└─ README.md # You are here!
The analysis notebook (notebooks/data_exploration.ipynb) performs five key analyses to move from raw data to actionable insights.
First, a 2D choropleth map (using folium) visualizes the GVA per capita for the most recent year (2023), clearly showing the static economic landscape. This is complemented by an animated time-series GIF (created from a folium.plugins.TimestampedGeoJson map) that shows the dramatic economic changes from 1998 to 2023.
This analysis uses Local Moran's I (LISA) from the PySAL library to find statistically significant spatial clusters. It clearly identifies the "High-High" (Hot Spot) cluster around London and "Low-Low" (Cold Spot) clusters in former industrial areas and rural regions.
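To make the quadrant labels concrete, here is a minimal pure-Python sketch of the Local Moran's I statistic on a toy chain of six regions. The region names, values, and the `local_morans_i` helper are all hypothetical and are not taken from the notebook. The sketch also skips the permutation-based significance test, so the labels below are only quadrant assignments, not statistically significant hot or cold spots. PySAL's `esda.Moran_Local` provides the full version with pseudo p-values, and its scaling of the statistic may differ by a constant factor.

```python
from statistics import fmean


def local_morans_i(values, neighbors):
    """Return {region: (local_I, quadrant)} using row-standardised weights.

    values:    dict mapping region name -> observed value
    neighbors: dict mapping region name -> list of neighbouring region names
    """
    mean = fmean(values.values())
    # Deviations from the global mean.
    z = {region: v - mean for region, v in values.items()}
    # Second moment of the deviations (population variance).
    m2 = sum(d * d for d in z.values()) / len(z)

    results = {}
    for region, nbrs in neighbors.items():
        # Spatial lag: average deviation among the region's neighbours.
        lag = fmean(z[n] for n in nbrs) if nbrs else 0.0
        local_i = z[region] * lag / m2
        if z[region] >= 0 and lag >= 0:
            quadrant = "High-High"
        elif z[region] < 0 and lag < 0:
            quadrant = "Low-Low"
        elif z[region] >= 0:
            quadrant = "High-Low"
        else:
            quadrant = "Low-High"
        results[region] = (local_i, quadrant)
    return results


# Hypothetical GVA-per-capita values (in thousands) for regions along a line.
values = {"A": 50.0, "B": 48.0, "C": 30.0, "D": 28.0, "E": 12.0, "F": 10.0}
neighbors = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D", "F"],
    "F": ["E"],
}

result = local_morans_i(values, neighbors)
for region, (local_i, quadrant) in result.items():
    print(f"{region}: I = {local_i:.3f}, quadrant = {quadrant}")
```

A positive local I means a region resembles its neighbours (High-High or Low-Low), while a negative value marks a spatial outlier (High-Low or Low-High).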
This analysis moves beyond a static snapshot to identify long-term economic trajectories. It calculates the Compound Annual Growth Rate (CAGR) for GVA per capita for every Local Authority from 1998 to 2023. This metric reveals the "winners" (fastest-growing regions) and "losers" (stagnating or declining regions) over the past quarter-century.
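As a rough illustration of the metric, CAGR between a start and end value over n periods is (end / start)^(1/n) - 1. The sketch below applies that formula to three made-up regions; the region names, figures, and the `cagr` helper are hypothetical and do not come from the ONS data or the notebook.

```python
def cagr(start_value, end_value, periods):
    """Compound annual growth rate between two values over `periods` years."""
    if start_value <= 0 or end_value < 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive, end_value non-negative")
    return (end_value / start_value) ** (1 / periods) - 1


# Hypothetical GVA per capita for the first and last year of the series.
gva_per_capita = {
    "Region A": {1998: 15000.0, 2023: 30000.0},
    "Region B": {1998: 20000.0, 2023: 22000.0},
    "Region C": {1998: 18000.0, 2023: 17000.0},
}

first_year, last_year = 1998, 2023
growth = {
    name: cagr(series[first_year], series[last_year], last_year - first_year)
    for name, series in gva_per_capita.items()
}

# Rank regions from fastest-growing to slowest.
ranked = sorted(growth, key=growth.get, reverse=True)
for name in ranked:
    print(f"{name}: {growth[name]:.2%} per year")
```

Note that 1998 to 2023 spans 25 compounding periods even though the dataset covers 26 calendar years.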
These plotly charts visualize the multi-dimensional relationship between GVA per capita, deprivation (IMD rank), and population. The 3D plot lets you explore how these three variables interact, and each point's marker shape encodes its LISA cluster (Hot Spot/Cold Spot), which ties this analysis back to the spatial results.
This project uses Conda to manage its environment and dependencies. You'll need to have Anaconda or Miniconda installed.
- Clone the repository:
    git clone https://github.com/Tahernezhad/UK-Regional-Insights.git
    cd UK-Regional-Insights
- Create the Conda environment: Use the provided environment.yml file to create the Conda environment. This will install all the necessary packages.
    conda env create -f environment.yml
- Activate the environment:
    conda activate geoml
- Download the Data: The raw data files are not included in this repository. Please download them from the official sources and place them in a data/input/ folder (you will need to create this folder).
  - GVA & Population: ONS Regional Accounts
  - IMD: English Indices of Deprivation 2019
  - Geometries: ONS Open Geography Portal
- Run the ETL Pipeline: Execute the transformers.py script to process all raw files into a single master GeoPackage.
    python data_pipeline/transformers.py
- Run the Analysis Notebook: Launch Jupyter and open the main notebook to see all the analyses and visualizations.
    jupyter notebook notebooks/data_exploration.ipynb
This project provides a robust foundation for predictive modeling. The next steps are:
- Streamlit Dashboard: Populate the app/streamlit_dashboard.py file to create a fully interactive web application.
- Baseline ML Model: Build a baseline RandomForest or XGBoost model to predict GVA_per_capita using the features from the exploration notebook.
- Graph Neural Network (GNN): Implement a GNN (using the models/ directory) to model the spatial network explicitly. The spatial weights matrix from the LISA analysis will serve as the graph's adjacency matrix, allowing the model to learn from neighboring regions.
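One possible way to turn spatial weights into a graph for the planned GNN is sketched below. It assumes the weights are available as a neighbour dictionary (libpysal `W` objects expose one as `.neighbors`) and converts it into a 2 x num_edges edge list, the layout PyTorch Geometric uses for `edge_index`. The region names and the `neighbors_to_edge_index` helper are hypothetical; the GNN model itself does not exist yet in this repository.

```python
def neighbors_to_edge_index(neighbors):
    """Convert {region: [neighbouring regions]} into (node_names, edge_index).

    edge_index is [sources, targets], with one column per directed edge.
    """
    node_names = sorted(neighbors)
    index = {name: i for i, name in enumerate(node_names)}

    sources, targets = [], []
    for name in node_names:
        for nbr in neighbors[name]:
            sources.append(index[name])
            targets.append(index[nbr])
    return node_names, [sources, targets]


# Hypothetical contiguity structure: A borders B, and B borders C.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

node_names, edge_index = neighbors_to_edge_index(neighbors)
print(node_names)
print(edge_index)
```

Each undirected border appears as two directed edges, which keeps the adjacency symmetric so that information can flow both ways between neighbouring regions.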



