
Predicting violent crime rates using high-dimensional community data from the UCI dataset. Implements a structured machine learning pipeline with extensive preprocessing, multiple feature selection methods (LASSO, Ridge, Elastic Net, SFS, Best Subsets), and model evaluation via MSE over repeated train-test splits.


dimitris-markopoulos/crime-predictor-analysis


Crime Rate Prediction from Community Features

Project Overview

This project aims to predict the violent crime rate in different communities based on various socioeconomic, demographic, and law enforcement-related features. By leveraging advanced regression techniques and machine learning models, this project seeks to uncover patterns that influence crime rates and develop predictive models to assist policymakers, law enforcement agencies, and researchers in understanding crime trends.

The project utilizes the Communities and Crime dataset, available from the UCI Machine Learning Repository: 🔗 Communities and Crime Dataset

This dataset includes a wide range of features related to community characteristics, such as:

⊳ Socioeconomic indicators (e.g., income levels, unemployment rates)

⊳ Demographic data (e.g., population density, racial composition)

⊳ Law enforcement statistics (e.g., police presence, per capita law enforcement spending)

Project Goals

⊳ Data Preprocessing & Feature Engineering: Handling missing values, scaling numerical features, and selecting relevant predictors.

⊳ Exploratory Data Analysis (EDA): Understanding correlations between community features and crime rates.

⊳ Regression Models: Applying linear regression, ridge regression, LASSO, and elastic net to establish baseline predictive performance.

⊳ Machine Learning Implementation: Experimenting with random forests, gradient boosting, and deep learning to improve prediction accuracy. (work in progress)

⊳ Model Evaluation: Comparing models based on metrics like R², RMSE, and MAE to determine the best approach.
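The preprocessing, regression, and evaluation goals above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: it assumes scikit-learn is installed, uses synthetic data in place of the Communities and Crime dataset, and the helper names (`make_synthetic_data`, `compare_models`) and all hyperparameter values are hypothetical. It imputes missing values, scales features, and compares four linear models by mean test MSE over repeated train-test splits.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_data(n_samples=300, n_features=40, seed=0):
    """Hypothetical stand-in for the real dataset: sparse linear signal, noise, missing values."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    coef = np.zeros(n_features)
    coef[:5] = [1.5, -2.0, 1.0, 0.5, -1.0]  # only the first 5 features are informative
    y = X @ coef + rng.normal(scale=0.5, size=n_samples)
    X[rng.random(X.shape) < 0.05] = np.nan  # knock out roughly 5% of entries
    return X, y


def compare_models(X, y, n_repeats=10, test_size=0.25):
    """Return each model's mean test MSE over repeated random train-test splits."""
    # Regularization strengths are illustrative assumptions, not tuned values.
    models = {
        "linear": LinearRegression(),
        "ridge": Ridge(alpha=1.0),
        "lasso": Lasso(alpha=0.05),
        "elastic_net": ElasticNet(alpha=0.05, l1_ratio=0.5),
    }
    scores = {name: [] for name in models}
    for repeat in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=repeat
        )
        for name, model in models.items():
            # Imputer and scaler are fit on the training split only, avoiding leakage.
            pipe = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(), model)
            pipe.fit(X_tr, y_tr)
            scores[name].append(mean_squared_error(y_te, pipe.predict(X_te)))
    return {name: float(np.mean(vals)) for name, vals in scores.items()}


if __name__ == "__main__":
    X, y = make_synthetic_data()
    results = compare_models(X, y)
    for name, mse in sorted(results.items(), key=lambda kv: kv[1]):
        print(f"{name:12s} mean test MSE = {mse:.3f}")
```

Wrapping the imputer and scaler in the same pipeline as the model keeps every preprocessing statistic tied to the training split, so the averaged test MSE is not inflated by information from held-out rows.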

Future Work

⊳ Incorporating geospatial analysis to visualize crime distribution.

⊳ Exploring deep learning architectures for improved predictions.
