Suborno-Deb-Bappon/Crashlytics

Crashlytics – US Traffic Accident Severity Analysis & Prediction

Crashlytics is a data science project that analyzes and models US traffic accident data to explore accident patterns and predict accident severity based on environmental, temporal, and infrastructural factors.
The notebook walks through data exploration, feature engineering, preprocessing, model training, and evaluation, producing insights that can help guide safety improvements.


📌 Project Overview

Traffic accidents have major human and economic costs.
This project uses the US Accidents Dataset to:

  • Analyze accident patterns across locations, times, and weather conditions.
  • Identify environmental and road features most related to severe accidents.
  • Build machine learning models to predict accident severity.

The workflow includes EDA, data cleaning, feature selection, and model evaluation.


🛠️ Tech Stack

  • Language: Python
  • Data Processing: Pandas, NumPy
  • Data Visualization: Matplotlib, Seaborn
  • Machine Learning: Scikit-learn
  • Tools: Jupyter Notebook
  • Dataset: US Accidents Dataset (2016–2020)

📂 Repository Structure

📦 Crashlytics
┣ 📁 data/ – Dataset files (download separately)
┣ 📓 crashlytics_notebook.ipynb – Main Jupyter notebook (EDA + ML pipeline)
┣ 📄 requirements.txt – Python dependencies
┗ 📄 README.md – Project documentation


🔎 Key Features

1️⃣ Exploratory Data Analysis (EDA)

  • Accident counts by state (heatmap & bar chart).
  • Text analysis: most frequent words in severity 4 accident descriptions.
  • Most common road features present during accidents.
  • Relationship between accident distance and severity.
  • Accident counts by weather condition and weekday.
  • Temporal trends highlighting rush-hour peaks and weekday/weekend differences.
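A few of the EDA steps above can be sketched on a tiny, made-up sample. This is a rough illustration, not the notebook's code: the column names `State`, `Severity`, and `Description`, the `top_words` helper, and the stopword list are assumptions for the example.

```python
from collections import Counter
import re

import pandas as pd

# Made-up rows for illustration; column names are assumed to match the dataset.
sample = pd.DataFrame({
    "State": ["CA", "CA", "TX", "FL", "CA", "TX"],
    "Severity": [2, 4, 4, 2, 3, 4],
    "Start_Time": pd.to_datetime([
        "2019-03-04 08:15:00",  # Monday
        "2019-03-04 17:40:00",  # Monday
        "2019-03-05 08:05:00",  # Tuesday
        "2019-03-09 13:00:00",  # Saturday
        "2019-03-06 17:10:00",  # Wednesday
        "2019-03-08 07:55:00",  # Friday
    ]),
    "Description": [
        "Slow traffic on I-5",
        "Road closed due to accident on I-5",
        "Road closed due to accident on US-290",
        "Minor accident on shoulder",
        "Lane blocked on CA-1",
        "Accident on I-10 lane blocked",
    ],
})

# Accident counts by state, by hour of day, and weekday vs. weekend.
counts_by_state = sample["State"].value_counts()
counts_by_hour = sample["Start_Time"].dt.hour.value_counts().sort_index()
weekend_counts = (sample["Start_Time"].dt.dayofweek >= 5).value_counts()


def top_words(descriptions, n=3, stopwords=frozenset({"on", "to", "due", "i", "us"})):
    """Return the n most frequent lowercase words, skipping stopwords (hypothetical helper)."""
    counter = Counter()
    for text in descriptions:
        words = re.findall(r"[a-z]+", text.lower())
        counter.update(w for w in words if w not in stopwords)
    return counter.most_common(n)


# Most frequent words in severity 4 descriptions only.
severe_words = top_words(sample.loc[sample["Severity"] == 4, "Description"])

print(counts_by_state)
print(counts_by_hour)
print(severe_words)
```

On the real dataset the same calls would run on the full DataFrame, and the counts would feed the bar charts and heatmaps.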

2️⃣ Data Preprocessing

  • Temporal feature extraction from Start_Time (year, month, day, weekday, hour, minute).
  • Correlation analysis to detect and drop redundant features (e.g., End_Lat, End_Lng, Wind_Chill).
  • Removal of irrelevant identifiers and redundant time/location variables.
  • Duplicate removal and handling of erroneous/missing values.
  • Encoding of categorical variables.
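The preprocessing steps listed above might look roughly like the following. This is a minimal sketch under stated assumptions: the helper names, the 0.9 correlation threshold, the one-hot encoding choice, and every column other than `Start_Time` and `End_Lat` are illustrative and not taken from the notebook.

```python
import numpy as np
import pandas as pd


def add_time_features(df, column="Start_Time"):
    """Add year/month/day/weekday/hour/minute columns derived from a timestamp column."""
    out = df.copy()
    ts = pd.to_datetime(out[column])
    out["Year"] = ts.dt.year
    out["Month"] = ts.dt.month
    out["Day"] = ts.dt.day
    out["Weekday"] = ts.dt.dayofweek  # Monday = 0
    out["Hour"] = ts.dt.hour
    out["Minute"] = ts.dt.minute
    return out


def highly_correlated_columns(df, threshold=0.9):
    """List numeric columns whose absolute correlation with an earlier column exceeds threshold."""
    corr = df.select_dtypes(include="number").corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return [col for col in upper.columns if (upper[col] > threshold).any()]


# Made-up rows; the last row duplicates the first.
raw = pd.DataFrame({
    "Start_Time": ["2019-03-04 08:15:00", "2019-03-05 17:40:00",
                   "2019-03-09 13:00:00", "2019-03-08 07:55:00",
                   "2019-03-04 08:15:00"],
    "Start_Lat": [34.0, 29.7, 25.8, 40.7, 34.0],
    "End_Lat": [34.01, 29.71, 25.79, 40.72, 34.01],
    "Temperature": [70.0, 55.0, 90.0, 60.0, 70.0],
    "Weather_Condition": ["Clear", "Rain", "Clear", "Fog", "Clear"],
})

clean = raw.drop_duplicates().reset_index(drop=True)  # duplicate removal
to_drop = highly_correlated_columns(clean)            # redundant numeric features
features = add_time_features(clean.drop(columns=to_drop))
features = features.drop(columns=["Start_Time"])      # raw timestamp no longer needed
features = pd.get_dummies(features, columns=["Weather_Condition"])  # categorical encoding

print(to_drop)
print(features.columns.tolist())
```

One-hot encoding is only one option here; label or ordinal encoding may suit tree-based models better when a categorical column has many levels.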

3️⃣ Feature Selection

  • Dropped constant-value columns and variables with little predictive power.
  • Retained only the impactful features identified via EDA and the correlation matrix.
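Detecting constant-value columns can be sketched as below. The `constant_columns` helper and the example column names are assumptions for illustration; the notebook may identify them differently.

```python
import pandas as pd


def constant_columns(df):
    """Return the names of columns that hold a single distinct value (hypothetical helper)."""
    return [col for col in df.columns if df[col].nunique(dropna=False) <= 1]


# Made-up frame: two columns never vary, so they carry no signal for a model.
frame = pd.DataFrame({
    "Country": ["US", "US", "US", "US"],
    "Turning_Loop": [False, False, False, False],
    "Severity": [2, 3, 2, 4],
})

to_drop = constant_columns(frame)
reduced = frame.drop(columns=to_drop)

print(to_drop)
print(reduced.columns.tolist())
```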

4️⃣ Machine Learning Models

  • Split dataset into train, validation, and test sets.
  • Trained and evaluated multiple classifiers:
    • Logistic Regression
    • Decision Tree Classifier
    • Random Forest Classifier
  • Evaluated models with:
    • Accuracy
    • Precision, Recall, and F1-score
    • Confusion matrices
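The split, training, and evaluation steps above can be sketched with scikit-learn. This is not the notebook's code: it uses synthetic data from `make_classification` in place of the accident features, a 60/20/20 split, macro-averaged metrics, and default-ish hyperparameters, all of which are assumptions. The synthetic labels run 0–3 rather than the dataset's severity levels.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 4-class data standing in for the preprocessed accident features.
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)

# 60% train, 20% validation, 20% test (assumed proportions).
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

labels = [0, 1, 2, 3]
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_val)  # compare models on the validation set
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_val, pred, average="macro", zero_division=0)
    results[name] = {
        "accuracy": accuracy_score(y_val, pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion_matrix": confusion_matrix(y_val, pred, labels=labels),
    }

for name, metrics in results.items():
    print(f"{name}: accuracy={metrics['accuracy']:.3f}, f1={metrics['f1']:.3f}")
```

The test set is left untouched in this sketch; it would be scored once, after a model has been chosen on the validation metrics.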

5️⃣ Insights

  • California, Texas, and Florida record the highest accident counts in the dataset.
  • Severe accidents often occur during poor visibility and adverse weather.
  • Accidents are more frequent during weekday rush hours.
  • Junctions, crossings, and stations are common road features at severe accident locations.

🚀 Getting Started

1️⃣ Clone the repository

git clone https://github.com/Suborno-Deb-Bappon/Crashlytics.git
cd Crashlytics

2️⃣ Install dependencies

pip install -r requirements.txt

3️⃣ Open the notebook

jupyter notebook crashlytics_notebook.ipynb

📈 Results & Impact

  • Random Forest achieved the highest accuracy of the three tested models (Logistic Regression, Decision Tree, Random Forest).
  • Identified key spatio-temporal and environmental factors influencing accident severity, enabling data-informed safety strategies.
  • Found that adverse weather, poor visibility, and peak traffic hours are associated with severe accidents.
  • The analysis highlights California, Texas, and Florida, the states with the highest accident counts, as priorities for targeted road safety measures.
  • Produced a modular, reusable ML pipeline for future scaling and integration with traffic monitoring systems.

🤝 Contributing

Contributions are welcome!

  • Fork the repository
  • Create a new branch for your feature or bug fix
  • Submit a pull request for review

About

Data-driven traffic accident analysis & prediction with actionable safety insights.
