Crashlytics – US Traffic Accident Severity Analysis & Prediction

Crashlytics is a data science project that analyzes and models US traffic accident data to explore accident patterns and predict accident severity based on environmental, temporal, and infrastructural factors.
The notebook walks through data exploration, feature engineering, preprocessing, model training, and evaluation, producing insights that can help guide safety improvements.

📌 Project Overview

Traffic accidents have major human and economic costs.
This project uses the US Accidents Dataset to:

Analyze accident patterns across locations, times, and weather conditions.
Identify environmental and road features most related to severe accidents.
Build machine learning models to predict accident severity.

The workflow includes EDA, data cleaning, feature selection, and model evaluation.

🛠️ Tech Stack

Language: Python
Data Processing: Pandas, NumPy
Data Visualization: Matplotlib, Seaborn
Machine Learning: Scikit-learn
Tools: Jupyter Notebook
Dataset: US Accidents Dataset (2016–2020)

📂 Repository Structure

📦 Crashlytics
┣ 📁 data/ – Dataset files (download separately)
┣ 📓 crashlytics_notebook.ipynb – Main Jupyter notebook (EDA + ML pipeline)
┣ 📄 requirements.txt – Python dependencies
┗ 📄 README.md – Project documentation

🔎 Key Features

1️⃣ Exploratory Data Analysis (EDA)

Accident counts by state (heatmap & bar chart).
Text analysis: most frequent words in the descriptions of severity 4 (the highest level) accidents.
Most common road features present during accidents.
Relationship between accident distance and severity.
Accident counts by weather condition and weekday.
Temporal trends highlighting rush-hour peaks and weekday/weekend differences.
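
As a rough illustration of the counting behind these charts, the sketch below tallies accidents per state and per hour, split by weekday versus weekend. The rows are made up for the example, and the names sample, by_state, and by_hour are hypothetical rather than taken from the notebook; only the State and Start_Time column names are assumed to match the dataset.

```python
import pandas as pd

# Made-up rows, only to show the shape of the aggregation
sample = pd.DataFrame({
    "State": ["CA", "CA", "TX", "FL", "CA", "TX"],
    "Start_Time": pd.to_datetime([
        "2019-03-04 08:15:00",  # Monday
        "2019-03-04 17:30:00",  # Monday
        "2019-03-05 08:45:00",  # Tuesday
        "2019-03-09 13:00:00",  # Saturday
        "2019-03-06 08:05:00",  # Wednesday
        "2019-03-10 13:20:00",  # Sunday
    ]),
})

# Accident counts per state, largest first
by_state = sample["State"].value_counts()

# Label each row as weekday or weekend (Monday=0 ... Sunday=6)
day_type = (
    (sample["Start_Time"].dt.weekday >= 5)
    .map({True: "weekend", False: "weekday"})
    .rename("DayType")
)
hour = sample["Start_Time"].dt.hour.rename("Hour")

# Accident counts per (day type, hour) pair
by_hour = sample.groupby([day_type, hour]).size()

print(by_state)
print(by_hour)
```

On the full dataset, the same two aggregations could feed a Seaborn bar chart or heatmap directly.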

2️⃣ Data Preprocessing

Temporal feature extraction from Start_Time (year, month, day, weekday, hour, minute).
Correlation analysis to detect and drop redundant features (e.g., End_Lat, End_Lng, Wind_Chill).
Removal of irrelevant identifiers and redundant time/location variables.
Duplicate removal and handling of erroneous/missing values.
Encoding of categorical variables.
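
The temporal feature extraction step can be sketched as follows. This is a minimal example under assumptions, not the notebook's exact code: the helper name add_time_features and the output column names (Year, Month, Day, Weekday, Hour, Minute) are hypothetical, and the two timestamps are invented.

```python
import pandas as pd

def add_time_features(df: pd.DataFrame, column: str = "Start_Time") -> pd.DataFrame:
    """Return a copy of df with calendar fields derived from `column`."""
    out = df.copy()
    # errors="coerce" turns unparseable timestamps into NaT instead of raising
    ts = pd.to_datetime(out[column], errors="coerce")
    out["Year"] = ts.dt.year
    out["Month"] = ts.dt.month
    out["Day"] = ts.dt.day
    out["Weekday"] = ts.dt.weekday  # Monday=0 ... Sunday=6
    out["Hour"] = ts.dt.hour
    out["Minute"] = ts.dt.minute
    return out

# Invented sample rows (hypothetical data)
sample = pd.DataFrame({"Start_Time": ["2019-03-04 08:15:00", "2020-11-28 17:40:00"]})
features = add_time_features(sample)
print(features[["Year", "Month", "Day", "Weekday", "Hour", "Minute"]])
```

Working on a copy keeps the raw frame untouched, which makes it easier to re-run the cell while experimenting.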

3️⃣ Feature Selection

Dropped constant-value columns and variables with little predictive power.
Retained only the features that EDA and the correlation matrix indicated were informative.
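
One way these two filters could look in code is sketched below. The helper names, the 0.9 correlation threshold, and the sample values are assumptions made for illustration; the notebook may use different criteria.

```python
import numpy as np
import pandas as pd

def drop_constant_columns(df: pd.DataFrame):
    """Drop columns that hold a single distinct value."""
    # nunique(dropna=False) counts NaN as a value, so an all-NaN column is also constant
    constant = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
    return df.drop(columns=constant), constant

def drop_correlated_columns(df: pd.DataFrame, threshold: float = 0.9):
    """Drop the later column of each numeric pair whose |correlation| exceeds threshold."""
    corr = df.select_dtypes(include="number").corr().abs()
    # Keep only the upper triangle so each pair is examined once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    redundant = [c for c in upper.columns if (upper[c] > threshold).any()]
    return df.drop(columns=redundant), redundant

# Invented sample: End_Lat nearly duplicates Start_Lat, Country never varies
sample = pd.DataFrame({
    "Start_Lat": [34.0, 36.1, 40.7, 29.8],
    "End_Lat": [34.001, 36.101, 40.701, 29.801],
    "Humidity": [80, 35, 60, 50],
    "Country": ["US", "US", "US", "US"],
})

reduced, constant = drop_constant_columns(sample)
reduced, redundant = drop_correlated_columns(reduced)
print(constant, redundant, list(reduced.columns))
```

Returning the dropped column names alongside the reduced frame makes it simple to log what was removed and why.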

4️⃣ Machine Learning Models

Split dataset into train, validation, and test sets.
Trained and evaluated multiple classifiers:
- Logistic Regression
- Decision Tree Classifier
- Random Forest Classifier
Evaluated models with:
- Accuracy
- Precision, Recall, and F1-score
- Confusion matrices
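
The split, train, and evaluate loop described above can be sketched as follows. Synthetic data from make_classification stands in for the cleaned accident features, and the 60/20/20 split, the hyperparameters, and the use of validation accuracy to pick a model are assumptions for illustration, not the notebook's confirmed settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cleaned feature matrix; four classes mimic severity 1-4
X, y = make_classification(
    n_samples=1000, n_features=12, n_informative=6, n_classes=4, random_state=0
)

# 60/20/20 split: carve off the test set first, then split the remainder
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the training set and score it on the validation set
val_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    val_accuracy[name] = accuracy_score(y_val, model.predict(X_val))

# Report test metrics only for the model that did best on validation
best_name = max(val_accuracy, key=val_accuracy.get)
test_pred = models[best_name].predict(X_test)
test_report = classification_report(y_test, test_pred, zero_division=0)
test_confusion = confusion_matrix(y_test, test_pred)

print(val_accuracy)
print(best_name)
print(test_report)
print(test_confusion)
```

Selecting on the validation set and touching the test set once keeps the reported test metrics from being tuned against. Stratified splits matter here because severity classes in accident data tend to be imbalanced.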

5️⃣ Insights

California, Texas, and Florida record the most accidents in the dataset.
Severe accidents often occur during poor visibility and adverse weather.
Accidents are more frequent during weekday rush hours.
Junctions, crossings, and nearby stations are common at severe accident locations.

🚀 Getting Started

1️⃣ Clone the repository

git clone https://github.com/your-username/Crashlytics.git
cd Crashlytics

2️⃣ Install dependencies

pip install -r requirements.txt

3️⃣ Open the notebook

jupyter notebook crashlytics_notebook.ipynb

📈 Results & Impact

Random Forest achieved the highest accuracy of the three models tested (Logistic Regression, Decision Tree, Random Forest).
Identified key spatio-temporal and environmental factors influencing accident severity, enabling data-informed safety strategies.
Found that adverse weather, poor visibility, and peak traffic hours are associated with severe accidents.
The analysis highlights California, Texas, and Florida, the states with the highest accident counts, as priorities for targeted road safety measures.
Produced a modular, reusable ML pipeline for future scaling and integration with traffic monitoring systems.

🤝 Contributing

Contributions are welcome!

Fork the repository
Create a new branch for your feature or bug fix
Submit a pull request for review
