| Name | GitHub Handle | Contribution |
|---|---|---|
| Shazi Bidarian | @shazibid | Data exploration, visualization, project coordination, state 1 + 2 unsupervised modeling |
| Connie Yang | @connieyyy | Data exploration, project coordination, state 0 unsupervised modeling, documentation |
| Kelly Pham | @kllyph | Data preprocessing, supervised learning with random forest |
| Jaewon Kim | @CanDoJaewon | Data exploration, label matching |
- Developed machine learning models using K-Means, DBSCAN, and HDBSCAN to classify turns.
- Achieved 96% accuracy with a random forest model, demonstrating that the modeling results can be replicated consistently for Arity.
- Install the following prerequisites:
  a. Python 3.8+
  b. Git
- Clone the repo
- Set up a virtual environment
- Install dependencies
- Verify the installation
- Open in VS Code and run the notebooks:
  - Open the workspace in VS Code
  - Select the Python kernel (the `.venv` environment)
  - Start with `raw.ipynb` to understand the data, then explore `state0/`, `state1/`, and `state2/` for analysis by driving state
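A minimal sketch of the "verify the installation" step: the helper and package list below are assumptions (adjust the names to whatever `requirements.txt` in this repo actually pins), but the pattern of importing each dependency and reporting what is missing is generic.

```python
import importlib

def check_environment(packages):
    """Return {package: True/False} indicating which packages import cleanly."""
    status = {}
    for pkg in packages:
        try:
            importlib.import_module(pkg)
            status[pkg] = True
        except ImportError:
            status[pkg] = False
    return status

# Hypothetical package list -- replace with the repo's real dependencies.
print(check_environment(["pandas", "numpy", "matplotlib", "seaborn", "sklearn"]))
```

Any `False` entry means the virtual environment is not set up correctly; re-run the dependency install inside `.venv` before opening the notebooks.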
- Data/
- Raw/ → untouched iOS & Android data
- Processed/ → cleaned + split datasets
- Notebooks/ → EDA + experiments
- SRC/ → finalized scripts (data cleaning, modeling)
- Results/ → plots, metrics, reports
- README.md → project overview + instructions
- Arity collects user driving data with consent, and safe drivers receive lower rates on their insurance policies
- Use AI/ML to classify data points into different types of turns
- Use telematics data to classify vehicle turning behaviors
- Cluster models to distinguish different types of turns
- Create a supervised model to classify vehicle turns
- 20 MB dataset, dictionary structure and CSV format
- Plotted data points with matplotlib and seaborn
- Removed outliers with the interquartile range (IQR) method
- Model(s) used: K-Means, HDBSCAN, DBSCAN, and Random Forest
- 96% accuracy score with the random forest model
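The pipeline above (IQR outlier removal, clustering, then a supervised random forest) can be sketched as follows. The feature columns and data here are toy stand-ins, not the project's actual telematics features, and the hyperparameters are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

def iqr_filter(x, k=1.5):
    """Mask keeping values of 1-D array x inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x >= q1 - k * iqr) & (x <= q3 + k * iqr)

# Toy stand-in for the engineered turn features.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 2))

# 1. Remove outliers column by column with the IQR rule.
mask = iqr_filter(features[:, 0]) & iqr_filter(features[:, 1])
clean = features[mask]

# 2. Unsupervised step: cluster the cleaned points into turn types.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(clean)

# 3. Supervised step: fit a random forest on the cluster labels.
clf = RandomForestClassifier(random_state=0).fit(clean, labels)
print(f"train accuracy: {clf.score(clean, labels):.2f}")
```

The same structure applies with DBSCAN or HDBSCAN substituted for K-Means in step 2; the supervised step is unchanged.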
- We want to focus on optimizing the supervised model to ensure it generalizes well and is ready for deployment. This includes applying techniques such as grid search or randomized search to systematically explore hyperparameter combinations, using cross‑validation, and tuning parameters like tree depth, minimum samples per leaf, and the number of estimators to balance accuracy with robustness.
- We can also compare Random Forest with other ensemble methods such as Gradient Boosting, and analyze feature importance to understand which inputs drive cluster predictions most strongly.
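The grid search with cross-validation described above might look like the sketch below. The data is a synthetic stand-in and the grid values are illustrative, but the tuned parameters (tree depth, minimum samples per leaf, number of estimators) match those named in the plan:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the engineered turn features and cluster labels.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Illustrative grid over the parameters named in the plan above.
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 10],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,                # 5-fold cross-validation
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

`RandomizedSearchCV` is a drop-in alternative when the grid grows large, and `search.best_estimator_.feature_importances_` gives the feature-importance analysis mentioned for the ensemble comparison.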
This project is licensed under the MIT License.
Thank you to our advisor, Francesco De Bernardis, and our coach, Matt Brems, who supported our project.