https://www.kaggle.com/c/sf-crime
- Category -Category of the crime incident. This is the target variable for prediction
- Dates -Timestamp of the crime incident
- DayOfWeek -The day of the week
- PdDistrict -Name of the Police Department District
- Address -The approximate street address of the crime incident
- X -Longitude
- Y -Latitude
- 'X' and 'Y' coordinates were trimmed to four digits of significance and paired to allow larger ‘plots’ of land to be accurately related
- Holidays were identified using the Pandas built-in calendar for US holidays
- Dates feature was broken into the following: -- Hour of the Day -- Day of the week (dummy variable) -- Week of the year
Matplotlib to visualize parameter optimization and identify knees
Our baseline model for this analysis was a K-NN model with K=50.
Optimizations and methods:
- Keras (on top of TensorFlow) for NNs
- Sigmoid activation and ReLU based models
- Modeled with and without dropouts
Final Parameters:
Optimizations and methods:
- Boosting
- Bagging
- GridSearchCV for parameter optimization