- Hypothetical Scenario
- Question
- Analysis
- Results and interpretation
- Next steps
- Limitations
- Warning
Governmental authorities desires to reduce crime rates in their states, however, they need to identify these states in three groups to know how to approach to each them.
How can we classify the US states based on the crime rates (assault, rape, and murder) and the proportion of urban population?
USArrests dataset found in R
Used a k-means clustering to identify the three relevant clusters. I proposed only three clusters because that's what fits the stakeholder's need.
To improve the interpretation, I ran an ANOVA and Pairwise comparison testing the difference among the urban population among the clusters
Cluster | Murder | Assault | Rape | UrbanPop |
---|---|---|---|---|
Low crime rates | 3.600 | 78.538 | 12.176 | 52.076 |
High crime rates | 12.165 | 255.250 | 29.165 | 68.400 |
Medium crime rates | 5.841 | 141.882 | 18.823 | 72.470 |
The low crime rates cluster was the only one with a lower mean percentage of urban population, the possible hypotheses might be:
- There is an underreporting about the crime cases in rural states, which might explain why they got the lowest crime rates.
- Urbanization is associated with higher crime rates due to the challenges that its life style carries.
- Assess whether the reporting process in the low crime rates cluster generates understimated crime rates, and fix it if it does.
- Run a further analysis to identify why the high crime rates cluster has higher crime rates, this will help to identify the causes of the problem and invest effectively the public resources in programs that will solve the problematic.
- Survival bias: The analysis didn't analyse trends, only one snapshot at the time.
The scenario proposed is hypothetical, designed to develop analytical skills applied in a real scenario.