Second Machine Learning project developed for the Machine Learning course at NOVA IMS (2022/2023).
Identify customer segments for Ready, Steady Ride (ride-sharing) using unsupervised learning.
The goal is to uncover weather-driven and behavioral patterns to guide marketing and resource allocation.
- ~9,600 observations and 17 features
- Weather (Temperature, Humidity, WindSpeed, WeatherForecast), behavior (Registered/Non-registered users), and time (Hour, WorkingDay, Holiday, DayofWeek, Month)
- Unsupervised task (no target)
- Data exploration & visualization; value harmonization and outlier handling
- Feature engineering & selection: monthly and daily totals; Spearman correlation used to drop highly correlated or uninformative features
- Scaling: Robust Scaler (more stable under outliers)
- Clustering: K-Means and K-Prototypes (self-study)
- Choosing the number of clusters: elbow method (inertia), hierarchical clustering (Ward dendrogram), and silhouette score
- Weather conditions – Temperature, Humidity, WindSpeed, WeatherForecast.
- Customer behavior – Registered / Non-registered, totals per month/day.
- Temporal patterns – Hour of day, Holiday, WorkingDay, DayofWeek.
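The scaling and cluster-count selection steps listed above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the notebook's code: the blob data, the range of `k`, and the variable names are all assumptions, and only the elbow (inertia) and silhouette criteria are shown.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import RobustScaler

# Synthetic stand-in for the numeric features (NOT the project data):
# three well-separated blobs in two dimensions.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [8.0, 8.0], [0.0, 8.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])

# RobustScaler subtracts the median and divides by the interquartile range,
# so extreme values have less influence than with mean/std scaling.
X_scaled = RobustScaler().fit_transform(X)

# Fit K-Means over a range of k and record both selection criteria.
inertias, silhouettes = {}, {}
for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias[k] = model.inertia_  # within-cluster sum of squares (elbow plot)
    silhouettes[k] = silhouette_score(X_scaled, model.labels_)

# Pick the k with the highest mean silhouette; in practice this would be
# cross-checked against the elbow plot and the Ward dendrogram.
best_k = max(silhouettes, key=silhouettes.get)
print("best k by silhouette:", best_k)
```

On real data the criteria often disagree, which is why the project looks at several of them rather than relying on a single score.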
- Final solution: K-Means with k = 4 clusters (global run after combining perspectives).
- Cluster profiles (examples):
  - Cold and humid weather, low engagement (few non-registered users; modest monthly/daily totals)
  - Moderate weather, high registered activity
  - Warm weather, high engagement (both registered and non-registered users)
  - Warm but more humid weather, moderate engagement
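Profiles like the ones above are typically read off per-cluster averages of the original (unscaled) features. The sketch below shows one way to do that with K-Means at k = 4. The data is synthetic and the group centers are invented for illustration; only the column names are taken from this README, and the notebook may build its profiles differently.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import RobustScaler

# Synthetic stand-in (NOT the project data). Column names follow the README;
# the four (Temperature, Humidity, Registered) centers are hypothetical.
rng = np.random.default_rng(0)
groups = [
    (5.0, 85.0, 40.0),
    (15.0, 55.0, 220.0),
    (27.0, 45.0, 260.0),
    (26.0, 80.0, 130.0),
]
frames = [
    pd.DataFrame({
        "Temperature": rng.normal(t, 1.0, 150),
        "Humidity": rng.normal(h, 3.0, 150),
        "Registered": rng.normal(r, 10.0, 150),
    })
    for t, h, r in groups
]
df = pd.concat(frames, ignore_index=True)

# Cluster on the scaled features, but keep the labels next to the raw values
# so the profiles stay in interpretable units.
X_scaled = RobustScaler().fit_transform(df)
df["cluster"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_scaled)

# One row per cluster: mean of each feature plus the cluster size.
profiles = df.groupby("cluster").mean().round(1)
profiles["size"] = df["cluster"].value_counts().sort_index()
print(profiles)
```

The cluster numbers themselves are arbitrary labels; what gets interpreted is each row's combination of weather and usage averages.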
- Loyalty program to convert frequent non-registered users into registered ones.
- Weather-based promotions timed for periods when riders are most likely to book.
- Resource allocation tuned by hour and cluster demand profile.
pip install -r requirements.txt
jupyter lab
# open notebooks/ML1_Group18_Clustering_Notebook.ipynb