NOVA IMS — MSc in Data Science and Advanced Analytics (2025)
Course: Business Cases with Data Science · Instructor: Prof. Nuno António
Hotel H (Lisbon) seeks to redesign its customer segmentation strategy.
Using customer and booking data, we apply unsupervised learning to identify homogeneous groups that support targeted marketing and product definition.
- Business Understanding: the existing segmentation (by booking origin) was too simplistic.
- Data Understanding: 111 733 records, 29 features → reduced to ≈107 842 valid clients.
- Data Preparation: duplicate handling, fixing inconsistencies, creation of new features (Has_Preferences, BookingPeriodicity, PercOtherRevenue, CheckInRate, etc.), and a merge with external language and income data.
- Feature Selection: PCA (retain ≥95 % of variance) · Spearman correlation (>0.8 threshold) · removal of low-variance features.
- Modeling: K-Means tested for k = 4 … 7; evaluated via the Elbow method, R² (explained variance), and Silhouette Score (for 6 clusters: Silhouette = 0.152, R² = 0.43).
- Chosen Solution: 6 clusters (plus a segment of potential customers), offering the best business interpretability and separation.
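
The feature-selection and modeling steps listed above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual code: the helper names `drop_correlated` and `r2_clustering`, the column names, and the blob data are all assumptions made for the example. R² is assumed here to mean 1 − SSW/SST (within-cluster sum of squares over total sum of squares), and the PCA step only reports how many components are needed to reach 95 % of the variance.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def drop_correlated(df: pd.DataFrame, threshold: float = 0.8) -> pd.DataFrame:
    """Drop the later feature of each pair with |Spearman rho| > threshold."""
    corr = df.corr(method="spearman").abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    mask = np.triu(np.ones(corr.shape), k=1).astype(bool)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


def r2_clustering(X: np.ndarray, model: KMeans) -> float:
    """Assumed definition: R^2 = 1 - SSW / SST."""
    sst = ((X - X.mean(axis=0)) ** 2).sum()
    return 1.0 - model.inertia_ / sst


# Synthetic stand-in for the customer table (hypothetical feature names).
X_raw, _ = make_blobs(n_samples=600, centers=5, n_features=4, random_state=0)
df = pd.DataFrame(X_raw, columns=["f0", "f1", "f2", "f3"])
df["f0_copy"] = df["f0"] * 2.0 + 1.0  # monotone copy of f0, so rho = 1

selected = drop_correlated(df, threshold=0.8)
X = StandardScaler().fit_transform(selected)

# Number of principal components needed to retain at least 95 % of the variance.
pca = PCA(n_components=0.95, svd_solver="full").fit(X)
print("components for 95 % variance:", pca.n_components_)

# Evaluate K-Means for k = 4 ... 7 with inertia (elbow), R^2 and silhouette.
results = {}
for k in range(4, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    results[k] = {
        "inertia": km.inertia_,
        "r2": r2_clustering(X, km),
        "silhouette": silhouette_score(X, km.labels_),
    }
    print(k, {name: round(value, 3) for name, value in results[k].items()})
```

Comparing the printed rows across k is the usual basis for the elbow and silhouette checks; the final choice of k would still depend on how interpretable the resulting segments are for the business.
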
| Cluster | Description |
|---|---|
| 0 | Loyal High-Spenders / High-End Corporate |
| 1 | Portuguese-Speaking Business Travelers |
| 2 | Big Elderly Groups |
| 3 | Family & Interactive Customers |
| 4 | Last-Minute Budget Travelers |
| 5 | One-Time Customers |
| 5 + 1 | Potential Customers (not yet checked in) |
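
Cluster descriptions like the ones in the table are typically derived by comparing each cluster's average feature values with the overall average. The sketch below shows one way to do that; the rows are toy values written for the example (not the hotel's data), and `TotalRevenue` is a hypothetical column name.

```python
import pandas as pd

# Toy rows for illustration only; values are invented for the example.
customers = pd.DataFrame(
    {
        "cluster": [0, 0, 1, 1, 2, 2],
        "TotalRevenue": [900.0, 1100.0, 300.0, 500.0, 400.0, 400.0],  # hypothetical
        "PercOtherRevenue": [0.30, 0.40, 0.10, 0.20, 0.25, 0.25],
        "CheckInRate": [1.0, 1.0, 0.5, 0.5, 1.0, 0.0],
    }
)

# Mean of every feature inside each cluster.
profile = customers.groupby("cluster").mean()

# Mean of every feature over all customers.
overall = customers.drop(columns="cluster").mean()

# Percentage deviation of each cluster mean from the overall mean.
relative = (profile / overall - 1.0) * 100.0

# Number of customers per cluster.
sizes = customers["cluster"].value_counts().sort_index()

print(profile.round(2))
print(relative.round(1))
print(sizes)
```

Large positive or negative values in `relative` point to the features that distinguish a segment and can then be translated into a business label.
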
Tailored marketing actions per segment: personalized loyalty programs, Portuguese-language promotions, family bundles, budget packages, and first-stay discounts. The deployment plan includes CRM automation and a 6-month retraining schedule.