Goal: Apply everything from Modules 1–4 to the four use-case families that dominate business data science in practice — and learn the one pattern they share: model → money → decision.
Estimated time: 10–12 hours of focused study. Prerequisites: Modules 1–4 (especially NB 14–16 — models, honest evaluation, pipelines; NB 11 returns for the forecasting half).
🧭 Where this fits. Modules 1–4 taught the tools; this module is the job. Every notebook starts from a business question, ends in a costed decision rule, and carries its own classic failure mode (post-outcome leakage, the accuracy trap, unscaled clustering, unshifted rolling features). It's also deliberately interview-shaped: these four applications are what "tell me about a project" questions are made of.
- NB 35 Churn, CLV & retention → who is leaving, what are they worth, who gets the offer?
- NB 36 Fraud & anomaly detection → 0.5 % positives, asymmetric costs, an alert queue sized by analyst capacity
- NB 37 Segmentation & recommenders → which customers are similar, what should each see next? (unsupervised!)
- NB 38 Forecasting & maintenance → how much to stock, when to service: forecasts and sensors become schedules
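
The NB 36 entry above packs in two ideas that are easier to see with numbers: at 0.5 % positives, accuracy rewards flagging nothing, and a capacity-limited alert queue is better judged by precision@k. The following is a minimal sketch on synthetic data, not the notebook's code; the data size, the score distribution, and `capacity` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): 100,000 transactions, about 0.5 % fraud.
n = 100_000
y = (rng.random(n) < 0.005).astype(int)

# Hypothetical model scores: fraud tends to score higher, with overlap.
scores = rng.normal(loc=0.0, scale=1.0, size=n) + 2.5 * y

# The accuracy trap: a "detector" that flags nothing is right about
# 99.5 % of the time and catches zero fraud.
accuracy_flag_nothing = (y == 0).mean()


def precision_at_k(y_true, scores, k):
    """Share of true positives among the k highest-scored cases."""
    top_k = np.argsort(scores)[::-1][:k]
    return y_true[top_k].mean()


capacity = 200  # assumed number of alerts analysts can review per period
p_at_k = precision_at_k(y, scores, capacity)
base_rate = y.mean()

print(f"accuracy when flagging nothing: {accuracy_flag_nothing:.4f}")
print(f"base rate: {base_rate:.4f}, precision@{capacity}: {p_at_k:.4f}")
```

Comparing `p_at_k` with `base_rate` shows how much better than random the queue is, which is the quantity an analyst team with fixed capacity actually experiences.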
| # | Notebook | What you'll build |
|---|---|---|
| 35 | 35_churn_clv_retention.ipynb | CLV table, expected-value targeting rule, budget-constrained retention campaign simulation |
| 36 | 36_fraud_anomaly_detection.ipynb | Supervised + Isolation Forest detectors, precision@k alert queue, €-prevented accounting |
| 37 | 37_segmentation_recommenders.ipynb | RFM + k-means segmentation with profiles, item-item recommender that beats the popularity baseline |
| 38 | 38_demand_maintenance.ipynb | Promo-aware demand forecasts → safety stock & newsvendor orders; sensor-based maintenance schedule |
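
To make the "safety stock & newsvendor orders" entry for NB 38 concrete, here is a small sketch of the textbook newsvendor rule: order up to the demand quantile given by the critical ratio. It assumes normally distributed demand and a single selling period; the function name, parameters, and numbers are illustrative and not taken from the notebook.

```python
from statistics import NormalDist


def newsvendor_order(mean_demand, sd_demand, unit_cost, unit_price, salvage_value=0.0):
    """Return (critical_ratio, order_quantity) under normal demand (assumption)."""
    underage = unit_price - unit_cost    # margin lost per unit of unmet demand
    overage = unit_cost - salvage_value  # loss per unit left unsold
    critical_ratio = underage / (underage + overage)
    z = NormalDist().inv_cdf(critical_ratio)  # service-level quantile
    # z * sd_demand plays the role of safety stock on top of the forecast mean.
    return critical_ratio, mean_demand + z * sd_demand


cr, q = newsvendor_order(mean_demand=100, sd_demand=20, unit_cost=4.0, unit_price=10.0)
print(f"critical ratio: {cr:.2f}, order quantity: {q:.1f}")
```

With a unit margin of 6 and an overage cost of 4, the critical ratio is 0.6, so the order lands slightly above the forecast mean; when the two costs are equal, the rule simply orders the mean.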
- A probability is not a decision. Every application converts scores into actions via costs: break-even thresholds (35), queue capacity (36), campaign budgets (35/36), service levels and critical ratios (38).
- Respect time or your metrics lie. Temporal splits, shifted rolling features, frozen bin edges — the same discipline in three costumes (35, 36, 38).
- Beat the dumb baseline first. Always-predict-honest (36), popularity (37), seasonal-naive (38): if the model can't beat them, ship the baseline.
- Unsupervised needs naming. Clusters and anomalies only become business objects when you profile them, name them, and check they're stable (36, 37).
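
The first takeaway above, that a probability is not a decision, can be sketched as a break-even rule for a retention offer. This is a simplified illustration under stated assumptions: every targeted customer costs `OFFER_COST`, the offer retains a would-be churner with probability `SAVE_RATE`, and a saved customer is worth their CLV. All names and numbers are hypothetical.

```python
OFFER_COST = 20.0  # assumed cost of making the offer to one customer
SAVE_RATE = 0.25   # assumed probability the offer retains a would-be churner


def expected_value(p_churn, clv, offer_cost=OFFER_COST, save_rate=SAVE_RATE):
    """Expected gain from targeting one customer."""
    return p_churn * save_rate * clv - offer_cost


def break_even_probability(clv, offer_cost=OFFER_COST, save_rate=SAVE_RATE):
    """Churn probability above which targeting has positive expected value."""
    return offer_cost / (save_rate * clv)


# Hypothetical scored customers: churn probability from a model, CLV from a table.
customers = [
    {"id": "A", "p_churn": 0.80, "clv": 60.0},
    {"id": "B", "p_churn": 0.30, "clv": 1000.0},
    {"id": "C", "p_churn": 0.10, "clv": 200.0},
]

targeted = [c["id"] for c in customers if expected_value(c["p_churn"], c["clv"]) > 0]
print(targeted)
```

In this toy data, customer A has the highest churn probability but too little value to justify the offer (the break-even probability exceeds 1), while B clears a break-even of 0.08 easily. A fixed 0.5 cutoff on the score would have picked the wrong customer.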
→ Module 5 — AI Engineering (../05_ai_engineering/17_ai_workflows.ipynb) if you haven't done it yet — LLMs layered on top of exactly these workflows.
→ Capstone A (../07_capstones/24_capstone_analytics.ipynb) to prove the analytics half end-to-end.