---
title: Discrete-Time, Contractual Setting Retention Model
author: Abdullah Mahmood
date: last-modified
format:
  html:
    theme: cosmo
    css: quarto-style/style.css
    highlight-style: atom-one
    mainfont: Palatino
    fontcolor: black
    monobackgroundcolor: white
    monofont: Menlo, Lucida Console, Liberation Mono, DejaVu Sans Mono, Bitstream Vera Sans Mono, Courier New, monospace
    fontsize: 13pt
    linestretch: 1.4
    number-sections: true
    number-depth: 5
    toc: true
    toc-location: right
    toc-depth: 5
    code-fold: true
    code-copy: true
    cap-location: bottom
    format-links: false
    embed-resources: true
    anchor-sections: true
    code-links:
      - text: GitHub Repo
        icon: github
        href: https://github.com/abdullahau/customer-analytics/
      - text: Quarto Markdown
        icon: file-code
        href: https://github.com/abdullahau/customer-analytics/blob/main/subscription-retention-model.qmd
    html-math-method:
      method: mathjax
      url: https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js
---
## Imports
```{python}
#| code-fold: false
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
from scipy.optimize import minimize
%config InlineBackend.figure_formats = ['svg']
```
## Introduction
The models discussed here apply to the following two-dimensional classification of a customer base:
1. **Opportunities for transactions: Discrete-Time**
- By “discrete-time,” we mean that transactions can occur only at fixed points in time (e.g., the annual renewal cycles for most professional organizations). This is in contrast to continuous-time, where the transactions can occur at any point in time (e.g., the cancelation of basic utility contracts).
2. **Type of relationship with customers: Contractual**
- In a “contractual” setting, the time at which the customer becomes inactive is observed (e.g., when the customer fails to renew a subscription). This is in contrast to a “noncontractual” setting, where the absence of a contract or subscription means that the point in time at which the customer becomes inactive is not observed by the firm (e.g., a catalog retailer). The challenge is how to differentiate between a customer who has ended a “relationship” with the firm versus one who is merely in the midst of a long hiatus between transactions.
Consider a company with a subscription-based business model. 1000 customers are acquired at the beginning of Year 1 with the following renewal patterns:
| ID | Year 1 | Year 2 | Year 3 | Year 4 | Year 5 |
| ---- | ------ | ------ | ------ | ------ | ------ |
| 1 | 1 | 1 | 0 | 0 | 0 |
| 2 | 1 | 0 | 0 | 0 | 0 |
| 3 | 1 | 1 | 1 | 0 | 0 |
| 4 | 1 | 1 | 0 | 0 | 0 |
| 5 | 1 | 1 | 1 | 1 | 1 |
| . | . | . | . | . | . |
| . | . | . | . | . | . |
| . | . | . | . | . | . |
| 998 | 1 | 0 | 0 | 0 | 0 |
| 999 | 1 | 1 | 1 | 0 | 0 |
| 1000 | 1 | 0 | 0 | 0 | 0 |
| | **1000** | **631** | **468** | **382** | **326** |
**Motivating Problem**
- How many customers will “survive” to Year 6, 7, . . . , 13?
- What will the retention rates for this cohort look like for the next 8 years?
The **survivor function** $S(t)$ is the proportion of the cohort that continues as customers beyond $t$.
$$
\begin{align*}
S(0) &= ? \\
S(1) &= ? \\
S(2) &= ?
\end{align*}
$$
The **retention rate** is the ratio of customers retained to the number at risk.
$$
\begin{align*}
r(1) &= ? \\
r(2) &= ?
\end{align*}
$$
For survivor function $S(t)$, the retention rate is:
$$
r(t) = \frac{S(t)}{S(t − 1)}
$$
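These quantities can be computed directly from the cohort counts in the renewal table above (a quick sketch answering the questions posed: $S(0) = 1$ by definition, and each $r(t)$ is the ratio of adjacent survivor values):

```python
import numpy as np

# Year-end counts of active customers from the renewal table
alive = np.array([1000, 631, 468, 382, 326])

# Survivor function: proportion of the cohort still active after t periods
S = alive / alive[0]      # S(0) = 1.000, S(1) = 0.631, S(2) = 0.468, ...

# Retention rate: r(t) = S(t) / S(t-1)
r = S[1:] / S[:-1]        # r(1) = 0.631, r(2) ≈ 0.742, r(3) ≈ 0.816, r(4) ≈ 0.853

print(np.round(S, 3))
print(np.round(r, 3))
```

Note that the retention rate is not constant: it increases with tenure, which any credible model of this cohort must reproduce.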
**Modeling Objective**: We want to derive a mathematical expression for $S(t)$, which can then be used to generate the desired forecasts.
```{python}
year, alive = np.loadtxt('data/hardie-sample-retention.csv', dtype='object', delimiter=',', unpack=True, skiprows=1)
year = year.astype(int)
alive = alive.astype(float)
surviving = alive / alive[0]
plt.figure(figsize=(8,4), dpi=100)
plt.plot(surviving[:6], "k-", linewidth=0.75)
plt.axvline(5, color="black", linestyle='--', linewidth=0.75)
plt.xlabel("Tenure (years)")
plt.ylabel("% Surviving")
plt.ylim(0,1.1)
plt.xlim(0, 13)
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
```
### Parametric Approach: Projecting Survival Using Simple Functions of Time - Linear Regression
The simplest and most intuitive approach to modeling customer retention is to try to fit some known function to the retention curve. This approach is called **parametric statistics**, because a few parameters describe the shape of the function. The power of this approach is that we can use it to estimate what happens in the future. Using data on a single cohort of active customers, from acquisition to the end of the observation period, we can compute a best-fit curve that minimizes the sum of squared errors.
A **parametric approach** to modeling retention curves means selecting a specific functional form (or model) with a small number of parameters that can capture the retention dynamics over time. The functional form can be linear, quadratic, or exponential. By tuning the function's parameters, we aim to approximate the behavior of actual customer retention data. In the implementations below, the parameters are optimized to match the predicted retention curve with the observed data.
1. Linear Function Form: $R(t) = \beta_{0} + \beta_{1}t$
2. Quadratic Function Form: $R(t) = \beta_{0} + \beta_{1}t + \beta_{2}t^{2}$
3. Exponential Function Form: $R(t) = e^{\beta_{0}+\beta_{1}t}$
```{python}
import statsmodels.api as sm
import warnings
warnings.simplefilter('ignore', category=UserWarning)
y = surviving[:5]
# Linear Function Regression
x = sm.add_constant(year[:5])
linear_results = sm.OLS(y, x).fit()
print(linear_results.summary())
```
```{python}
# Quadratic Function Regression
x = np.column_stack((year[:5], year[:5]**2))
x = sm.add_constant(x)
quadratic_results = sm.OLS(y, x).fit()
print(quadratic_results.summary())
```
```{python}
# Exponential Function Regression
x = year[:5]
x = sm.add_constant(x)
exponential_results = sm.OLS(np.log(surviving[:5]), x).fit()
print(exponential_results.summary())
```
```{python}
intercept, t_coeff = linear_results.params
y_lin = intercept + (t_coeff * year)
intercept, t_coeff, tsquare = quadratic_results.params
y_quad = intercept + t_coeff*year + tsquare*year**2
intercept, t_coeff = exponential_results.params
y_exp = np.exp(intercept + t_coeff*year)
surviving = alive / alive[0]
plt.figure(figsize=(8,4), dpi=100)
plt.title("Survival Curve Projections")
plt.plot(year, surviving, "k-", linewidth=1, label="Actual")
plt.plot(year, y_lin, "g--", linewidth=0.75, label="Linear")
plt.plot(year, y_quad, "r--", linewidth=0.75, label="Quadratic")
plt.plot(year, y_exp, "b--", linewidth=0.75, label="Exponential")
plt.axvline(5, color="black", linestyle='--', linewidth=0.75)
plt.xlabel("Tenure (years)")
plt.ylabel("% Surviving")
plt.ylim(-1,2)
plt.xlim(0, 12)
plt.legend()
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
```
The fit of all three models up to and including Year 4 is reasonable, and the quadratic model provides a particularly good fit (R-Squared = 0.989). But when we consider the projections beyond the model calibration period, all three models break down dramatically. The linear and exponential models underestimate Year 12 survival while the quadratic model overestimates Year 12 survival. Furthermore, the models lack logical consistency: The linear model would have $S(t) < 0$ after year 6, and according to the quadratic model the survivor function will start to increase over time, which is not possible.
Of course, we could try out different arbitrary functions of time, but this would be a pure curve-fitting exercise at its worst. Furthermore, it is hard to imagine that there would be any underlying rationale for the equation(s) that we might settle upon.
## Shifted Geometric (SG) Model
In the three distinct implementation examples we look at:
- The first uses a parameter $x$ representing a **monthly retention rate**, obtained by minimizing the sum of squared errors between actual and predicted churn in each period and between actual and predicted survivors at the end of the period.
  - Models retention as a deterministic process of expected losses.
- The second uses a parameter $\theta$ representing a **monthly churn probability**, obtained by minimizing the sum of squared errors between actual and predicted retention in each period.
  - Focuses on fitting the retention rate proportionally.
- The third also uses $\theta$ as a **monthly churn probability**, but estimates it by maximizing a log-likelihood function: the sum of the actual losses times the log of the predicted churn probabilities, plus the actual survivors times the log of their predicted survival probability.
  - Assumes churn follows a probabilistic distribution, maximizing the likelihood of observed churn and survival.
These three implementations model customer retention using a shifted geometric approach. While each one approaches the calculation of retention and churn differently, they share the common goal of finding a parameter (`theta` or `x`) that best fits the data to a retention curve. Here’s a breakdown of their similarities and differences:
General overview of the three general methods implemented:
1. **Goal**: All three implementations aim to estimate a retention rate that minimizes the discrepancy between the model's prediction and the observed retention data. This is achieved through different error functions or likelihood calculations.
2. **Optimization**: All use `scipy.optimize.minimize` to find the optimal value for `theta` (or `x`), which represents the retention or churn parameter.
3. **Retention Model**: They each use a variation of a geometric retention model, where the probability of customers staying decreases over time at a certain rate.
**Implementation 1**: Sum of Squared Errors Using Expected Losses and Survivors
- **Objective**: This model minimizes the sum of squared errors (SSE) by comparing the *expected customer loss per month and the actual observed loss*. It also considers the *difference between expected and actual survivors at the end of the period*.
- **Expected Retention Calculation**: It calculates the expected retention each month based on `x`, then uses it to find the probability of churn (`p_churn`) and expected monthly losses (`en_loss`).
- **Error Calculation**: Combines the squared error for both customer loss and survivors to get an overall error.
**Implementation 2**: Direct Retention Comparison Using Monthly Retention Proportion
- **Objective**: This model also minimizes SSE but does so by directly comparing the retention rate over time with the model’s expected retention (`(1 - theta) ** month`).
- **Retention Calculation**: It calculates the actual retention rate, then computes the model’s expected retention with the geometric formula.
- **Error Calculation**: The function minimizes the SSE between the observed retention and the predicted retention directly.
**Implementation 3**: Log-Likelihood Maximization Using Probabilistic Churn
- **Objective**: Instead of minimizing SSE, this function *maximizes the log-likelihood function* for the observed retention data as a means of estimating model parameters. This approach is more appropriate for use in a probabilistic model.
- **Churn Probability Calculation**: It calculates the probability of churn each month with a probabilistic model based on `theta`, which is the monthly churn rate.
- **Likelihood Components**: Calculates log-likelihood in two parts:
- **Churn**: The likelihood of observing the actual number of customers lost each period.
- **Survivors**: The likelihood of the remaining customers staying until the end.
- **Error Calculation**: Returns the negative log-likelihood, which is minimized to find the best-fit `theta`.
These methods all aim to fit a shifted geometric retention curve but through different interpretations and optimization approaches, offering alternative ways to capture customer retention dynamics.
### 1 Customer Segment - Geometric Curve Fitting with Retention Rate Parameter
```{python}
month, alive = np.loadtxt('data/retention-example.csv', dtype='object', delimiter=',', unpack=True, skiprows=1)
month = month.astype(int)
alive = alive.astype(float)
```
```{python}
# Observed retention curve
retention = alive/alive[0]
# Monthly loss
loss = alive[:-1] - alive[1:]
```
```{python}
#| code-fold: false
def square_error(x):
    e_retention = np.ones_like(retention) * (x**month)  # Expected monthly retention
    p_churn = e_retention[:-1] - e_retention[1:]        # Probability of monthly churn
    en_loss = p_churn * alive[0]                        # Expected number of losses per month
    loss_error = np.sum((en_loss - loss)**2)            # Squared error between expected and actual losses
    survivor_error = ((e_retention[-1] * alive[0]) - alive[-1])**2  # Squared error between expected and actual survivors at the end of the period
    return loss_error + survivor_error
x_guess = 0.94
res = minimize(square_error, x_guess)
print('Predicted Monthly Retention Rate =', f'{res.x[0]:0.2%}')
print('Sum of Squared Errors =', f'{res.fun:0.0f}')
```
```{python}
plt.plot(month, retention, label="Actual")
plt.plot(month, np.ones_like(retention) * (res.x**month), label="Expected")
plt.title('Retention Model Curve - 1 Customer Segment')
plt.xlabel("Tenure (Months)")
plt.ylabel("% Surviving")
plt.ylim((0,1.2))
plt.legend()
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
```
### 2 Customer Segments - Geometric Curve Fitting with Retention Rate & Segment Proportion Parameters
```{python}
# Observed retention curve
retention = alive/alive[0]
# Monthly loss
loss = alive[:-1] - alive[1:]
```
```{python}
#| code-fold: false
def square_error(x):
    e_retention = (np.ones_like(retention) * (x[0]**month) * x[2]) + \
                  (np.ones_like(retention) * (x[1]**month) * (1 - x[2]))
    e_churn = e_retention[:-1] - e_retention[1:]
    en_loss = e_churn * alive[0]
    loss_error = np.sum((en_loss - loss)**2)
    survivor_error = ((e_retention[-1] * alive[0]) - alive[-1])**2
    return loss_error + survivor_error
# 3 Decision Variables: Retention rate for segment 1 & 2 and proportion of segment 1 customers
guesses = 0.95, 0.90, 0.20
bnds = ((0, 1), (0, 1), (0, 1))
res = minimize(square_error, x0=guesses, bounds=bnds)
print('Monthly Retention Rate - Segment 1 =', f'{res.x[0]:0.2%}')
print('% of Customer - Segment 1 =', f'{res.x[2]:0.2%}')
print('Monthly Retention Rate - Segment 2 =', f'{res.x[1]:0.2%}')
print('% of Customer - Segment 2 =', f'{1 - res.x[2]:0.2%}')
print('Sum of Squared Errors =', f'{res.fun:0.0f}')
res
```
```{python}
e_retention = (np.ones_like(retention) * (res.x[0]**month) * res.x[2]) + \
(np.ones_like(retention) * (res.x[1]**month) * (1 - res.x[2]))
plt.plot(month, retention, label="Actual")
plt.plot(month, e_retention, label="Expected")
plt.title('Retention Model Curve - 2 Customer Segments')
plt.xlabel("Tenure (Months)")
plt.ylabel("% Surviving")
plt.ylim((0,1.5))
plt.legend()
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
```
### 3 Customer Segments - Geometric Curve Fitting with Retention Rate & Segment Proportion Parameters
```{python}
# Observed retention curve
retention = alive/alive[0]
# Monthly loss
loss = alive[:-1] - alive[1:]
```
```{python}
#| code-fold: false
from scipy.optimize import LinearConstraint
def square_error(x):
    e_retention = (np.ones_like(retention) * (x[0]**month) * x[3]) + \
                  (np.ones_like(retention) * (x[1]**month) * x[4]) + \
                  (np.ones_like(retention) * (x[2]**month) * x[5])
    e_churn = e_retention[:-1] - e_retention[1:]
    en_loss = e_churn * alive[0]
    loss_error = np.sum((en_loss - loss)**2)
    survivor_error = ((e_retention[-1] * alive[0]) - alive[-1])**2
    return loss_error + survivor_error
guesses = np.array([0.90, 0.90, 0.90, 0.20, 0.30, 0.50])
bnds = np.array([(0, 1), (0, 1), (0, 1), (0, 1), (0, 1), (0, 1)])
cons = LinearConstraint([0, 0, 0, 1, 1, 1], 1, 1) # Define the constraint: x[3] + x[4] + x[5] == 1
res = minimize(square_error, x0=guesses, bounds=bnds, constraints=[cons], method='trust-constr')
print('Monthly Retention Rate - Segment 1 =', f'{res.x[0]:0.2%}')
print('% of Customer - Segment 1 =', f'{res.x[3]:0.2%}')
print('Monthly Retention Rate - Segment 2 =', f'{res.x[1]:0.2%}')
print('% of Customer - Segment 2 =', f'{res.x[4]:0.2%}')
print('Monthly Retention Rate - Segment 3 =', f'{res.x[2]:0.2%}')
print('% of Customer - Segment 3 =', f'{res.x[5]:0.2%}')
print('Sum of Squared Errors =', f'{res.fun:0.0f}')
```
```{python}
e_retention = (np.ones_like(retention) * (res.x[0]**month) * res.x[3]) + \
(np.ones_like(retention) * (res.x[1]**month) * res.x[4]) + \
(np.ones_like(retention) * (res.x[2]**month) * res.x[5])
plt.plot(month, retention, label="Actual")
plt.plot(month, e_retention, label="Expected")
plt.title('Retention Model Curve - 3 Customer Segments')
plt.xlabel("Tenure (Months)")
plt.ylabel("% Surviving")
plt.ylim((0,1.5))
plt.legend()
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
```
```{python}
#| code-fold: false
from scipy.optimize import LinearConstraint
def square_error(x):
    e_retention = (np.ones_like(retention) * (x[0]**month) * x[3]) + \
                  (np.ones_like(retention) * (x[1]**month) * x[4]) + \
                  (np.ones_like(retention) * (x[2]**month) * (1 - x[4] - x[3]))
    e_churn = e_retention[:-1] - e_retention[1:]
    en_loss = e_churn * alive[0]
    loss_error = np.sum((en_loss - loss)**2)
    survivor_error = ((e_retention[-1] * alive[0]) - alive[-1])**2
    return loss_error + survivor_error
guesses = np.array([0.90, 0.90, 0.90, 0.20, 0.30])
bnds = np.array([(0, 1), (0, 1), (0, 1), (0, 1), (0, 1)])
cons = LinearConstraint([0, 0, 0, 1, 1], 1, 1)
res = minimize(square_error, x0=guesses, bounds=bnds, constraints=[cons], method='COBYQA')
print('Monthly Retention Rate - Segment 1 =', f'{res.x[0]:0.2%}')
print('% of Customer - Segment 1 =', f'{res.x[3]:0.2%}')
print('Monthly Retention Rate - Segment 2 =', f'{res.x[1]:0.2%}')
print('% of Customer - Segment 2 =', f'{res.x[4]:0.2%}')
print('Monthly Retention Rate - Segment 3 =', f'{res.x[2]:0.2%}')
print('% of Customer - Segment 3 =', f'{1-res.x[4]-res.x[3]:0.2%}')
print('Sum of Squared Errors =', f'{res.fun:0.0f}')
```
```{python}
e_retention = (np.ones_like(retention) * (res.x[0]**month) * res.x[3]) + \
(np.ones_like(retention) * (res.x[1]**month) * res.x[4]) + \
(np.ones_like(retention) * (res.x[2]**month) * (1-res.x[4]-res.x[3]))
plt.plot(month, retention, label="Actual")
plt.plot(month, e_retention, label="Expected")
plt.title('Retention Model Curve - 3 Customer Segments')
plt.xlabel("Tenure (Months)")
plt.ylabel("% Surviving")
plt.ylim((0,1.5))
plt.legend()
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
```
### 1 Customer Segment - Exponential Curve Fitting with Churn Parameter
```{python}
month, alive = np.loadtxt('data/DSC-retention-data.csv', dtype='object', delimiter=',', unpack=True, skiprows=1)
month = month.astype(int)
alive = alive.astype(float)
train_month = month[:8]
train_alive = alive[:8]
```
```{python}
#| code-fold: false
def square_error(theta):
    retention = train_alive / train_alive[0]
    e_retention = (1-theta)**train_month
    return np.sum((retention-e_retention)**2)
theta_guess = 0.5
res = minimize(square_error, theta_guess, bounds=[(0.000001,0.999999)])
theta = res.x[0]
print(f'{theta = :.1%}')
print(f'retention rate = {1-theta:.1%}')
print(f'SSE = {res.fun:.3%}')
```
```{python}
retention = alive / alive[0]
e_retention = (1-theta)**month
plt.figure(figsize=(8,4), dpi=100)
plt.plot(retention, "b-o", label='Actual')
plt.plot(e_retention, "r--o", label='1 Segment SG')
plt.axvline(7, color="black", linestyle='--', linewidth=0.75)
plt.title('One Segment SG')
plt.xlabel("Tenure (Months)")
plt.ylabel("% Surviving")
plt.ylim(0,1.1)
plt.legend()
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
```
```{python}
rr = alive[1:] / alive[:-1]
e_alive = alive[0] * e_retention
e_rr = e_alive[1:] / e_alive[:-1]
plt.title("Actual Versus Model-Based Estimates of Retention Rates by Tenure")
plt.plot(month[1:], rr, "b", label='Actual')
plt.plot(month[1:], e_rr, "r", label='1 Segment SG')
plt.xlabel("Tenure (Months)")
plt.ylabel("Retention Rate")
plt.legend()
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
```
### 2 Customer Segments - Exponential Curve Fitting with Churn & Segment Proportion Parameter
```{python}
#| code-fold: false
def square_error(x):
    retention = train_alive / train_alive[0]
    s1_retention = (1-x[1])**train_month
    s2_retention = (1-x[2])**train_month
    e_retention = s1_retention*x[0] + s2_retention*(1-x[0])
    return np.sum((retention-e_retention)**2)
guesses = [0.2, 0.1, 0.1]
bnd = [(0,1), (0,1), (0,1)]
res = minimize(square_error, guesses, bounds=bnd)
pi1, theta1, theta2 = res.x
print(f'{pi1 = :.1%}, {theta1 = :.1%}, rr = {1-theta1:.1%}\npi2 = {1-pi1:.1%}, {theta2 = :.1%}, rr = {1-theta2:.1%}')
print(f'SSE = {res.fun:.3%}')
```
```{python}
retention = alive / alive[0]
s1_retention = (1-theta1)**month
s2_retention = (1-theta2)**month
e_retention = s1_retention*pi1 + s2_retention*(1-pi1)
plt.figure(figsize=(8,4), dpi=100)
plt.plot(retention, "b-o", label='Actual')
plt.plot(e_retention, "r--o", label='2 Segment SG')
plt.axvline(7, color="black", linestyle='--', linewidth=0.75)
plt.title('Two Segment SG')
plt.xlabel("Tenure (Months)")
plt.ylabel("% Surviving")
plt.ylim(0,1.1)
plt.legend()
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
```
```{python}
rr = alive[1:] / alive[:-1]
e_alive = alive[0] * e_retention
e_rr = e_alive[1:] / e_alive[:-1]
plt.title("Actual Versus Model-Based Estimates of Retention Rates by Tenure")
plt.plot(month[1:], rr, "b", label='Actual')
plt.plot(month[1:], e_rr, "r", label='2 Segment SG')
plt.xlabel("Tenure (Months)")
plt.ylabel("Retention Rate")
plt.legend()
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
```
### Problem with Homogeneity or Discrete Heterogeneity
So far, our geometric model has assumed either a homogeneous customer base (one segment) or two to three customer segments. Models with more segments fit better, but the solutions are less parsimonious. Estimating models with more segments becomes increasingly difficult because each added segment introduces two more parameters: a churn (or survival) probability and the probability of belonging to that segment (its weight). Fitting, say, 10 customer segments can be computationally intensive.
While assuming one to three customer segments for a shifted-geometric model may provide a reasonably good fit on the calibration data, the projections beyond the calibration period can underestimate or overestimate percentage survival. The resulting range of error may not be acceptable in some commercial settings.
```{python}
retention = alive / alive[0]
x_month = np.arange(0, 25, 1)
# 1 Segment SG
e_retention1 = (1-theta)**x_month
# 2 Segment SG
s1_retention = (1-theta1)**x_month
s2_retention = (1-theta2)**x_month
e_retention2 = s1_retention*pi1 + s2_retention*(1-pi1)
plt.figure(figsize=(8,4), dpi=100)
plt.plot(retention, "b-", label='Actual')
plt.plot(e_retention1, "g--", label='1 Segment SG')
plt.plot(e_retention2, "r--", label='2 Segment SG')
plt.axvline(7, color="black", linestyle='--', linewidth=0.75)
plt.title('Actual Versus Regression-Model-Based Estimates of the Percentage of Customers Surviving')
plt.xlabel("Tenure (Months)")
plt.ylabel("% Surviving")
plt.ylim(0,1.1)
plt.legend()
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
```
### Shifted Geometric - Probabilistic distribution, Maximize the likelihood of Observed Churn and Survival
At the end of each contract period, a customer makes the renewal decision by tossing a coin: H → renew, T → don’t renew
| Length of Relationship | Flip Sequence |
| ---------------------- | ------------- |
| 1 period               | T             |
| 2 periods              | H T           |
| 3 periods              | H H T         |
$$
P(t\space\text{periods}) =
\begin{cases}
P(T) &\quad t=1\\
P(H) \times P(t-1\space\text{periods}) &\quad t=2,3,\cdots \\
\end{cases}
$$
**Model**:
- $P(H)=1−\theta$ is constant and unobserved.
- All customers have the same "churn probability" $\theta$.
- Let the random variable $T$ denote the duration of the customer’s relationship with the firm.
- We assume that the random variable $T$ is distributed geometric with parameter $\theta$:
$$
P(T=t\mid\theta)=\theta(1-\theta)^{t-1}, \space t=1,2,3,\cdots
$$
$$
S(t\mid\theta)=P(T>t\mid\theta) = (1-\theta)^{t}, \space t=0,1,2,3,\cdots
$$
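The coin-flipping story can be verified by simulation. A minimal sketch, using an illustrative $\theta$ (0.272 is borrowed from the maximum-likelihood result later in this section, but any value in $(0,1)$ would do); NumPy's geometric sampler counts trials up to and including the first "success", which matches the shifted-geometric support $t = 1, 2, 3, \ldots$:

```python
import numpy as np

rng = np.random.default_rng(42)
theta = 0.272        # illustrative churn probability
n = 100_000

# Each simulated customer flips until the first tail (non-renewal);
# the number of flips is the relationship length T ~ shifted geometric(theta)
T = rng.geometric(theta, size=n)

# Empirical P(T = 1) vs. model P(T = 1 | theta) = theta
print((T == 1).mean(), theta)

# Empirical S(2) = P(T > 2) vs. model (1 - theta)^2
print((T > 2).mean(), (1 - theta) ** 2)
```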
**Assumptions**: If
- the observed data were generated according to the “coin flipping” story of contract renewal, and
- we know $\theta$ (the probability of a tail, i.e., of non-renewal),

then the probability of the observed pattern of renewals is:
$$
\begin{align*}
&[P(T=1\mid\theta)]^{369}\,[P(T=2\mid\theta)]^{163}\,[P(T=3\mid\theta)]^{86} \\
&\quad\times [P(T=4\mid\theta)]^{56}\,[S(4\mid\theta)]^{326} \\
&= \theta^{369}\,[\theta(1-\theta)]^{163}\,[\theta(1-\theta)^{2}]^{86} \\
&\quad\times [\theta(1-\theta)^{3}]^{56}\,[(1-\theta)^{4}]^{326}
\end{align*}
$$
Suppose we have two candidate coins:
- Coin A: $\theta = 0.2$
- Coin B: $\theta = 0.5$
Which coin is more likely to have generated the observed pattern of renewals across this set of 1000 customers?
| $\theta$ | $P(data \mid \theta)$ | $\ln[P(data \mid \theta)]$|
|-----|----------------|---------|
| $0.2$ | $6.00 \times 10^{-647}$ | $-1488.0$ |
| $0.5$ | $1.40 \times 10^{-747}$ | $-1719.7$ |
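These likelihood values follow directly from the churn counts implied by the renewal table (369, 163, 86, and 56 customers lost at the end of years 1–4, with 326 surviving past year 4). A quick sketch:

```python
import numpy as np

n_lost = np.array([369, 163, 86, 56])   # customers lost at the end of years 1-4
n_surv = 326                            # customers still active after year 4

def log_lik(theta):
    t = np.arange(1, 5)
    # ln P(T = t | theta) for the churners, plus ln S(4 | theta) for the survivors
    churn = np.sum(n_lost * np.log(theta * (1 - theta) ** (t - 1)))
    surv = n_surv * 4 * np.log(1 - theta)
    return churn + surv

print(f"LL(0.2) = {log_lik(0.2):.1f}")  # ≈ -1488.0
print(f"LL(0.5) = {log_lik(0.5):.1f}")
```

Coin A ($\theta = 0.2$) has the higher log-likelihood, so it is far more likely to have generated the observed renewal pattern.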
We estimate the model parameters using the method of **maximum likelihood**:
- The likelihood function is defined as the probability of observing the data for a given set of the (unknown) model parameters.
- It is computed using the model and is viewed as a function of the model parameters:
$$
L(\text{parameters} \mid \text{data})= p(\text{data} \mid \text{parameters}).
$$
- For a given dataset, the maximum likelihood estimates of the model parameters are those values that maximize $L( \cdot )$.
- It is typically more convenient to use the **natural logarithm of the likelihood function** — the **log-likelihood** function.
The log-likelihood function is given by:
$$
\begin{align*}
LL(\theta \mid \text{data}) = \;&369 \times \ln[P(T=1 \mid \theta)]\; + \\
&163 \times \ln[P(T=2 \mid \theta)]\; + \\
&86 \times \ln[P(T=3 \mid \theta)]\; + \\
&56 \times \ln[P(T=4 \mid \theta)]\; + \\
&326 \times \ln[S(4 \mid \theta)]
\end{align*}
$$
The maximum value of the log-likelihood function is
$LL = −1451.2$, which occurs at $\theta = 0.272$.
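For this one-parameter model the maximizer can also be found in closed form: the log-likelihood collapses to $LL(\theta) = A\ln\theta + B\ln(1-\theta)$, where $A$ is the total number of churn events and $B$ the total number of renewals observed, giving $\hat\theta = A/(A+B)$. A quick check against the values quoted above:

```python
import numpy as np

n_lost = np.array([369, 163, 86, 56])              # churn events at t = 1..4
n_surv = 326                                       # survivors past t = 4

A = n_lost.sum()                                   # total churn events ("tails")
B = np.sum(n_lost * np.arange(0, 4)) + n_surv * 4  # total renewals ("heads")

theta_hat = A / (A + B)
ll_max = A * np.log(theta_hat) + B * np.log(1 - theta_hat)

print(f"theta_hat = {theta_hat:.3f}")  # ≈ 0.272
print(f"LL = {ll_max:.1f}")            # ≈ -1451.2
```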
```{python}
year, alive = np.loadtxt('data/hardie-sample-retention.csv', dtype='object', delimiter=',', unpack=True, skiprows=1)
year = year.astype(int)
alive = alive.astype(float)
train_year = year[:5]
train_alive = alive[:5]
```
```{python}
#| code-fold: false
def log_likelihood(theta):
    n_lost = train_alive[:-1] - train_alive[1:]
    p_churn = theta * (1 - theta)**(train_year - 1)
    ll_churn = np.sum(n_lost * np.log(p_churn[1:]))
    ll_alive = train_alive[-1] * np.log((1-theta)**train_year[-1])
    return -(ll_churn + ll_alive)
guess = 0.5
bnds = [(0.00001,0.99999)]
result = minimize(log_likelihood, guess, bounds=bnds)
theta, ll = result.x, result.fun
print(f'θ = {theta[0]:.3f}')
print(f'log-likelihood = {-ll:0.1f}')
```
```{python}
retention = alive / alive[0]
e_retention = (1-theta)**year
#plt.figure(figsize=(8,4), dpi=100)
plt.plot(retention, "b-o", label='Actual')
plt.plot(e_retention, "r--o", label='1 Segment SG')
plt.axvline(5, color="black", linestyle='--', linewidth=0.75)
plt.title('Shifted Geometric - Survival Curve Projection')
plt.xlabel("Tenure (years)")
plt.ylabel("% Surviving")
plt.ylim(0,1.1)
plt.legend()
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
```
```{python}
rr = alive[1:] / alive[:-1]
e_alive = alive[0] * e_retention
e_rr = e_alive[1:] / e_alive[:-1]
plt.title("Actual Versus Model-Based Estimates of Retention Rates by Tenure")
plt.plot(year[1:], rr, "b", label='Actual')
plt.plot(year[1:], e_rr, "r", label='1 Segment SG')
plt.xlabel("Tenure (years)")
plt.ylabel("Retention Rate")
plt.legend()
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
```
```{python}
def log_likelihood(theta):
    n_lost = train_alive[:-1] - train_alive[1:]
    p_churn = theta * (1 - theta)**(train_year - 1)
    ll_churn = np.sum(n_lost * np.log(p_churn[1:]))
    ll_alive = train_alive[-1] * np.log((1-theta)**train_year[-1])
    return (ll_churn + ll_alive)
theta_x = np.linspace(0.02, 0.5, 500)
ll_y = np.vectorize(log_likelihood)(theta_x)
plt.figure(figsize=(7,4.5))
plt.title("Log-Likelihood of a given $θ$")
plt.plot(theta_x, ll_y, "k", lw=0.75)
plt.xlabel("$θ$")
plt.ylabel("$ln[P(\\text{data})]$")
plt.ylim(-3000, -1400)
plt.gca().spines[['right', 'top']].set_visible(False)
```
### Accommodating Heterogeneity in θ - Discrete Segments
**Visualizing Parameter Estimates**
- Consider the following story of customer behavior:
1. At the end of each period, an individual renews his contract with (constant and unobserved) probability $1−θ$.
2. “Churn probabilities” vary across customers.
- Since we don’t know any given customer’s true value of $θ$, we treat it as a realization of a random variable ($Θ$).
- We need to specify a probability distribution that captures how $θ$ varies across customers (by giving us the probability of each possible value of $θ$).
- Suppose we divide $(0,1)$ into three sub-intervals of equal width: 0.000–0.333, 0.333–0.667, 0.667–1.000
- We allow $θ$ to take on the value of the mid-point of each sub-interval:
$$
Θ =
\begin{cases}
0.167 &\quad \text{with probability}\space P(Θ=0.167)\\
0.500 &\quad \text{with probability}\space P(Θ=0.500)\\
0.833 &\quad \text{with probability}\space P(Θ=0.833)\\
\end{cases}
$$
$$
P(Θ=0.167)+P(Θ=0.500)+P(Θ=0.833)=1
$$
- What is the probability that a randomly chosen new customer will cancel their contract at the end of period $t$?
1. If we knew their $θ$, it would simply be $P(T = t | θ)$.
2. Since we only know the distribution of $Θ$ across the population, we evaluate $P(T = t | θ)$ for each possible value of $θ$, weighting it by the probability of a randomly chosen new customer having that value of $θ$:
$$
\begin{align*}
P(T = t) &= P(T = t |Θ = 0.167) P(Θ = 0.167) \\
&+ P(T = t |Θ = 0.500) P(Θ = 0.500) \\
&+ P(T = t |Θ = 0.833) P(Θ = 0.833)
\end{align*}
$$
- The problem with this is that $E(Θ)$ is bounded between 0.167 and 0.833.
- To allow for greater flexibility, suppose we divide $(0, 1)$ into ten sub-intervals of equal width and allow $θ$ to take on the value of the mid-point of each sub-interval:
$$
Θ =
\begin{cases}
0.05 &\quad \text{with probability}\space P(Θ=0.05)\\
0.15 &\quad \text{with probability}\space P(Θ=0.15)\\
&\quad \cdots \\
0.95 &\quad \text{with probability}\space P(Θ=0.95)\\
\end{cases}
$$
$$
P(Θ = 0.05) + P(Θ = 0.15) + \cdots +P(Θ = 0.95) = 1
$$
The probability that a randomly chosen new customer will cancel their contract at the end of period $t$ is:
$$
\begin{align*}
P(T = t) &= P(T = t |Θ = 0.05)P(Θ = 0.05) \\
&+ P(T = t |Θ = 0.15) P(Θ = 0.15) \\
&+ \cdots \\
&+ P(T = t |Θ = 0.95) P(Θ = 0.95)
\end{align*}
$$
This ten sub-interval solution is more flexible—$E(Θ)$ is now bounded between 0.05 and 0.95—but is less parsimonious.
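As a quick sanity check, the ten sub-interval mixture can be evaluated directly. The equal segment weights below are purely illustrative (in practice the data would determine them):

```python
import numpy as np

def discrete_mixture_pmf(t, thetas, weights):
    """P(T = t) for a finite mixture of shifted-geometric customers."""
    # Weight each segment's P(T = t | theta) = theta * (1 - theta)^(t - 1)
    return np.sum(weights * thetas * (1 - thetas) ** (t - 1))

# Mid-points of ten equal-width sub-intervals of (0, 1)
thetas = np.arange(0.05, 1.0, 0.1)   # 0.05, 0.15, ..., 0.95
weights = np.full(10, 0.1)           # hypothetical: equal mass per segment

pmf = np.array([discrete_mixture_pmf(t, thetas, weights) for t in range(1, 1000)])
print(pmf[:4])     # P(T = 1), ..., P(T = 4)
print(pmf.sum())   # ~1: the mixture pmf sums to one over all t
```

Note that with equal weights, $P(T=1)$ is just the average of the mid-points, i.e. $E(\Theta)=0.5$.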
## Shifted Beta Geometric (sBG) Model
The shifted beta geometric (sBG) model is used to forecast retention/survival of customers in contractual settings (think Netflix, Disney Plus, Tinder Gold, etc.). The model is quite simple and posits:
- At the end of each period, a customer flips a coin: “tails” she cancels the contract, “heads” she renews it.
- For each individual, the probability of the coin coming up “tails” does not change over time.
- The probability of “tails” varies across customers.
The two things we need are the probability mass function and the survival function:
$$
P(T=t \mid \gamma, \beta) = \dfrac{B(\gamma+1, \delta+t-1)}{B(\gamma, \delta)}
$$
$$
S(t \mid \gamma, \delta) = \dfrac{B(\gamma, \delta+t)}{B(\gamma, \delta)}
$$
Here, $B(\gamma, \delta)$ is the beta function, not the beta distribution.
### Accommodating Heterogeneity in θ - From Discrete to Continuous
In order to increase flexibility without sacrificing parsimony (unlike increasing the number of discrete segments), we let the number of sub-intervals go to infinity and represent the probabilities in terms of a simple continuous function $g(θ)$ defined over $(0, 1)$:
- The probability of getting a specific value of $θ$ is infinitesimally small.
- Discrete $\rightarrow$ continuous $\Rightarrow\Sigma \rightarrow \int$
- By definition, $P(0 ≤ Θ ≤ 1) = 1 \Leftrightarrow$ area under the curve, $\int_0^1 g(\theta)d\theta$, equals one.
- For a randomly chosen customer,
$$
P(T=t)=\int_0^1 P(T=t|\theta)g(\theta)d\theta
$$
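To make the integral concrete, the sketch below evaluates it numerically for a hypothetical uniform mixing distribution, $g(\theta)=1$ on $(0,1)$ (used only for illustration; the text adopts the beta distribution next). For uniform $g$, the integral has the closed form $\int_0^1 \theta(1-\theta)^{t-1}d\theta = B(2,t) = 1/[t(t+1)]$:

```python
from scipy.integrate import quad

def mixture_pmf(t):
    # Integrand is P(T = t | theta) * g(theta), with g(theta) = 1 assumed
    integrand = lambda theta: theta * (1 - theta) ** (t - 1)
    value, _ = quad(integrand, 0, 1)
    return value

for t in range(1, 5):
    # Compare the numerical integral with the closed form 1 / (t * (t + 1))
    print(t, round(mixture_pmf(t), 6), 1 / (t * (t + 1)))
```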
**The Beta Distribution**
The beta distribution is a flexible (and mathematically convenient) two-parameter distribution bounded between 0 and 1:
$$
g(\theta | \gamma,\delta)=\frac{\theta^{\gamma-1}(1-\theta)^{\delta-1}}{B(\gamma,\delta)}\hspace{20mm} \gamma,\delta \gt 0
$$
where $B(\gamma,\delta)$ is the beta function.
The mean and variance of the beta distribution are
$$
E(Θ)=\frac{\gamma}{\gamma+\delta}
$$
$$
var(Θ)=\frac{\gamma\delta}{(\gamma+\delta)^{2}(\gamma+\delta+1)}
$$
Generally implemented as:
$$
f(x; a, b) = \frac{\Gamma(a+b)x^{a-1}(1-x)^{b-1}}{\Gamma(a)\Gamma(b)} \hspace{20mm} 0 \leq x \leq 1, \space a \gt 0, \space b \gt 0
$$
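These moment formulas can be checked against `scipy.stats.beta`, whose $(a, b)$ parameterization plays the role of $(\gamma, \delta)$ here. The parameter values below are illustrative (the sBG estimates reported later in this section):

```python
from scipy.stats import beta

gamma_, delta_ = 0.764, 1.296   # illustrative parameter values

# Mean and variance from the formulas above
mean = gamma_ / (gamma_ + delta_)
var = gamma_ * delta_ / ((gamma_ + delta_) ** 2 * (gamma_ + delta_ + 1))

# scipy's (a, b) corresponds to (gamma, delta)
print(f'E(Θ):   {mean:.4f} vs scipy {beta.mean(gamma_, delta_):.4f}')
print(f'var(Θ): {var:.4f} vs scipy {beta.var(gamma_, delta_):.4f}')
```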
**Four General Shapes of the Beta Distribution**
```{python}
from scipy.stats import beta
gamma, delta = np.array([0.5, 1.5]), np.array([3, 0.5])
# Probability Distribution Plotting:
x = np.linspace(0.001, 0.999, 200)  # avoid the endpoints, where the pdf can diverge for parameters < 1
fig, axes = plt.subplots(2, 2, figsize=(7,6))
for i in range(2):
for j in range(2):
ax = axes[i][j]
ax.plot(x, beta.pdf(x, gamma[i], delta[j]), label=f'γ={gamma[i]}, δ={delta[j]}', color='black', lw=0.75)
ax.set_xlim(0,1)
ax.set_xlabel('θ')
ax.legend(fontsize=6)
```
If one thinks about how the “coin-flip” probabilities are likely to vary across individuals, there are four principal possibilities, as illustrated in the plots above. If both parameters of the beta distribution ($\gamma$ and $\delta$) are small ($<1$), then the mix of churn probabilities is “U-shaped,” or highly polarized across customers. If both parameters are relatively large ($\gamma,\delta>1$), then the probabilities are fairly homogeneous. Likewise, the distribution of probabilities can be “J-shaped” or “reverse-J-shaped” when the parameters fall in the remaining ranges shown in the figure. These parameters offer useful diagnostics to help managers understand the **degree (and nature) of heterogeneity in churn probabilities** across the customer base.
**The Beta Function**
The beta function $B(\gamma,\delta)$ is defined by the integral:
$$
B(\gamma,\delta):=\int_0^1 t^{\gamma-1}(1-t)^{\delta-1}dt, \space \gamma,\delta \gt 0
$$
and can be expressed in terms of gamma functions:
$$
B(\gamma,\delta)=\frac{\Gamma(\gamma)\Gamma(\delta)}{\Gamma(\gamma+\delta)}
$$
The gamma function $\Gamma(\gamma)$ is a **generalized factorial**, which has the recursive property $Γ(γ + 1) = γΓ(γ)$. Since $Γ(1) = 1$, $Γ(n) = (n − 1)!$ for positive integer $n$.
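A quick numerical check of these identities using `scipy.special` (the parameter values are arbitrary):

```python
import math
from scipy.special import beta as beta_fn, gamma as gamma_fn

g, d = 0.764, 1.296   # arbitrary positive parameters for the check

# B(g, d) = Γ(g)Γ(d) / Γ(g + d)
print(beta_fn(g, d), gamma_fn(g) * gamma_fn(d) / gamma_fn(g + d))

# For positive integers, Γ(n) = (n - 1)!
for n in range(1, 6):
    assert math.isclose(gamma_fn(n), math.factorial(n - 1))
```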
**Developing the Model**
For a randomly chosen individual,
$$
\begin{aligned}
P(T=t|\gamma,\delta) &= \int_0^1 P(T=t|\theta)g(\theta|\gamma,\delta)d\theta \\
&= \int_0^1 \theta(1-\theta)^{t-1}\frac{\theta^{\gamma-1}(1-\theta)^{\delta-1}}{B(\gamma,\delta)}d\theta \\
&= \frac{1}{B(\gamma,\delta)}\int_0^1 \theta^{\gamma}(1-\theta)^{\delta+t-2}d\theta \\
&= \frac{B(\gamma+1,\delta + t - 1)}{B(\gamma, \delta)}
\end{aligned}
$$
Similarly,
$$
\begin{aligned}
S(t|\gamma,\delta) &= \int_0^1 S(t|\theta)g(\theta|\gamma,\delta)d\theta \\
&= \int_0^1 (1-\theta)^{t}\frac{\theta^{\gamma-1}(1-\theta)^{\delta-1}}{B(\gamma,\delta)}d\theta \\
&= \frac{1}{B(\gamma,\delta)}\int_0^1 \theta^{\gamma-1}(1-\theta)^{\delta+t-1}d\theta \\
&= \frac{B(\gamma,\delta + t)}{B(\gamma, \delta)}
\end{aligned}
$$
We call this *continuous mixture* model the beta-geometric (BG) distribution.
We can compute BG probabilities using the following forward-recursion formula from $P(T = 1)$:
$$
P(T=t|\gamma,\delta) =
\begin{cases}
\frac{\gamma}{\gamma + \delta} &\quad t = 1\\
\\
\frac{\delta + t - 2}{\gamma + \delta + t - 1}\times P(T=t-1) &\quad t=2,3,\cdots\\
\end{cases}
$$
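The recursion can be verified against the closed-form expression $P(T=t|\gamma,\delta) = B(\gamma+1,\delta+t-1)/B(\gamma,\delta)$ derived above; the parameter values below are illustrative:

```python
import numpy as np
from scipy.special import beta as beta_fn

gamma_, delta_ = 0.764, 1.296   # illustrative parameter values

# Closed form: P(T = t) = B(gamma + 1, delta + t - 1) / B(gamma, delta)
t = np.arange(1, 9)
closed = beta_fn(gamma_ + 1, delta_ + t - 1) / beta_fn(gamma_, delta_)

# Forward recursion from P(T = 1) = gamma / (gamma + delta)
rec = np.empty(len(t))
rec[0] = gamma_ / (gamma_ + delta_)
for i in range(1, len(t)):
    tt = t[i]
    rec[i] = (delta_ + tt - 2) / (gamma_ + delta_ + tt - 1) * rec[i - 1]

print(np.allclose(closed, rec))  # prints True
```

The recursion avoids evaluating beta functions at every $t$, which is convenient when computing the likelihood over long tenures.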
**Estimating Model Parameters**
Assuming:
1. the observed data were generated according to the heterogeneous “coin flipping” story of contract renewal, and
2. we know $γ$ and $δ$, the probability of the observed pattern of renewals is:
$$
[P(T=1|\gamma,\delta)]^{369}[P(T=2|\gamma,\delta)]^{163}[P(T=3|\gamma,\delta)]^{86}\\
[P(T=4|\gamma,\delta)]^{56}[S(4|\gamma,\delta)]^{326}
$$
The log-likelihood function is given by:
$$
\begin{align*}
LL(\gamma, \delta|data)=&369 \times ln[P(T=1 |\gamma, \delta)]+\\
&163 \times ln[P(T=2 |\gamma, \delta)]+\\
&86 \times ln[P(T=3 |\gamma, \delta)]+\\
&56 \times ln[P(T=4 |\gamma, \delta)]+\\
&326 \times ln[S(4 |\gamma, \delta)]
\end{align*}
$$
The maximum value of the log-likelihood function is $LL = −1401.6$, which occurs at $γ = 0.764$ and $δ = 1.296$.
### Model Implementation
```{python}
year, alive = np.loadtxt('data/hardie-sample-retention.csv', dtype='object', delimiter=',', unpack=True, skiprows=1)
year = year.astype(int)
alive = alive.astype(float)
train_year = year[:5]
train_alive = alive[:5]
```
```{python}
#| code-fold: false
def log_likelihood(x):
gamma, delta = x[0], x[1]
p_churn = np.zeros_like(train_alive[1:])
p_churn[0] = gamma / (gamma + delta) # Define the base probability for t=1
terms = (delta + train_year[2:] - 2) / (gamma + delta + train_year[2:] - 1) # Define the sequence for t=2,3, ...
p_churn[1:] = np.cumprod(terms) * p_churn[0] # Calculate cumulative products for probabilities
n_lost = train_alive[:-1] - train_alive[1:]
ll_churn = np.sum(n_lost * np.log(p_churn))
ll_alive = train_alive[-1] * np.log(1-np.sum(p_churn))
return -(ll_churn + ll_alive)
guess = [1, 1]
bnds = [(0.00001, np.inf), (0.00001, np.inf)]  # keep γ, δ strictly positive to avoid division by zero and log(0)
result = minimize(log_likelihood, guess, bounds=bnds)
gamma, delta = result.x[0], result.x[1]
ll = -result.fun
print(f'γ = {gamma:.3f}\nδ = {delta:.3f}\nLog-Likelihood = {ll:,.2f}')
```
```{python}
retention = alive / alive[0]
p_churn = np.zeros_like(alive)
p_churn[1] = gamma / (gamma + delta)
terms = (delta + year[2:] - 2) / (gamma + delta + year[2:] - 1)
p_churn[2:] = np.cumprod(terms) * p_churn[1]
e_retention = np.ones_like(alive)