---
title: Estimating Concentration in Champagne Purchasing
author: Abdullah Mahmood
date: last-modified
format:
  html:
    theme: cosmo
    css: quarto-style/style.css
    highlight-style: atom-one
    mainfont: Palatino
    fontcolor: black
    monobackgroundcolor: white
    monofont: Menlo, Lucida Console, Liberation Mono, DejaVu Sans Mono, Bitstream Vera Sans Mono, Courier New, monospace
    fontsize: 13pt
    linestretch: 1.4
    number-sections: true
    number-depth: 5
    toc: true
    toc-location: right
    toc-depth: 5
    code-fold: true
    code-copy: true
    cap-location: bottom
    format-links: false
    embed-resources: true
    anchor-sections: true
    code-links:
      - text: GitHub Repo
        icon: github
        href: https://github.com/abdullahau/customer-analytics/
      - text: Quarto Markdown
        icon: file-code
        href: https://github.com/abdullahau/customer-analytics/blob/main/Estimating-Purchasing-Concentration.qmd
    html-math-method:
      method: mathjax
      url: https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js
---
## Import
```{python}
import numpy as np
import pandas as pd
from scipy.optimize import minimize
import matplotlib.pyplot as plt
%config InlineBackend.figure_formats = ['svg']
```
## Concentration 101
- Concentration in customer purchasing means that a small proportion of customers make a large proportion of the total purchases of the product (e.g., “80/20”).
$$
\text{Higher Concentration} \Leftrightarrow \text{Greater Inequality}
$$
- The *Lorenz curve* is used to illustrate the degree of inequality in the distribution of a quantity of interest (e.g., purchasing, income, wealth).
- The greater the curvature of the Lorenz curve, the greater the concentration/inequality.
- Every point on the Lorenz curve gives the share $y\%$ of the quantity of interest accounted for by the bottom $x\%$ of all relevant individuals:
$$
y = L(x)
$$
- "80/20" corresponds to a specific point on the Lorenz curve: the bottom $80\%$ of customers account for $20\%$ of purchases, i.e., $20=L(80)$.
- The *Gini coefficient* is the ratio of the area between the $45^{\circ}$ line (“line of perfect equality”) and the Lorenz curve to the area under the line of perfect equality.
```{python}
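# Hypothetical distribution: x = units purchased, f_x = number of people
d = pd.DataFrame({
    'x': range(6),
    'f_x': [70, 45, 25, 15, 10, 5],
})
d['total_units'] = d.x * d.f_x
# Keep buyers only (x > 0); the x = 0 row becomes NaN and is skipped by sum()
total_units = d.where(d.x > 0).total_units
total_buyers = d.where(d.x > 0).f_x
print('Total Units = ', total_units.sum())
print('Total Buyers = ', total_buyers.sum())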
```
Consider the buyers who purchased exactly $x$ times $(x \geq 1)$:
- What proportion of total buyers are they?
$$
\frac{P(X=x)}{1-P(X=0)},
$$
- where $P(X=x)$ is the proportion of all customers who made exactly $x$ purchases, and
- $P(X=0)$ is the proportion of customers who made no purchases.
- What proportion of total purchasing do they account for?
$$
\frac{xP(X=x)}{E(X)}
$$
- where $xP(X=x)$ is the purchase level $x$ multiplied by the proportion of customers who made exactly $x$ purchases, and
- $E(X)$ is the sum of $xP(X=x)$ over all $x$, i.e., the mean number of purchases per customer.
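
As a worked check of both ratios, take the hypothetical distribution above (170 people, 70 of whom bought nothing, 205 units in total) and the buyers with $x = 1$. The total of 170 cancels in each ratio, leaving the share of buyers and the share of units:

$$
\frac{P(X=1)}{1-P(X=0)} = \frac{45/170}{1 - 70/170} = \frac{45}{100} = 0.45,
\qquad
\frac{1 \cdot P(X=1)}{E(X)} = \frac{45/170}{205/170} = \frac{45}{205} \approx 0.22
$$

So the one-time buyers make up $45\%$ of buyers but account for only about $22\%$ of purchases.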
```{python}
d['pct_buyers'] = total_buyers / total_buyers.sum()
d['pct_purchases'] = total_units / total_units.sum()
d['cum_pct_buyers'] = d.pct_buyers.cumsum()
d['cum_pct_purchases'] = d.pct_purchases.cumsum()
d = d.fillna(0)
d
```
```{python}
plt.clf()
plt.bar(d.x, d.f_x, color='k')
plt.xlabel('# Units')
plt.ylabel('# People')
plt.title('Hypothetical distribution of purchases (n = 170 people)');
```
```{python}
plt.clf()
plt.plot(d.cum_pct_buyers, d.cum_pct_purchases, color='k', marker='o', label='Lorenz Curve')
x = np.linspace(0, 1)
plt.plot(x,x,'k--', label='Line of Perfect Equality')
plt.xlabel('Cumulative % Buyers')
plt.ylabel('Cumulative % Purchases')
plt.title('Lorenz Curve')
plt.legend();
```
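
The Gini coefficient defined earlier can be estimated from the plotted points. The block below is a minimal sketch, assuming the Lorenz curve is treated as piecewise linear between those points, so the area under it follows from the trapezoidal rule. The variable names are illustrative and do not come from the original notebook.

```python
import numpy as np

# Hypothetical distribution from the example above:
# x = units purchased, f_x = number of people
x = np.arange(6)
f_x = np.array([70, 45, 25, 15, 10, 5])

# Restrict to buyers (x >= 1), matching the Lorenz curve plotted above
buyers = f_x[1:]
units = x[1:] * f_x[1:]

# Lorenz curve points, starting from the origin (0, 0)
cum_buyers = np.concatenate(([0.0], np.cumsum(buyers) / buyers.sum()))
cum_units = np.concatenate(([0.0], np.cumsum(units) / units.sum()))

# Area under the piecewise-linear Lorenz curve (trapezoidal rule)
widths = np.diff(cum_buyers)
heights = (cum_units[1:] + cum_units[:-1]) / 2
area_under_lorenz = np.sum(widths * heights)

# Gini = area between the equality line and the Lorenz curve,
# divided by the area under the equality line (0.5)
gini = (0.5 - area_under_lorenz) / 0.5
print(f'Gini coefficient = {gini:.3f}')
```

A value of 0 would mean every buyer purchased the same amount. Values closer to 1 indicate that purchasing is concentrated among fewer buyers.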
## Problem
Consider the following data on the number of bottles of champagne purchased in a year by a sample of 568 French households:
```{python}
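# Observed distribution: x = bottles purchased, f_x = number of households
d = pd.DataFrame({
    'x': range(9),
    'f_x': [400, 60, 30, 20, 8, 8, 9, 6, 27],
})
# Units per purchase level; this column must exist before it is read below
d['total_units'] = d.x * d.f_x
total_units = d.where(d.x > 0).total_units
total_buyers = d.where(d.x > 0).f_x
print('Total Units = ', total_units.sum())
print('Total Buyers = ', total_buyers.sum())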
```