# Random variables {#random-variables}
```{r setup4, include=FALSE}
knitr::opts_chunk$set(echo = FALSE,
prompt = FALSE,
tidy = TRUE,
collapse = TRUE)
library("tidyverse")
```
Economics is a mostly *quantitative* field: its outcomes can usually be
described using numbers such as prices, quantities, interest rates,
unemployment rates, and GDP. Statistics is also a quantitative field: every
statistic is a number
calculated from data. When a random outcome is described by a number, we call
that number a "random variable." We can use probability theory to describe and
model random variables.
This chapter will introduce the basic terminology and mathematical tools for
working with simple random variables.
::: {.goals data-latex=""}
**Chapter goals**
In this chapter, we will learn how to:
1. Define a random variable in terms of a random outcome.
2. Determine the support and range of a random variable.
3. Calculate and interpret the PDF of a discrete random variable.
4. Calculate and interpret the CDF of a discrete random variable.
5. Calculate interval probabilities from the CDF.
6. Calculate the expected value of a discrete random variable from its PDF.
7. Calculate a quantile from the CDF.
8. Calculate the variance of a discrete random variable from its PDF.
9. Calculate the variance from expected values.
10. Calculate the standard deviation from the variance.
11. Calculate the expected value for a linear function of a random variable.
12. Calculate the variance and standard deviation for a linear function of a
    random variable.
13. Standardize a random variable.
14. Use standard discrete probability distributions:
    - Bernoulli
    - binomial
    - discrete uniform.
:::
To prepare for this chapter, please review the chapter on
[probability and random events](#probability) and the section on
[sequences and summations](#sequences-and-summations) in the math appendix.
## Defining a random variable {#introduction-to-random-variables}
A ***random variable*** is a number whose value depends on a random outcome. The
idea here is that we are going to use a random variable to describe some (but
not necessarily every) aspect of the outcome.
::: example
**Random variables in roulette**
Here are a few random variables we could define in a roulette game:
- The original outcome $b$
- An indicator for whether a bet on red wins:
$$r = I(b \in Red)=\begin{cases}1 & b \in Red\\ 0 & b \notin Red \\ \end{cases}$$
- The net payout from a \$1 bet on red:
$$ w_{red} = w_{red}(b) = \begin{cases} 1 & \textrm{ if } b \in Red \\ -1 & \textrm{ if } b \in Red^c \end{cases} $$
That is, a player who bets \$1 on red wins \$1 if the ball lands on red
and loses \$1 if the ball lands anywhere else.
- The net payout from a \$1 bet on 14:
$$ w_{14} = w_{14}(b) = \begin{cases} 35 & \textrm{ if } b = 14 \\ -1 & \textrm{ if } b \neq 14 \end{cases} $$
That is, a player who bets \$1 on 14 wins \$35 if the ball lands on 14
and loses \$1 if the ball lands anywhere else.
All of these random variables are defined in terms of the underlying outcome.
:::
A random variable is always a function of the original outcome, but for
convenience, we usually leave its dependence on the original outcome implicit,
and write it as if it were an ordinary variable.
### Implied distribution {#probability-distributions}
Since every random variable is a number, we can define its sample space as the
set of real numbers $\mathbb{R}$.
Each random variable has its own probability distribution over this sample space
and this probability distribution can be derived from the probability
distribution of the underlying outcome. That is, let
$\omega \in \Omega$ be some random outcome, and let $x = x(\omega)$
be some random variable that depends on that outcome. Then the probability
that $x$ is in some set $A$ is:
$$\Pr(x \in A) = \Pr(\{\omega \in \Omega: x(\omega) \in A\})$$
Again, this definition looks complicated but is easier to follow with a few
simple examples.
::: example
**Probability distributions for roulette**
Assuming we have a fair roulette game:
- We already know that the probability distribution for $b$ is:
$$\Pr(b = 0) = 1/37 \approx 0.027$$
$$\Pr(b = 1) = 1/37 \approx 0.027$$
$$\vdots$$
$$\Pr(b = 36) = 1/37 \approx 0.027$$
$$\Pr(b \notin \{0,1,\ldots,36\}) = 0$$
- The probability distribution for $w_{red}$ is:
$$\Pr(w_{red} = 1) = \Pr(b \in Red) = 18/37 \approx 0.486$$
$$\Pr(w_{red} = -1) = \Pr(b \notin Red) = 19/37 \approx 0.514$$
$$\Pr(w_{red} \notin \{-1,1\}) = 0$$
- The probability distribution for $w_{14}$ is:
$$\Pr(w_{14} = 35) = \Pr(b = 14) = 1/37 \approx 0.027$$
$$\Pr(w_{14} = -1) = \Pr(b \neq 14) = 36/37 \approx 0.973$$
$$\Pr(w_{14} \notin \{-1,35\}) = 0$$
Notice that these random variables are related to each other since they all
depend on the same underlying outcome. Section \@ref(multiple-random-variables)
will explain how we can describe and analyze those relationships.
:::
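The implied distribution of a random variable can also be computed mechanically from the distribution of the underlying outcome. Here is a minimal sketch (assumed code, not from the text; the set of red numbers below is the standard roulette assignment) that adds up $\Pr(b = a)$ over the outcomes mapped to each value of $w_{red}$ and $w_{14}$:

```{r impliedDistSketch}
# Derive the implied distributions of w_red and w_14 from the
# uniform distribution of b over {0, 1, ..., 36}.
red <- c(1, 3, 5, 7, 9, 12, 14, 16, 18, 19, 21, 23, 25, 27, 30, 32, 34, 36)
b <- 0:36
pb <- rep(1/37, 37)                # Pr(b = a) = 1/37 for each outcome
w_red <- ifelse(b %in% red, 1, -1) # net payout of a $1 bet on red
w_14  <- ifelse(b == 14, 35, -1)   # net payout of a $1 bet on 14
# Add up Pr(b = a) over the outcomes mapped to each payout value
tapply(pb, w_red, sum)             # -1: 19/37, 1: 18/37
tapply(pb, w_14, sum)              # -1: 36/37, 35: 1/37
```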
### The support {#the-support}
The ***support*** of a random variable $x$ is the smallest[^401] set
$S_x \subset \mathbb{R}$ such that $\Pr(x \in S_x) = 1$.
[^401]: Technically, it is the smallest *closed* set, but let's ignore that.
In plain language, the support is the set of all values in the sample space that
have some chance of actually happening.
::: example
**The support in roulette**
The sample space of $b$ is $\mathbb{R}$ and the support of $b$ is
$S_{b} = \{0,1,2,\ldots,36\}$.
The sample space of $w_{red}$ is $\mathbb{R}$ and the support of
$w_{red}$ is $S_{red} = \{-1,1\}$.
The sample space of $w_{14}$ is $\mathbb{R}$ and the support of
$w_{14}$ is $S_{14} = \{-1,35\}$.
:::
The random variables we will consider in this chapter have ***discrete***
support. That is, the support is a set of isolated points each of which has
a strictly positive probability. In most examples the support will also have
a ***finite*** number of elements. All finite sets are discrete, but it is
possible for a discrete set to have an infinite number of elements.
For example, the set of positive integers $\{1,2,3,\ldots\}$ is both discrete
and infinite.
Some random variables have a support that is continuous rather than discrete.
Chapter \@ref(more-on-random-variables) will cover continuous random variables.
### The PDF {#the-pdf-of-a-discrete-random-variable}
We can describe the probability distribution of a random variable with a
function called its ***probability density function (PDF)***.
The PDF of a discrete random variable is defined as:
$$f_x(a) = \Pr(x = a)$$
where $a$ is any number. By convention, we typically use a lower-case $f$ to
represent a PDF, and we use the subscript when needed to clarify which specific
random variable we are talking about.
In some cases we are just given the PDF, in others we may need to calculate it
using the tools we have already learned.
::: example
**The PDF in roulette**
Our three random variables are all discrete, and each has its own PDF:
$$f_b(a) = \Pr(b = a) = \begin{cases}
1/37 & a \in \{0,1,\ldots,36\} \\
0 & a \notin \{0,1,\ldots,36\} \\
\end{cases}$$
$$f_{red}(a) = \Pr(w_{red} = a) = \begin{cases}
19/37 & a = -1 \\
18/37 & a = 1 \\
0 & a \notin \{-1,1\} \\
\end{cases}$$
$$f_{14}(a) = \Pr(w_{14} = a) = \begin{cases}
36/37 & a = -1 \\
1/37 & a = 35 \\
0 & a \notin \{-1,35\} \\
\end{cases}$$
Figure \@ref(fig:RoulettePDF) below shows these three PDFs.
```{r RoulettePDF, fig.cap = "*PDFs for the roulette example*"}
RoulettePDF <- tibble(a = seq(from = -2, to= 36),
fb = c(0, 0, rep(1/37, times = 37)),
fred = c(0, 19/37, 0, 18/37, rep(0, times=35)),
f14 = c(0, 36/37,rep(0, times=35), 1/37, 0))
ggplot(data = RoulettePDF, mapping = aes(x = a)) +
geom_point(aes(y=fb), col = "blue") +
geom_point(aes(y=fred), col = "red") +
geom_point(aes(y=f14), col = "orange") +
xlab("a") +
ylab("f(a)") +
ylim(0,1) +
geom_text(x = 4,
y = 4/37,
label = "f_b(a)",
col = "blue") +
geom_text(x = 4,
y = 18/37,
label = "f_red(a)",
col = "red") +
geom_text(x = 2,
y = 36/37,
label = "f_14(a)",
col = "orange") +
labs(title = "Probability density function (PDF)",
subtitle = "Roulette",
caption = "",
tag = "")
```
Note that the points overlap, so you may not be able to see each value for a
given PDF.
:::
The PDF provides a complete description of the probability distribution of a
random variable. That is, for any random variable $x$ and any event
$A \subset \mathbb{R}$ we can calculate $\Pr(x \in A)$ by simply adding up the
corresponding PDF of $x$:
\begin{align}
\Pr(x \in A) &= \sum_{s \in A} \Pr(x = s) \\
&= \sum_{s \in S_x} f_x(s)I(s \in A)
\end{align}
If you are unfamiliar with the notation here, please refer to the sections on
[summations](#summations) and [the indicator function](#the-indicator-function)
in the Math Review Appendix. The formula is easy to use once you understand the
notation.
::: example
**Some event probabilities in roulette**
Since the outcome in roulette is discrete, we can calculate any event
probability by adding up the probabilities of the event's component outcomes.
The probability of the event $b \leq 1$ can be calculated:
\begin{align}
\Pr(b \leq 1) &= \sum_{s=0}^{36}f_b(s)I(s \leq 1) \\
&= \underbrace{f_b(0)}_{1/37} \underbrace{I(0 \leq 1)}_{1} +
\underbrace{f_b(1)}_{1/37} \underbrace{I(1 \leq 1)}_{1} +
\underbrace{f_b(2)}_{1/37} \underbrace{I(2 \leq 1)}_{0} +
\cdots +
\underbrace{f_b(36)}_{1/37} \underbrace{I(36 \leq 1)}_{0} \\
&= 2/37
\end{align}
The probability of the event $b \in Even$ can be calculated:
\begin{align}
\Pr(b \in Even) &= \sum_{s=0}^{36}f_b(s)I(s \in Even) \\
&= \underbrace{f_b(0)}_{1/37} \underbrace{I(0 \in Even)}_{0} +
\underbrace{f_b(1)}_{1/37} \underbrace{I(1 \in Even)}_{0} +
\underbrace{f_b(2)}_{1/37} \underbrace{I(2 \in Even)}_{1} +
\cdots +
\underbrace{f_b(36)}_{1/37} \underbrace{I(36 \in Even)}_{1} \\
&= 18/37
\end{align}
Remember that zero is not counted as an even number in roulette, so it is not
in the event $Even$.
:::
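The summation formula above translates almost directly into code. A quick sketch (assumed code, not from the text) of both event probabilities from this example:

```{r eventProbSketch}
# Pr(x in A) = sum over the support of f(s) * I(s in A),
# where the logical comparison plays the role of the indicator function.
support <- 0:36
fb <- rep(1/37, 37)
even <- seq(2, 36, by = 2)     # zero is not even in roulette
sum(fb * (support <= 1))       # Pr(b <= 1) = 2/37
sum(fb * (support %in% even))  # Pr(b in Even) = 18/37
```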
The PDF of a discrete random variable has several general properties:
1. It is always between zero and one:
$$0 \leq f_x(a) \leq 1$$
since it is a probability.
2. It sums up to one over the support:
$$\sum_{a \in S_x} f_x(a) = \Pr(x \in S_x) = 1$$
since the support has probability one by definition.
3. It is strictly positive for all values in the support:
$$a \in S_x \implies f_x(a) > 0$$
since the support is the *smallest* set that has probability one.
You can confirm that examples above all satisfy these properties.
### The CDF {#the-cdf}
Another way to describe the probability distribution of a random variable is
with a function called its ***cumulative distribution function (CDF)***.
The CDF is a little less intuitive than the PDF, but it has the advantage that
it always has the same definition whether the random variable is discrete,
continuous, or even some combination of the two.
The CDF of the random variable $x$ is the function
$F_x:\mathbb{R} \rightarrow [0,1]$ defined by:
$$F_x(a) = \Pr(x \leq a)$$
where $a$ is any number. By convention, we typically use an upper-case $F$ to
indicate a CDF, and we use the subscript to indicate what random variable we are
talking about.
We can construct the CDF of a discrete random variable by just adding up the
PDF:
\begin{align}
F_x(a) &= \Pr(x \leq a) \\
&= \sum_{s \in S_x} f_x(s)I(s \leq a)
\end{align}
This formula leads to a "stair-step" appearance: the CDF is flat for all values
outside of the support, and then jumps up at all values in the support.
::: example
**CDFs for roulette**
- The CDF of $b$ is:
$$F_b(a) = \begin{cases}
0 & a < 0 \\
1/37 & 0 \leq a < 1 \\
2/37 & 1 \leq a < 2 \\
\vdots & \vdots \\
36/37 & 35 \leq a < 36 \\
1 & a \geq 36 \\
\end{cases}$$
- The CDF of $w_{red}$ is:
$$F_{red}(a) = \begin{cases}
0 & a < -1 \\
19/37 & -1 \leq a < 1 \\
1 & a \geq 1 \\
\end{cases}$$
- The CDF of $w_{14}$ is:
$$F_{14}(a) = \begin{cases}
0 & a < -1 \\
36/37 & -1 \leq a < 35 \\
1 & a \geq 35 \\
\end{cases}$$
:::
The CDF has several properties:
1. The CDF is a *probability*, just like the PDF. For any number $a$ we know
that:
$$0 \leq F_x(a) \leq 1$$
2. The CDF is *non-decreasing*. For any two numbers $a$ and $b$ so that
$a \leq b$, we know that:
$$F_x(a) \leq F_x(b)$$
3. The CDF *runs from zero to one*. That is, it is zero or close to zero for
low values of $a$, and one or close to one for high values of $a$. We can use
limits to give precise meaning to the broad terms "close", "low", and "high":
$$\lim_{a \rightarrow -\infty} F_x(a) = \Pr(x \leq -\infty) = 0$$
$$\lim_{a \rightarrow \infty} F_x(a) = \Pr(x \leq \infty) = 1$$
You can review the section on [limits](#limits) in the math appendix if you
do not follow the notation.
::: example
**CDF properties**
Figure \@ref(fig:RouletteCDF) below graphs the CDFs from the previous example:
```{r RouletteCDF, fig.cap = "*CDFs for the roulette example*"}
RouletteCDF <- RoulettePDF %>%
mutate (Fb = cumsum(fb)) %>%
mutate (Fred = cumsum(fred))%>%
mutate (F14 = cumsum(f14))
ggplot(data = RouletteCDF, mapping = aes(x = a)) +
geom_step(aes(y=Fb), col = "blue") +
geom_step(aes(y=Fred), col = "red") +
geom_step(aes(y=F14), col = "orange") +
xlab("a") +
ylab("F(a)") +
geom_text(x = 7,
y = 4/37,
label = "F_b(a)",
col = "blue") +
geom_text(x = 5,
y = 18/37,
label = "F_red(a)",
col = "red") +
geom_text(x = 8,
y = 32/37,
label = "F_14(a)",
col = "orange") +
labs(title = "Cumulative distribution function (CDF)",
subtitle = "Roulette",
caption = "",
tag = "")
```
Notice that they show all of the general properties described above:
- The CDF never goes down, only goes up or stays the same.
- The CDF runs from zero to one, and never leaves that range.
In addition, all of these CDFs have a distinctive "stair-step" shape, jumping up
at each point in $S_x$ and staying flat between those points. This is a general
property of CDFs for discrete random variables.
:::
In addition to constructing the CDF from the PDF, we can also go the other way,
and construct the PDF of a discrete random variable from its CDF. Each little
jump in the CDF is a point in the support, and the size of the jump is exactly
equal to the PDF.
::: {.fyi data-latex=""}
In more formal mathematics, the formula for deriving the PDF of a discrete
random variable from its CDF would be written:
$$f_x(a) = F_x(a) - \lim_{\epsilon \rightarrow 0} F_x(a-|\epsilon|)$$
but we can just think of it as the size of the jump.
:::
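The "size of the jump" idea is easy to check numerically. A sketch (assumed code, not from the text) that recovers the PDF of $w_{red}$ from its CDF evaluated on a grid around the support:

```{r pdfFromCdfSketch}
# Each jump in the CDF marks a support point; the jump size is the PDF there.
a_grid <- c(-2, -1, 0, 1, 2)       # evaluation points around the support
F_red  <- c(0, 19/37, 19/37, 1, 1) # CDF of w_red at those points
jumps  <- diff(c(0, F_red))        # size of the jump at each grid point
data.frame(a = a_grid, jump = jumps)[jumps > 0, ]  # support points and PDF
```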
### Interval probabilities
Finally, we can use the CDF to calculate the probability that $x$ lies in any
interval. That is, let $a$ and $b$ be any two numbers such that $a < b$. Then:
\begin{align}
F(b) - F(a) &= \Pr(x \leq b) - \Pr(x \leq a) \\
&= \Pr((x \leq a) \cup (a < x \leq b)) - \Pr(x \leq a) \\
&= \Pr(x \leq a) + \Pr(a < x \leq b) - \Pr(x \leq a) \\
&= \Pr(a < x \leq b)
\end{align}
Notice that we have to be a little careful here to distinguish between the
strict inequality $<$ and the weak inequality $\leq$, because it is always
possible for $x$ to be exactly equal to $a$ or $b$.
::: example
**Calculating interval probabilities**
Consider the CDF for $b$ derived above. Then:
\begin{align}
\Pr(b \leq 36) &= F_b(36) \\
&= 1 \\
\Pr(0 < b \leq 36) &= F_b(36) - F_b(0) \\
&= 1 - 1/37 \\
&= 36/37
\end{align}
Note that the placement of the $<$ and $\leq$ are important here.
What if we want $\Pr(0 \leq b \leq 36)$ instead? We can split that event into
two disjoint events $(b = 0)$ and $(0 < b \leq 36)$ and apply the axioms of
probability:
\begin{align}
\Pr(0 \leq b \leq 36) &= \Pr( (b = 0) \cup (0 < b \leq 36) ) \\
&= \Pr(b = 0) + \Pr(0 < b \leq 36) \\
&= 1/37 + 36/37 \\
&= 1
\end{align}
We can use similar methods to determine $\Pr(0 < b < 36)$ or
$\Pr(0 \leq b < 36)$.
:::
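These interval calculations can be sketched in code as well. The closed-form CDF below is an assumption (not from the text), written to match the stair-step CDF of $b$ derived above:

```{r intervalProbSketch}
# CDF of b: F_b(a) = (floor(a) + 1)/37, clamped to [0, 1].
Fb <- function(a) pmin(pmax(floor(a) + 1, 0), 37) / 37
Fb(36) - Fb(0)              # Pr(0 < b <= 36)  = 36/37
(1/37) + (Fb(36) - Fb(0))   # Pr(0 <= b <= 36) = Pr(b = 0) + Pr(0 < b <= 36) = 1
```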
### Functions of a random variable
Any function of a random variable is also a random variable. So for example, if
$x$ is a random variable, so is $x^2$ or $\ln (x)$ or $\sqrt{x}$. We can derive
the PDF or CDF of a function of a random variable directly from the PDF or CDF
of the original random variable.
We say that $y$ is a ***linear function*** of $x$ if:
$$y = a + bx$$
where $a$ and $b$ are constants.
::: example
**Linear and nonlinear functions in roulette**
The net payout from a \$1 bet on red ($w_{red}$) was earlier defined directly
from the underlying outcome $b$. We can use the indicator function to write it
in a compact form:
$$w_{red} = 2I(b \in Red) - 1$$
We could also define it as a function of the random variable $r$:
$$w_{red} = 2r -1$$
Applying the definitions above, $w_{red}$ can be considered a linear function
of $r$, or a nonlinear function of $b$.
:::
We will have many results below that apply specifically for linear functions,
but not for nonlinear functions.
## The expected value {#the-expected-value}
The ***expected value*** of a random variable $x$ is written $E(x)$. When $x$
is discrete, it is defined as:
$$E(x) = \sum_{a \in S_x} a\Pr(x=a) = \sum_{a \in S_x} af_x(a)$$
The expected value is also called the ***mean***, the ***population mean***
or the ***expectation*** of the random variable.
The formula might look difficult if you are not used to the notation, but it is
actually quite simple to calculate:
1. Figure out the support and PDF of $x$.
2. Multiply each value in the support by the PDF at that value.
3. Add these numbers up.
::: example
**Some expected values in roulette**
The support of $b$ is $\{0,1,2\ldots,36\}$ and its PDF is the $f_b(\cdot)$
function we calculated earlier. So its expected value is:
\begin{align}
E(b) &= 0*\underbrace{f_b(0)}_{1/37} + 1*\underbrace{f_b(1)}_{1/37} + \cdots + 36*\underbrace{f_b(36)}_{1/37} \\
&= \frac{0 + 1 + 2 + \cdots + 36}{37} \\
&= 18
\end{align}
The support of $r$ is $\{0,1\}$ and its PDF is the $f_r(\cdot)$ function we
calculated earlier. So its expected value is:
\begin{align}
E(r) &= 0*\underbrace{f_r(0)}_{19/37} + 1*\underbrace{f_r(1)}_{18/37} \\
&= 18/37 \\
&\approx 0.486
\end{align}
The support of $w_{14}$ is $\{-1,35\}$ and its PDF is the $f_{14}(\cdot)$
function we calculated earlier. So its expected value is:
\begin{align}
E(w_{14}) &= -1*\underbrace{f_{14}(-1)}_{36/37} + 35*\underbrace{f_{14}(35)}_{1/37} \\
&= -1/37 \\
&\approx -0.027
\end{align}
That is, each dollar bet on 14 leads to an average loss of 2.7 cents for the
bettor.
:::
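The three-step recipe above amounts to one line of code per random variable. A sketch (assumed code, not from the text):

```{r expectedValueSketch}
# Multiply each support value by the PDF at that value, then add up.
Eb   <- sum((0:36) * rep(1/37, 37))  # E(b)
Er   <- 0 * (19/37) + 1 * (18/37)    # E(r)
Ew14 <- -1 * (36/37) + 35 * (1/37)   # E(w_14)
c(Eb, Er, Ew14)                      # 18, about 0.486, about -0.027
```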
We can think of the expected value as a weighted average of its possible values,
with each value weighted by the probability of observing that value. It is often
loosely interpreted as a measure of "central tendency" (a typical or
representative value) for the random variable.
### Linearity of expectations
Since the expected value is a sum, it has some of the same properties as sums.
In particular, the associative and distributive rules apply, which means that:
$$E(a + bx) = a + bE(x)$$
That is, we can take the expected value "inside" any linear function. This will
turn out to be a very handy property.
::: example
**The expected value of a linear function in roulette**
Earlier, we showed that $w_{red}$ can be defined as a linear function of $r$:
$$w_{red} = 2r -1$$
so its expected value can be derived:
\begin{align}
E(w_{red}) &= E(2r - 1) \\
&= 2 \underbrace{E(r)}_{18/37} - 1 \\
&= -1/37 \\
&\approx -0.027
\end{align}
We can verify this calculation is correct by deriving the expected value
directly from the PDF:
\begin{align}
E(w_{red}) &= -1*\underbrace{f_{red}(-1)}_{19/37} + 1*\underbrace{f_{red}(1)}_{18/37} \\
&\approx -0.027
\end{align}
That is, each dollar bet on red leads to an average loss of 2.7 cents for the
bettor.
:::
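This check is simple enough to do numerically. A sketch (assumed code, not from the text) comparing the linearity shortcut against the direct PDF calculation:

```{r linearitySketch}
Er <- 18/37                     # E(r), from the earlier example
via_linearity <- 2 * Er - 1     # E(2r - 1) = 2 E(r) - 1 = -1/37
via_pdf <- -1 * (19/37) + 1 * (18/37)  # E(w_red) directly from the PDF
c(via_linearity, via_pdf)       # both about -0.027
```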
Unfortunately, this handy property applies only to linear functions. If
$g(\cdot)$ is a nonlinear function, then in general $E(g(x)) \neq g(E(x))$. For example:
$$E(x^2) \neq E(x)^2$$
$$E( 1/x ) \neq 1 / E(x)$$
Students frequently make this mistake, so try to avoid it.
::: example
**The expected value of a nonlinear function in roulette**
We also showed we can define $w_{red}$ as a nonlinear function of $b$:
$$w_{red} = 2 I(b \in Red) - 1$$
Can we take the expected value inside this function? That is, does:
\begin{align}
E(w_{red}) &= 2 I(E(b) \in Red) - 1 \qquad \textrm{?} \\
\end{align}
We already showed that $E(w_{red}) \approx -0.027$. We also showed earlier that
$E(b) = 18$, so we can find:
\begin{align}
2 I(E(b) \in Red) - 1 &= 2 I(18 \in Red) - 1 \\
&= 2*1 - 1 \\
&= 1
\end{align}
Since $-0.027 \neq 1$, it is clear that $E(w_{red}) \neq 2 I(E(b) \in Red) - 1$.
:::
## Quantiles and their relatives {#the-properties-of-a-random-variable}
The expected value is one way of describing something about a random variable,
but there are many others. We will describe a few of the most important ones.
### Range {#range}
The ***range*** of a random variable is the interval from its lowest possible
value $\min(S_x)$ to its highest possible value $\max(S_x)$.
:::example
**The range in roulette**
The support of $w_{red}$ is $\{-1,1\}$ so its range is $[-1,1]$.
The support of $w_{14}$ is $\{-1,35\}$ so its range is $[-1,35]$.
The support of $b$ is $\{0,1,2,\ldots,36\}$ so its range is $[0,36]$.
:::
### Quantiles and percentiles {#quantiles-and-percentiles}
Let $q$ be any number *strictly* between zero and one. Then the $q$
***quantile*** of a random variable $x$ is defined as:
\begin{align}
F_x^{-1}(q) &= \min\{a \in S_x: \Pr(x \leq a) \geq q\} \\
&= \min\{a \in S_x: F_x(a) \geq q\}
\end{align}
where $F_x(\cdot)$ is the CDF of $x$. The quantile function $F_x^{-1}(\cdot)$
is also called the ***inverse CDF***, for reasons that will soon be clear.
The $q$ quantile of a distribution is also called the $100q$ ***percentile***;
for example the 0.25 quantile of $x$ is also called the 25th percentile of $x$.
::: example
**Quantiles in roulette**
The CDF of $w_{red}$ is:
$$F_{red}(a) = \begin{cases}0 & a < -1 \\
19/37 \approx 0.514 & -1 \leq a < 1 \\
1 & a \geq 1 \\ \end{cases}$$
This CDF is plotted as the red line in the graph below.
```{r RouletteQuantiles, fig.cap = "*CDFs for the roulette example*"}
ggplot(data = RouletteCDF, mapping = aes(x = a)) +
geom_step(aes(y=Fred), col = "red") +
xlab("a") +
ylab("F(a)") +
geom_text(x = 20,
y = 0.9,
label = "F_red(a)",
col = "red") +
geom_text(mapping=aes(x = 3, y = 0.25),
label = "0.25 quantile = -1",
col = "blue") +
geom_text(x = 5,
y = 0.75,
label = "0.75 quantile = 1",
col = "blue") +
geom_segment(x=-5,xend=-1,y=0.25,yend=0.25,col="blue",linetype = "dashed") +
geom_segment(x=-5,xend=1,y=0.75,yend=0.75,col="blue",linetype = "dashed") +
labs(title = "Cumulative distribution function (CDF)",
subtitle = "Roulette - net winnings from bet on red",
caption = "",
tag = "")
```
To find any quantile $q$, we can apply the definition, or simply find the
value on the graph where $F_{red}(\cdot)$ crosses $q$.
For example, the 0.25 quantile (25th percentile) is defined as:
\begin{align}
F_{red}^{-1}(0.25) &= \min\{a \in S_x: F_{red}(a) \geq 0.25\} \\
&= \min \{-1, 1\} \\
&= -1
\end{align}
or we can draw the blue dashed line marked "0.25 quantile" and see that it hits
the red line at $a = -1$.
By the same method, we can find the 0.75 quantile (75th percentile) by
seeing that the red line crosses the blue dashed line marked "0.75 quantile" at
$a = 1$, or we can apply the definition:
\begin{align}
F_{red}^{-1}(0.75) &= \min\{a \in S_x: F_{red}(a) \geq 0.75\} \\
&= \min \{1\} \\
&= 1
\end{align}
Either method will work.
:::
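The quantile definition is a one-liner for a discrete random variable: take the smallest support value whose CDF reaches $q$. A sketch for $w_{red}$ (assumed code, not from the text):

```{r quantileSketch}
S     <- c(-1, 1)    # support of w_red
F_red <- c(19/37, 1) # CDF evaluated at each support point
quantile_red <- function(q) min(S[F_red >= q])
quantile_red(0.25)   # -1
quantile_red(0.75)   # 1
```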
The formula for the quantile function may look intimidating, but it can be
constructed by just "flipping" the axes of the CDF. This is why the quantile
function is also called the inverse CDF.
::: example
**The whole quantile function**
We can use the same ideas as in the previous example to show that $F^{-1}(q)$ is
equal to $-1$ for any $q$ between $0$ and $19/37$, and equal to $1$ for any $q$
between $19/37$ and $1$. But what is the value of $F^{-1}_{red}(19/37)$? To
figure that out we will need to carefully apply the definition:
\begin{align}
F_{red}^{-1}(19/37) &= \min\{a \in S_x: F_{red}(a) \geq 19/37\} \\
&= \min \{-1,1\} \\
&= -1
\end{align}
So the full quantile function can be written:
\begin{align}
F^{-1}(q) &= \begin{cases}
-1 & 0 < q \leq 19/37 \\
1 & 19/37 < q < 1 \\
\end{cases}
\end{align}
and we can plot it below:
```{r RouletteQuantFunction, fig.cap = "*Quantile function for $w_{red}$*"}
RouletteQuant <- tibble(a = c(0.0001,19/37,0.9999),
qred = c(-1,1,1))
ggplot(data = RouletteQuant, mapping = aes(x = a)) +
geom_step(aes(y=qred),col = "red") +
xlab("quantile") +
ylab("value of w_red at this quantile") +
geom_text(x = 0.8,
y = 0.9,
label = "q_red(a)",
col = "red") +
labs(title = "Quantile function (inverse CDF)",
subtitle = "Roulette - net winnings from bet on red",
caption = "",
tag = "")
```
Notice that this looks just like the CDF, but with the horizontal and vertical
axes flipped.
:::
### Median {#the-median}
The ***median*** of a random variable is its 0.5 quantile or 50th percentile.
::: example
**The median in roulette**
The median of $w_{red}$ is just its 0.5 quantile or 50th percentile:
$$median(w_{red}) = F_{red}^{-1}(0.5) = -1$$
:::
Like the expected value, the median is often loosely interpreted as a measure of
central tendency for the random variable.
## Variance and standard deviation {#variance-and-standard-deviation}
In addition to measures of central tendency such as the expected value and
median, we are also interested in measures of "spread" or variability. We have
already seen one - the range - but there are others, including the variance and
standard deviation.
### Variance {#variance}
The ***variance*** of a random variable $x$ is defined as:
$$\sigma_x^2 = var(x) = E((x-E(x))^2)$$
Variance can be thought of as a measure of how much $x$ tends to deviate from
its central tendency $E(x)$.
::: example
**Calculating variance from the definition**
The variance of $r$ is:
\begin{align}
var(r) &= (0-\underbrace{E(r)}_{18/37})^2 *\frac{19}{37} + (1-\underbrace{E(r)}_{18/37})^2 * \frac{18}{37} \\
&\approx 0.25
\end{align}
The variance of $w_{red}$ is:
\begin{align}
var(w_{red}) &= (-1-\underbrace{E(w_{red})}_{\approx -0.027})^2 * \frac{19}{37} + (1-\underbrace{E(w_{red})}_{\approx -0.027})^2 * \frac{18}{37} \\
&\approx 1.0
\end{align}
The variance of $w_{14}$ is:
\begin{align}
var(w_{14}) &= (-1-\underbrace{E(w_{14})}_{\approx -0.027})^2 * \frac{36}{37} + (35-\underbrace{E(w_{14})}_{\approx -0.027})^2 * \frac{1}{37} \\
&\approx 34.1
\end{align}
Notice that a bet on 14 has the same expected payout as a bet on red:
$$E(w_{14}) = E(w_{red}) \approx -0.027$$
but its payout is much more variable:
$$var(w_{14}) \approx 34.1 > 1.0 \approx var(w_{red})$$
:::
The key to understanding the variance is that it is the expected value of
a square $(x-E(x))^2$, and the expected value is just a (weighted) sum.
This has several implications:
1. The variance is always positive (or more precisely, non-negative):
$$var(x) \geq 0$$
The intuition is straightforward. All squares are non-negative, and the
expected value is just a sum. If you add up several non-negative numbers, you
will get a non-negative number.
2. The variance can also be written in the form:
$$var(x) = E(x^2) - E(x)^2$$
The derivation of this is as follows:
\begin{align}
var(x) &= E((x-E(x))^2) \\
&= E( ( x-E(x) ) * (x - E(x) )) \\
&= E( x^2 - 2xE(x) + E(x)^2) \\
&= E(x^2) - 2E(x)E(x) + E(x)^2 \\
&= E(x^2) - E(x)^2
\end{align}
This formula is often an easier way of calculating the variance.
::: example
**Calculating variance using the alternate formula**
We already found that $E(w_{14}) = -0.027$, so we can calculate $var(w_{14})$ by
finding:
\begin{align}
E(w_{14}^2) &= (-1)^2 f_{14}(-1) + 35^2 f_{14}(35) \\
&= 1 * \frac{36}{37} + 1225 * \frac{1}{37} \\
&\approx 34.08
\end{align}
Putting these results together we get:
\begin{align}
var(w_{14}) &= E(w_{14}^2) - E(w_{14})^2 \\
&\approx 34.08 - (-0.027)^2 \\
&\approx 34.1
\end{align}
which is the same result as we found earlier.
:::
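Both variance formulas can be checked side by side. A sketch for $w_{14}$ (assumed code, not from the text):

```{r varianceSketch}
S <- c(-1, 35); f <- c(36/37, 1/37)  # support and PDF of w_14
Ex <- sum(S * f)                     # E(w_14) = -1/37
sum((S - Ex)^2 * f)                  # definition: E((x - E(x))^2)
sum(S^2 * f) - Ex^2                  # shortcut:   E(x^2) - E(x)^2
```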
3. We can also find the variance of any linear function of a random variable. For
any constants $a$ and $b$:
$$var(a + bx) = b^2 var(x)$$
This can be derived as follows:
\begin{align}
var(a+bx) &= E( ( (a+bx) - E(a+bx))^2) \\
&= E( ( a+bx - a-bE(x))^2) \\
&= E( (b(x - E(x)))^2) \\
&= E( b^2(x - E(x))^2) \\
&= b^2 E( (x - E(x))^2) \\
&= b^2 var(x)
\end{align}
::: example
**Calculating the variance of a linear function**
We earlier found that $var(r) \approx 0.25$, so we can find the variance of
$w_{red}$ using our formula for the variance of a linear function:
\begin{align}
var(w_{red}) &= var( 2r - 1) \\
&= 2^2 var(r) \\
&\approx 4*0.25 \\
&\approx 1.0
\end{align}
which is the same result as we found earlier.
:::
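Both properties can be verified numerically. The Python sketch below (illustrative, not part of the text) recomputes $var(w_{14})$ with the shortcut formula and checks that $var(2r - 1) = 2^2 \, var(r)$ using the roulette probabilities from the examples:

```python
# Illustrative sketch: checking the shortcut formula and the
# linear-function rule with the roulette PDFs.

# Bet on 14: shortcut formula var(x) = E(x^2) - E(x)^2
pmf_14 = {-1: 36 / 37, 35: 1 / 37}
mean_14 = sum(a * p for a, p in pmf_14.items())
mean_sq_14 = sum(a ** 2 * p for a, p in pmf_14.items())   # E(x^2)
var_14 = mean_sq_14 - mean_14 ** 2
print(round(var_14, 1))  # 34.1

# Bet on red: w_red = 2r - 1 where r ~ Bernoulli(18/37),
# so var(w_red) should equal 2^2 var(r)
p = 18 / 37
var_r = p * (1 - p)
pmf_red = {-1: 19 / 37, 1: 18 / 37}
mean_red = sum(a * q for a, q in pmf_red.items())
var_red = sum((a - mean_red) ** 2 * q for a, q in pmf_red.items())
assert abs(var_red - 2 ** 2 * var_r) < 1e-12
```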
### Standard deviation {#standard-deviation}
The ***standard deviation*** of a random variable is defined as the (positive)
square root of its variance:
$$\sigma_x = sd(x) = \sqrt{var(x)}$$
The standard deviation is just another way of describing the variability of $x$.
In some sense, the variance and standard deviation are interchangeable since
they are so closely related. The standard deviation has the advantage that it
is expressed in the same units as the underlying random variable, while the
variance is expressed in the square of those units. This makes the standard
deviation somewhat easier to interpret.
::: example
**Standard deviation in roulette**
The standard deviation of $r$ is:
$$sd(r) = \sqrt{var(r)} \approx \sqrt{0.25} \approx 0.5$$
The standard deviation of $w_{red}$ is:
$$sd(w_{red}) = \sqrt{var(w_{red})} \approx \sqrt{1.0} \approx 1.0$$
The standard deviation of $w_{14}$ is
$$sd(w_{14}) = \sqrt{var(w_{14})} \approx \sqrt{34.1} \approx 5.8$$
:::
The standard deviation has analogous properties to the variance:
1. It is always non-negative:
$$sd(x) \geq 0$$
2. For any constants $a$ and $b$:
$$sd(a + bx) = |b| \, sd(x)$$
These properties follow directly from the corresponding properties of the
variance.
### Standardization {#standardization}
It is sometimes useful to ***standardize*** a random variable. This means
constructing a new random variable of the form:
$$z = \frac{x - E(x)}{sd(x)}$$
By construction, the standardized random variable $z$ has expected value
$E(z) = 0$ and variance/standard deviation $var(z) = sd(z) = 1$.
Standardization is commonly used in fields like psychology or educational
testing when there is no natural unit of measurement.
::: example
**A standardized test score**
Suppose that the ECON 233 exam is graded on a scale from 0 to 100, with a
mean score of $E(x) = 70$ and a standard deviation of $sd(x) = 10$. For any
individual student's score $x$, the standardized score is:
$$ z = \frac{x-70}{10} $$
or (equivalently):
$$z = 0.1 x - 7 $$
Applying our results on linear functions of a random variable:
\begin{align}
E(z) &= 0.1 E(x) - 7 \\
&= 0.1 \times 70 - 7 \\
&= 0 \\
sd(z) &= 0.1 sd(x) \\
&= 0.1 \times 10 \\
&= 1
\end{align}
Students with a positive standardized test score $(z > 0)$ did better than
average, while students with a negative standardized score did worse than
average.
:::
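The example above can be sketched in a few lines of Python (the mean of 70 and standard deviation of 10 are the hypothetical ECON 233 numbers from the example):

```python
# Sketch of the ECON 233 example: standardizing exam scores with
# E(x) = 70 and sd(x) = 10 (hypothetical numbers from the text).

def standardize(x, mean=70.0, sd=10.0):
    """Return the standardized score z = (x - mean) / sd."""
    return (x - mean) / sd

print(standardize(85))  # 1.5: 1.5 standard deviations above average
print(standardize(70))  # 0.0: exactly average
print(standardize(60))  # -1.0: one standard deviation below average
```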
## Standard discrete distributions {#standard-distributions}
In principle, there are an infinite number of possible probability
distributions. However, some probability distributions appear so often in
applications that we have given them names. This provides a quick way to
describe a particular distribution without writing out its full PDF, using the
notation:
$$RandomVariable \sim DistributionName(Parameters)$$
where $RandomVariable$ is the name of the random variable whose distribution is
being described, the $\sim$ character can be read as "has the following
probability distribution", $DistributionName$ is the name of the probability
distribution, and $Parameters$ is a list of arguments called ***parameters***
that provide additional information about the probability distribution.
Using a standard distribution also allows us to establish the properties of a
commonly-used distribution once, and use those results every time we use that
distribution. In this section we will describe three standard distributions -
the Bernoulli, the binomial, and the discrete uniform - and their properties.
### Bernoulli {#bernoulli}
The ***Bernoulli*** probability distribution is usually written:
$$x \sim Bernoulli(p)$$
It has discrete support $S_x = \{0,1\}$ and PDF:
\begin{align}
f_x(a) &= \begin{cases}
(1-p) & \textrm{if $a = 0$} \\
p & \textrm{if $a = 1$} \\
0 & \textrm{otherwise}\\
\end{cases}
\end{align}
Note that the "Bernoulli distribution" isn't really a (single) probability
distribution. Instead it is what we call a ***parametric family*** of
distributions. That is, the $Bernoulli(p)$ is a different distribution with a
different PDF for each value of the ***parameter*** $p$.
We typically use Bernoulli random variables to model the probability of some
random event $A$. If we define $x$ as the indicator variable $x=I(A)$, then
$x \sim Bernoulli(p)$ where $p=\Pr(A)$.
::: example
**The Bernoulli distribution in roulette**
The variable $r = I(Red)$ has the $Bernoulli(18/37)$ distribution.
:::
The mean of a $Bernoulli(p)$ random variable is:
\begin{align}
E(x) &= (1-p)*0 + p*1 \\
&= p
\end{align}
and its variance is:
\begin{align}
var(x) &= E(x^2) - E(x)^2 \\
&= (0^2*(1-p) + 1^2 p) - (p)^2 \\
&= p - p^2 \\
&= p(1-p)
\end{align}
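Both formulas can be checked directly from the PDF. The sketch below (illustrative, not part of the text) uses $p = 18/37$, the value from the roulette example:

```python
# Sketch: E(x) = p and var(x) = p(1 - p) for x ~ Bernoulli(p),
# computed directly from the PDF with p = 18/37 (the roulette example).
p = 18 / 37
mean = (1 - p) * 0 + p * 1                           # E(x)
var = (0 ** 2 * (1 - p) + 1 ** 2 * p) - mean ** 2    # E(x^2) - E(x)^2

assert mean == p
assert abs(var - p * (1 - p)) < 1e-12
```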
### Binomial {#binomial}
The ***binomial*** probability distribution is usually written:
$$x \sim Binomial(n,p)$$
It has discrete support $S_x = \{0,1,2,\ldots,n\}$ and its PDF is:
$$f_x(a) =
\begin{cases}
\frac{n!}{a!(n-a)!} p^a(1-p)^{n-a} & \textrm{if $a \in S_x$} \\
0 & \textrm{otherwise} \\
\end{cases}$$
You do not need to memorize or even understand this formula. The Excel function
`BINOM.DIST()` can be used to calculate the PDF or CDF of the binomial
distribution, and the function `BINOM.INV()` can be used to calculate its
quantiles.
The binomial distribution is typically used to model frequencies or counts. We
can show that it is the distribution of how many times a probability-$p$ event
happens in $n$ independent attempts.
For example, the basketball player Stephen Curry makes about 43\% of his
3-point shot attempts. If each shot is independent of the others, then the
number of shots he makes in 10 attempts will have the $Binomial(10,0.43)$
distribution.
::: example
**The binomial distribution in roulette**
Suppose we play 50 (independent) games of roulette, and bet on red in every
game. Since the outcome of a single bet on red is $r \sim Bernoulli(18/37)$,
the number of times we win is:
$$WIN50 \sim Binomial(50,18/37)$$
We can use the Excel formula `=BINOM.DIST(25,50,18/37,FALSE)` to calculate the
probability of winning exactly 25 times:
$$\Pr(WIN50 = 25) \approx 0.11$$
and we can use the Excel formula `= 1 - BINOM.DIST(25,50,18/37,TRUE)` to
calculate the probability of winning more than 25 times:
$$\Pr(WIN50 > 25) = 1 - \Pr(WIN50 \leq 25) \approx 0.37$$
So we have a 37\% chance of making money (winning more often than losing),
an 11\% chance of breaking even, and a 52\% chance of losing money.
:::
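The same probabilities can be computed outside of Excel. The Python sketch below (illustrative, not part of the text) evaluates the binomial PDF formula directly, mirroring the two Excel calls in the example:

```python
# Sketch: Binomial(50, 18/37) probabilities from the PDF formula,
# mirroring BINOM.DIST(25,50,18/37,FALSE) and
# 1 - BINOM.DIST(25,50,18/37,TRUE).
from math import comb

def binom_pdf(a, n, p):
    """Pr(x = a) for x ~ Binomial(n, p)."""
    return comb(n, a) * p ** a * (1 - p) ** (n - a)

n, p = 50, 18 / 37
pr_eq_25 = binom_pdf(25, n, p)                             # Pr(WIN50 = 25)
pr_gt_25 = 1 - sum(binom_pdf(a, n, p) for a in range(26))  # Pr(WIN50 > 25)

print(round(pr_eq_25, 2))  # 0.11
print(round(pr_gt_25, 2))  # 0.37
```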