# Random variables {#random-variables}
```{r setup4, include=FALSE}
knitr::opts_chunk$set(echo = FALSE,
prompt = FALSE,
tidy = TRUE,
collapse = TRUE)
library("tidyverse")
```
Economics is a mostly *quantitative* field: its outcomes can usually be
described using numbers such as prices, quantities, interest rates,
unemployment rates, and GDP. Statistics is also a quantitative field: every
statistic is a number
calculated from data. When a random outcome is described by a number, we call
that number a "random variable." We can use probability theory to describe and
model random variables.
This chapter will introduce the basic terminology and mathematical tools for
working with simple random variables.
::: {.goals data-latex=""}
**Chapter goals**
In this chapter, we will learn how to:
1. Define a random variable in terms of a random outcome.
2. Determine the support and range of a random variable.
3. Calculate and interpret the PDF of a discrete random variable.
4. Calculate and interpret the CDF of a discrete random variable.
5. Calculate interval probabilities from the CDF.
6. Calculate the expected value of a discrete random variable from its PDF.
7. Calculate a quantile from the CDF.
8. Calculate the variance of a discrete random variable from its PDF.
9. Calculate the variance from expected values.
10. Calculate the standard deviation from the variance.
11. Calculate the expected value for a linear function of a random variable.
12. Calculate the variance and standard deviation for a linear function of a
    random variable.
13. Standardize a random variable.
14. Use standard discrete probability distributions:
    - Bernoulli
    - binomial
    - discrete uniform.
:::
To prepare for this chapter, please review the chapter on
[probability and random events](#probability) and the section on
[sequences and summations](#sequences-and-summations) in the math appendix.
## Defining a random variable {#introduction-to-random-variables}
A ***random variable*** is a number whose value depends on a random outcome. The
idea here is that we are going to use a random variable to describe some (but
not necessarily every) aspect of the outcome.
::: example
**Random variables in roulette**
Here are a few random variables we could define in a roulette game:
- The original outcome $b$
- An indicator for whether a bet on red wins:
$$r = I(b \in Red)=\begin{cases}1 & b \in Red\\ 0 & b \notin Red \\ \end{cases}$$
- The net payout from a \$1 bet on red:
$$ w_{red} = w_{red}(b) = \begin{cases} 1 & \textrm{ if } b \in Red \\ -1 & \textrm{ if } b \in Red^c \end{cases} $$
That is, a player who bets \$1 on red wins \$1 if the ball lands on red
and loses \$1 if the ball lands anywhere else.
- The net payout from a \$1 bet on 14:
$$ w_{14} = w_{14}(b) = \begin{cases} 35 & \textrm{ if } b = 14 \\ -1 & \textrm{ if } b \neq 14 \end{cases} $$
That is, a player who bets \$1 on 14 wins \$35 if the ball lands on 14
and loses \$1 if the ball lands anywhere else.
All of these random variables are defined in terms of the underlying outcome.
:::
A random variable is always a function of the original outcome, but for
convenience, we usually leave its dependence on the original outcome implicit,
and write it as if it were an ordinary variable.
### Implied distribution {#probability-distributions}
Since every random variable is a number, we can define its sample space as the
set of real numbers $\mathbb{R}$.
Each random variable has its own probability distribution over this sample space
and this probability distribution can be derived from the probability
distribution of the underlying outcome. That is, let
$\omega \in \Omega$ be some random outcome, and let $x = x(\omega)$
be some random variable that depends on that outcome. Then the probability
that $x$ is in some set $A$ is:
$$\Pr(x \in A) = \Pr(\{\omega \in \Omega: x(\omega) \in A\})$$
Again, this definition looks complicated but is easier to follow with a few
simple examples.
::: example
**Probability distributions for roulette**
Assuming we have a fair roulette game:
- We already know that the probability distribution for $b$ is:
$$\Pr(b = 0) = 1/37 \approx 0.027$$
$$\Pr(b = 1) = 1/37 \approx 0.027$$
$$\vdots$$
$$\Pr(b = 36) = 1/37 \approx 0.027$$
$$\Pr(b \notin \{0,1,\ldots,36\}) = 0$$
- The probability distribution for $w_{red}$ is:
$$\Pr(w_{red} = 1) = \Pr(b \in Red) = 18/37 \approx 0.486$$
$$\Pr(w_{red} = -1) = \Pr(b \notin Red) = 19/37 \approx 0.514$$
$$\Pr(w_{red} \notin \{-1,1\}) = 0$$
- The probability distribution for $w_{14}$ is:
$$\Pr(w_{14} = 35) = \Pr(b = 14) = 1/37 \approx 0.027$$
$$\Pr(w_{14} = -1) = \Pr(b \neq 14) = 36/37 \approx 0.973$$
$$\Pr(w_{14} \notin \{-1,35\}) = 0$$
Notice that these random variables are related to each other since they all
depend on the same underlying outcome. Section \@ref(multiple-random-variables)
will explain how we can describe and analyze those relationships.
:::
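The implied distribution of a random variable can also be computed mechanically from the distribution of the underlying outcome. Here is a minimal sketch (assumed code, not from the text; the set of red numbers below is the standard roulette assignment) that adds up $\Pr(b = a)$ over the outcomes mapped to each value of $w_{red}$ and $w_{14}$:

```{r impliedDistSketch}
# Derive the implied distributions of w_red and w_14 from the
# uniform distribution of b over {0, 1, ..., 36}.
red <- c(1, 3, 5, 7, 9, 12, 14, 16, 18, 19, 21, 23, 25, 27, 30, 32, 34, 36)
b <- 0:36
pb <- rep(1/37, 37)                # Pr(b = a) = 1/37 for each outcome
w_red <- ifelse(b %in% red, 1, -1) # net payout of a $1 bet on red
w_14  <- ifelse(b == 14, 35, -1)   # net payout of a $1 bet on 14
# Add up Pr(b = a) over the outcomes mapped to each payout value
tapply(pb, w_red, sum)             # -1: 19/37, 1: 18/37
tapply(pb, w_14, sum)              # -1: 36/37, 35: 1/37
```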
### The support {#the-support}
The ***support*** of a random variable $x$ is the smallest[^401] set
$S_x \subset \mathbb{R}$ such that $\Pr(x \in S_x) = 1$.
[^401]: Technically, it is the smallest *closed* set, but let's ignore that.
In plain language, the support is the set of all values in the sample space that
have some chance of actually happening.
::: example
**The support in roulette**
The sample space of $b$ is $\mathbb{R}$ and the support of $b$ is
$S_{b} = \{0,1,2,\ldots,36\}$.
The sample space of $w_{red}$ is $\mathbb{R}$ and the support of
$w_{red}$ is $S_{red} = \{-1,1\}$.
The sample space of $w_{14}$ is $\mathbb{R}$ and the support of
$w_{14}$ is $S_{14} = \{-1,35\}$.
:::
The random variables we will consider in this chapter have ***discrete***
support. That is, the support is a set of isolated points each of which has
a strictly positive probability. In most examples the support will also have
a ***finite*** number of elements. All finite sets are discrete, but it is
possible for a discrete set to have an infinite number of elements.
For example, the set of positive integers $\{1,2,3,\ldots\}$ is both discrete
and infinite.
Some random variables have a support that is continuous rather than discrete.
Chapter \@ref(more-on-random-variables) will cover continuous random variables.
### The PDF {#the-pdf-of-a-discrete-random-variable}
We can describe the probability distribution of a random variable with a
function called its ***probability density function (PDF)***.
The PDF of a discrete random variable is defined as:
$$f_x(a) = \Pr(x = a)$$
where $a$ is any number. By convention, we typically use a lower-case $f$ to
represent a PDF, and we use the subscript when needed to clarify which specific
random variable we are talking about.
In some cases we are just given the PDF, in others we may need to calculate it
using the tools we have already learned.
::: example
**The PDF in roulette**
Our three random variables are all discrete, and each has its own PDF:
$$f_b(a) = \Pr(b = a) = \begin{cases}
1/37 & a \in \{0,1,\ldots,36\} \\
0 & a \notin \{0,1,\ldots,36\} \\
\end{cases}$$
$$f_{red}(a) = \Pr(w_{red} = a) = \begin{cases}
19/37 & a = -1 \\
18/37 & a = 1 \\
0 & a \notin \{-1,1\} \\
\end{cases}$$
$$f_{14}(a) = \Pr(w_{14} = a) = \begin{cases}
36/37 & a = -1 \\
1/37 & a = 35 \\
0 & a \notin \{-1,35\} \\
\end{cases}$$
Figure \@ref(fig:RoulettePDF) below shows these three PDFs.
```{r RoulettePDF, fig.cap = "*PDFs for the roulette example*"}
RoulettePDF <- tibble(a = seq(from = -2, to= 36),
fb = c(0, 0, rep(1/37, times = 37)),
fred = c(0, 19/37, 0, 18/37, rep(0, times=35)),
f14 = c(0, 36/37,rep(0, times=35), 1/37, 0))
ggplot(data = RoulettePDF, mapping = aes(x = a)) +
geom_point(aes(y=fb), col = "blue") +
geom_point(aes(y=fred), col = "red") +
geom_point(aes(y=f14), col = "orange") +
xlab("a") +
ylab("f(a)") +
ylim(0,1) +
geom_text(x = 4,
y = 4/37,
label = "f_b(a)",
col = "blue") +
geom_text(x = 4,
y = 18/37,
label = "f_red(a)",
col = "red") +
geom_text(x = 2,
y = 36/37,
label = "f_14(a)",
col = "orange") +
labs(title = "Probability density function (PDF)",
subtitle = "Roulette",
caption = "",
tag = "")
```
Note that the points overlap, so you may not be able to see each value for a
given PDF.
:::
The PDF provides a complete description of the probability distribution of a
random variable. That is, for any random variable $x$ and any event
$A \subset \mathbb{R}$ we can calculate $\Pr(x \in A)$ by simply adding up the
corresponding PDF of $x$:
\begin{align}
\Pr(x \in A) &= \sum_{s \in A} \Pr(x = s) \\
&= \sum_{s \in S_x} f_x(s)I(s \in A)
\end{align}
If you are unfamiliar with the notation here, please refer to the sections on
[summations](#summations) and [the indicator function](#the-indicator-function)
in the Math Review Appendix. The formula is easy to use once you understand the
notation.
::: example
**Some event probabilities in roulette**
Since the outcome in roulette is discrete, we can calculate any event
probability by adding up the probabilities of the event's component outcomes.
The probability of the event $b \leq 1$ can be calculated:
\begin{align}
\Pr(b \leq 1) &= \sum_{s=0}^{36}f_b(s)I(s \leq 1) \\
&= \underbrace{f_b(0)}_{1/37} \underbrace{I(0 \leq 1)}_{1} +
\underbrace{f_b(1)}_{1/37} \underbrace{I(1 \leq 1)}_{1} +
\underbrace{f_b(2)}_{1/37} \underbrace{I(2 \leq 1)}_{0} +
\cdots +
\underbrace{f_b(36)}_{1/37} \underbrace{I(36 \leq 1)}_{0} \\
&= 2/37
\end{align}
The probability of the event $b \in Even$ can be calculated:
\begin{align}
\Pr(b \in Even) &= \sum_{s=0}^{36}f_b(s)I(s \in Even) \\
&= \underbrace{f_b(0)}_{1/37} \underbrace{I(0 \in Even)}_{0} +
\underbrace{f_b(1)}_{1/37} \underbrace{I(1 \in Even)}_{0} +
\underbrace{f_b(2)}_{1/37} \underbrace{I(2 \in Even)}_{1} +
\cdots +
\underbrace{f_b(36)}_{1/37} \underbrace{I(36 \in Even)}_{1} \\
&= 18/37
\end{align}
Remember that zero is not counted as an even number in roulette, so it is not
in the event $Even$.
:::
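The summation formula above translates almost directly into code. A quick sketch (assumed code, not from the text) of both event probabilities from this example:

```{r eventProbSketch}
# Pr(x in A) = sum over the support of f(s) * I(s in A),
# where the logical comparison plays the role of the indicator function.
support <- 0:36
fb <- rep(1/37, 37)
even <- seq(2, 36, by = 2)     # zero is not even in roulette
sum(fb * (support <= 1))       # Pr(b <= 1) = 2/37
sum(fb * (support %in% even))  # Pr(b in Even) = 18/37
```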
The PDF of a discrete random variable has several general properties:
1. It is always between zero and one:
$$0 \leq f_x(a) \leq 1$$
since it is a probability.
2. It sums up to one over the support:
$$\sum_{a \in S_x} f_x(a) = \Pr(x \in S_x) = 1$$
since the support has probability one by definition.
3. It is strictly positive for all values in the support:
$$a \in S_x \implies f_x(a) > 0$$
since the support is the *smallest* set that has probability one.
You can confirm that examples above all satisfy these properties.
### The CDF {#the-cdf}
Another way to describe the probability distribution of a random variable is
with a function called its ***cumulative distribution function (CDF)***.
The CDF is a little less intuitive than the PDF, but it has the advantage that
it always has the same definition whether the random variable is discrete,
continuous, or even some combination of the two.
The CDF of the random variable $x$ is the function
$F_x:\mathbb{R} \rightarrow [0,1]$ defined by:
$$F_x(a) = \Pr(x \leq a)$$
where $a$ is any number. By convention, we typically use an upper-case $F$ to
indicate a CDF, and we use the subscript to indicate what random variable we are
talking about.
We can construct the CDF of a discrete random variable by just adding up the
PDF:
\begin{align}
F_x(a) &= \Pr(x \leq a) \\
&= \sum_{s \in S_x} f_x(s)I(s \leq a)
\end{align}
This formula leads to a "stair-step" appearance: the CDF is flat for all values
outside of the support, and then jumps up at all values in the support.
::: example
**CDFs for roulette**
- The CDF of $b$ is:
$$F_b(a) = \begin{cases}
0 & a < 0 \\
1/37 & 0 \leq a < 1 \\
2/37 & 1 \leq a < 2 \\
\vdots & \vdots \\
36/37 & 35 \leq a < 36 \\
1 & a \geq 36 \\
\end{cases}$$
- The CDF of $w_{red}$ is:
$$F_{red}(a) = \begin{cases}
0 & a < -1 \\
19/37 & -1 \leq a < 1 \\
1 & a \geq 1 \\
\end{cases}$$
- The CDF of $w_{14}$ is:
$$F_{14}(a) = \begin{cases}
0 & a < -1 \\
36/37 & -1 \leq a < 35 \\
1 & a \geq 35 \\
\end{cases}$$
:::
The CDF has several properties:
1. The CDF is a *probability*, just like the PDF. For any number $a$ we know
that:
$$0 \leq F_x(a) \leq 1$$
2. The CDF is *non-decreasing*. For any two numbers $a$ and $b$ so that
$a \leq b$, we know that:
$$F_x(a) \leq F_x(b)$$
3. The CDF *runs from zero to one*. That is, it is zero or close to zero for
low values of $a$, and one or close to one for high values of $a$. We can use
limits to give precise meaning to the broad terms "close", "low", and "high":
$$\lim_{a \rightarrow -\infty} F_x(a) = \Pr(x \leq -\infty) = 0$$
$$\lim_{a \rightarrow \infty} F_x(a) = \Pr(x \leq \infty) = 1$$
You can review the section on [limits](#limits) in the math appendix if you
do not follow the notation.
::: example
**CDF properties**
Figure \@ref(fig:RouletteCDF) below graphs the CDFs from the previous example:
```{r RouletteCDF, fig.cap = "*CDFs for the roulette example*"}
RouletteCDF <- RoulettePDF %>%
mutate (Fb = cumsum(fb)) %>%
mutate (Fred = cumsum(fred))%>%
mutate (F14 = cumsum(f14))
ggplot(data = RouletteCDF, mapping = aes(x = a)) +
geom_step(aes(y=Fb), col = "blue") +
geom_step(aes(y=Fred), col = "red") +
geom_step(aes(y=F14), col = "orange") +
xlab("a") +
ylab("F(a)") +
geom_text(x = 7,
y = 4/37,
label = "F_b(a)",
col = "blue") +
geom_text(x = 5,
y = 18/37,
label = "F_red(a)",
col = "red") +
geom_text(x = 8,
y = 32/37,
label = "F_14(a)",
col = "orange") +
labs(title = "Cumulative distribution function (CDF)",
subtitle = "Roulette",
caption = "",
tag = "")
```
Notice that they show all of the general properties described above:
- The CDF never goes down, only goes up or stays the same.
- The CDF runs from zero to one, and never leaves that range.
In addition, all of these CDFs have a distinctive "stair-step" shape, jumping up
at each point in $S_x$ and staying flat between those points. This is a general
property of CDFs for discrete random variables.
:::
In addition to constructing the CDF from the PDF, we can also go the other way,
and construct the PDF of a discrete random variable from its CDF. Each little
jump in the CDF is a point in the support, and the size of the jump is exactly
equal to the PDF.
::: {.fyi data-latex=""}
In more formal mathematics, the formula for deriving the PDF of a discrete
random variable from its CDF would be written:
$$f_x(a) = F_x(a) - \lim_{\epsilon \rightarrow 0} F_x(a-|\epsilon|)$$
but we can just think of it as the size of the jump.
:::
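The "size of the jump" idea is easy to check numerically. A sketch (assumed code, not from the text) that recovers the PDF of $w_{red}$ from its CDF evaluated on a grid around the support:

```{r pdfFromCdfSketch}
# Each jump in the CDF marks a support point; the jump size is the PDF there.
a_grid <- c(-2, -1, 0, 1, 2)       # evaluation points around the support
F_red  <- c(0, 19/37, 19/37, 1, 1) # CDF of w_red at those points
jumps  <- diff(c(0, F_red))        # size of the jump at each grid point
data.frame(a = a_grid, jump = jumps)[jumps > 0, ]  # support points and PDF
```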
### Interval probabilities
Finally, we can use the CDF to calculate the probability that $x$ lies in any
interval. That is, let $a$ and $b$ be any two numbers such that $a < b$. Then:
\begin{align}
F(b) - F(a) &= \Pr(x \leq b) - \Pr(x \leq a) \\
&= \Pr((x \leq a) \cup (a < x \leq b)) - \Pr(x \leq a) \\
&= \Pr(x \leq a) + \Pr(a < x \leq b) - \Pr(x \leq a) \\
&= \Pr(a < x \leq b)
\end{align}
Notice that we have to be a little careful here to distinguish between the
strict inequality $<$ and the weak inequality $\leq$, because it is always
possible for $x$ to be exactly equal to $a$ or $b$.
::: example
**Calculating interval probabilities**
Consider the CDF for $b$ derived above. Then:
\begin{align}
\Pr(b \leq 36) &= F_b(36) \\
&= 1 \\
\Pr(0 < b \leq 36) &= F_b(36) - F_b(0) \\
&= 1 - 1/37 \\
&= 36/37
\end{align}
Note that the placement of the $<$ and $\leq$ are important here.
What if we want $\Pr(0 \leq b \leq 36)$ instead? We can split that event into
two disjoint events $(b = 0)$ and $(0 < b \leq 36)$ and apply the axioms of
probability:
\begin{align}
\Pr(0 \leq b \leq 36) &= \Pr( (b = 0) \cup (0 < b \leq 36) ) \\
&= \Pr(b = 0) + \Pr(0 < b \leq 36) \\
&= 1/37 + 36/37 \\
&= 1
\end{align}
We can use similar methods to determine $\Pr(0 < b < 36)$ or
$\Pr(0 \leq b < 36)$.
:::
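These interval calculations can be sketched in code as well. The closed-form CDF below is an assumption (not from the text), written to match the stair-step CDF of $b$ derived above:

```{r intervalProbSketch}
# CDF of b: F_b(a) = (floor(a) + 1)/37, clamped to [0, 1].
Fb <- function(a) pmin(pmax(floor(a) + 1, 0), 37) / 37
Fb(36) - Fb(0)              # Pr(0 < b <= 36)  = 36/37
(1/37) + (Fb(36) - Fb(0))   # Pr(0 <= b <= 36) = Pr(b = 0) + Pr(0 < b <= 36) = 1
```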
### Functions of a random variable
Any function of a random variable is also a random variable. So for example, if
$x$ is a random variable, so is $x^2$ or $\ln (x)$ or $\sqrt{x}$. We can derive
the PDF or CDF of a function of a random variable directly from the PDF or CDF
of the original random variable.
We say that $y$ is a ***linear function*** of $x$ if:
$$y = a + bx$$
where $a$ and $b$ are constants.
::: example
**Linear and nonlinear functions in roulette**
The net payout from a \$1 bet on red ($w_{red}$) was earlier defined directly
from the underlying outcome $b$. We can use the indicator function to write it
in a compact form:
$$w_{red} = 2I(b \in Red) - 1$$
We could also define it as a function of the random variable $r$:
$$w_{red} = 2r -1$$
Applying the definitions above, $w_{red}$ can be considered a linear function
of $r$, or a nonlinear function of $b$.
:::
We will have many results below that apply specifically for linear functions,
but not for nonlinear functions.
## The expected value {#the-expected-value}
The ***expected value*** of a random variable $x$ is written $E(x)$. When $x$
is discrete, it is defined as:
$$E(x) = \sum_{a \in S_x} a\Pr(x=a) = \sum_{a \in S_x} af_x(a)$$
The expected value is also called the ***mean***, the ***population mean***
or the ***expectation*** of the random variable.
The formula might look difficult if you are not used to the notation, but it is
actually quite simple to calculate:
1. Figure out the support and PDF of $x$.
2. Multiply each value in the support by the PDF at that value.
3. Add these numbers up.
::: example
**Some expected values in roulette**
The support of $b$ is $\{0,1,2\ldots,36\}$ and its PDF is the $f_b(\cdot)$
function we calculated earlier. So its expected value is:
\begin{align}
E(b) &= 0*\underbrace{f_b(0)}_{1/37} + 1*\underbrace{f_b(1)}_{1/37} + \cdots + 36*\underbrace{f_b(36)}_{1/37} \\
&= \frac{0 + 1 + 2 + \cdots + 36}{37} \\
&= 18
\end{align}
The support of $r$ is $\{0,1\}$ and its PDF is the $f_r(\cdot)$ function we
calculated earlier. So its expected value is:
\begin{align}
E(r) &= 0*\underbrace{f_r(0)}_{19/37} + 1*\underbrace{f_r(1)}_{18/37} \\
&= 18/37 \\
&\approx 0.486
\end{align}
The support of $w_{14}$ is $\{-1,35\}$ and its PDF is the $f_{14}(\cdot)$
function we calculated earlier. So its expected value is:
\begin{align}
E(w_{14}) &= -1*\underbrace{f_{14}(-1)}_{36/37} + 35*\underbrace{f_{14}(35)}_{1/37} \\
&= -1/37 \\
&\approx -0.027
\end{align}
That is, each dollar bet on 14 leads to an average loss of 2.7 cents for the
bettor.
:::
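The three-step recipe above amounts to one line of code per random variable. A sketch (assumed code, not from the text):

```{r expectedValueSketch}
# Multiply each support value by the PDF at that value, then add up.
Eb   <- sum((0:36) * rep(1/37, 37))  # E(b)
Er   <- 0 * (19/37) + 1 * (18/37)    # E(r)
Ew14 <- -1 * (36/37) + 35 * (1/37)   # E(w_14)
c(Eb, Er, Ew14)                      # 18, about 0.486, about -0.027
```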
We can think of the expected value as a weighted average of its possible values,
with each value weighted by the probability of observing that value. It is often
loosely interpreted as a measure of "central tendency" (a typical or
representative value) for the random variable.
### Linearity of expectations
Since the expected value is a sum, it has some of the same properties as sums.
In particular, the associative and distributive rules apply, which means that:
$$E(a + bx) = a + bE(x)$$
That is, we can take the expected value "inside" any linear function. This will
turn out to be a very handy property.
::: example
**The expected value of a linear function in roulette**
Earlier, we showed that $w_{red}$ can be defined as a linear function of $r$:
$$w_{red} = 2r -1$$
so its expected value can be derived:
\begin{align}
E(w_{red}) &= E(2r - 1) \\
&= 2 \underbrace{E(r)}_{18/37} - 1 \\
&= -1/37 \\
&\approx -0.027
\end{align}
We can verify this calculation is correct by deriving the expected value
directly from the PDF:
\begin{align}
E(w_{red}) &= -1*\underbrace{f_{red}(-1)}_{19/37} + 1*\underbrace{f_{red}(1)}_{18/37} \\
&\approx -0.027
\end{align}
That is, each dollar bet on red leads to an average loss of 2.7 cents for the
bettor.
:::
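This check is simple enough to do numerically. A sketch (assumed code, not from the text) comparing the linearity shortcut against the direct PDF calculation:

```{r linearitySketch}
Er <- 18/37                     # E(r), from the earlier example
via_linearity <- 2 * Er - 1     # E(2r - 1) = 2 E(r) - 1 = -1/37
via_pdf <- -1 * (19/37) + 1 * (18/37)  # E(w_red) directly from the PDF
c(via_linearity, via_pdf)       # both about -0.027
```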
Unfortunately, this handy property applies only to linear functions. If
$g(\cdot)$ is a nonlinear function, then in general $E(g(x)) \neq g(E(x))$. For example:
$$E(x^2) \neq E(x)^2$$
$$E( 1/x ) \neq 1 / E(x)$$
Students frequently make this mistake, so try to avoid it.
::: example
**The expected value of a nonlinear function in roulette**
We also showed we can define $w_{red}$ as a nonlinear function of $b$:
$$w_{red} = 2 I(b \in Red) - 1$$
Can we take the expected value inside this function? That is, does:
\begin{align}
E(w_{red}) &= 2 I(E(b) \in Red) - 1 \qquad \textrm{?} \\
\end{align}
We already showed that $E(w_{red}) \approx -0.027$. We also showed earlier that
$E(b) = 18$, so we can find:
\begin{align}
2 I(E(b) \in Red) - 1 &= 2 I(18 \in Red) - 1 \\
&= 2*1 - 1 \\
&= 1
\end{align}
Since $-0.027 \neq 1$, it is clear that $E(w_{red}) \neq 2 I(E(b) \in Red) - 1$.
:::
## Quantiles and their relatives {#the-properties-of-a-random-variable}
The expected value is one way of describing something about a random variable,
but there are many others. We will describe a few of the most important ones.
### Range {#range}
The ***range*** of a random variable is the interval from its lowest possible
value $\min(S_x)$ to its highest possible value $\max(S_x)$.
:::example
**The range in roulette**
The support of $w_{red}$ is $\{-1,1\}$ so its range is $[-1,1]$.
The support of $w_{14}$ is $\{-1,35\}$ so its range is $[-1,35]$.
The support of $b$ is $\{0,1,2,\ldots,36\}$ so its range is $[0,36]$.
:::
### Quantiles and percentiles {#quantiles-and-percentiles}
Let $q$ be any number *strictly* between zero and one. Then the $q$
***quantile*** of a random variable $x$ is defined as:
\begin{align}
F_x^{-1}(q) &= \min\{a \in S_x: \Pr(x \leq a) \geq q\} \\
&= \min\{a \in S_x: F_x(a) \geq q\}
\end{align}
where $F_x(\cdot)$ is the CDF of $x$. The quantile function $F_x^{-1}(\cdot)$
is also called the ***inverse CDF***, for reasons that will soon be clear.
The $q$ quantile of a distribution is also called the $100q$ ***percentile***;
for example the 0.25 quantile of $x$ is also called the 25th percentile of $x$.
::: example
**Quantiles in roulette**
The CDF of $w_{red}$ is:
$$F_{red}(a) = \begin{cases}0 & a < -1 \\
19/37 \approx 0.514 & -1 \leq a < 1 \\
1 & a \geq 1 \\ \end{cases}$$
This CDF is plotted as the red line in the graph below.
```{r RouletteQuantiles, fig.cap = "*CDFs for the roulette example*"}
ggplot(data = RouletteCDF, mapping = aes(x = a)) +
geom_step(aes(y=Fred), col = "red") +
xlab("a") +
ylab("F(a)") +
geom_text(x = 20,
y = 0.9,
label = "F_red(a)",
col = "red") +
geom_text(mapping=aes(x = 3, y = 0.25),
label = "0.25 quantile = -1",
col = "blue") +
geom_text(x = 5,
y = 0.75,
label = "0.75 quantile = 1",
col = "blue") +
geom_segment(x=-5,xend=-1,y=0.25,yend=0.25,col="blue",linetype = "dashed") +
geom_segment(x=-5,xend=1,y=0.75,yend=0.75,col="blue",linetype = "dashed") +
labs(title = "Cumulative distribution function (CDF)",
subtitle = "Roulette - net winnings from bet on red",
caption = "",
tag = "")
```
To find any quantile $q$, we can apply the definition, or simply find the
value on the graph where $F_{red}(\cdot)$ crosses $q$.
For example, the 0.25 quantile (25th percentile) is defined as:
\begin{align}
F_{red}^{-1}(0.25) &= \min\{a \in S_x: F_{red}(a) \geq 0.25\} \\
&= \min \{-1, 1\} \\
&= -1
\end{align}
or we can draw the blue dashed line marked "0.25 quantile" and see that it hits
the red line at $a = -1$.
By the same method, we can find the 0.75 quantile (75th percentile) by
seeing that the red line crosses the blue dashed line marked "0.75 quantile" at
$a = 1$, or we can apply the definition:
\begin{align}
F_{red}^{-1}(0.75) &= \min\{a \in S_x: F_{red}(a) \geq 0.75\} \\
&= \min \{1\} \\
&= 1
\end{align}
Either method will work.
:::
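The quantile definition is a one-liner for a discrete random variable: take the smallest support value whose CDF reaches $q$. A sketch for $w_{red}$ (assumed code, not from the text):

```{r quantileSketch}
S     <- c(-1, 1)    # support of w_red
F_red <- c(19/37, 1) # CDF evaluated at each support point
quantile_red <- function(q) min(S[F_red >= q])
quantile_red(0.25)   # -1
quantile_red(0.75)   # 1
```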
The formula for the quantile function may look intimidating, but it can be
constructed by just "flipping" the axes of the CDF. This is why the quantile
function is also called the inverse CDF.
::: example
**The whole quantile function**
We can use the same ideas as in the previous example to show that $F^{-1}(q)$ is
equal to $-1$ for any $q$ between $0$ and $19/37$, and equal to $1$ for any $q$
between $19/37$ and $1$. But what is the value of $F^{-1}_{red}(19/37)$? To
figure that out we will need to carefully apply the definition:
\begin{align}
F_{red}^{-1}(19/37) &= \min\{a \in S_x: F_{red}(a) \geq 19/37\} \\
&= \min \{-1,1\} \\
&= -1
\end{align}
So the full quantile function can be written:
\begin{align}
F^{-1}(q) &= \begin{cases}
-1 & 0 < q \leq 19/37 \\
1 & 19/37 < q < 1 \\
\end{cases}
\end{align}
and we can plot it below:
```{r RouletteQuantFunction, fig.cap = "*Quantile function for $w_{red}$*"}
RouletteQuant <- tibble(a = c(0.0001,19/37,0.9999),
qred = c(-1,1,1))
ggplot(data = RouletteQuant, mapping = aes(x = a)) +
geom_step(aes(y=qred),col = "red") +
xlab("quantile") +
ylab("value of w_red at this quantile") +
geom_text(x = 0.8,
y = 0.9,
label = "q_red(a)",
col = "red") +
labs(title = "Quantile function (inverse CDF)",
subtitle = "Roulette - net winnings from bet on red",
caption = "",
tag = "")
```
Notice that this looks just like the CDF, but with the horizontal and vertical
axes flipped.
:::
### Median {#the-median}
The ***median*** of a random variable is its 0.5 quantile or 50th percentile.
::: example
**The median in roulette**
The median of $w_{red}$ is just its 0.5 quantile or 50th percentile:
$$median(w_{red}) = F_{red}^{-1}(0.5) = -1$$
:::
Like the expected value, the median is often loosely interpreted as a measure of
central tendency for the random variable.
## Variance and standard deviation {#variance-and-standard-deviation}
In addition to measures of central tendency such as the expected value and
median, we are also interested in measures of "spread" or variability. We have
already seen one - the range - but there are others, including the variance and
standard deviation.
### Variance {#variance}
The ***variance*** of a random variable $x$ is defined as:
$$\sigma_x^2 = var(x) = E((x-E(x))^2)$$
Variance can be thought of as a measure of how much $x$ tends to deviate from
its central tendency $E(x)$.
::: example
**Calculating variance from the definition**
The variance of $r$ is:
\begin{align}
var(r) &= (0-\underbrace{E(r)}_{18/37})^2 *\frac{19}{37} + (1-\underbrace{E(r)}_{18/37})^2 * \frac{18}{37} \\
&\approx 0.25
\end{align}
The variance of $w_{red}$ is:
\begin{align}
var(w_{red}) &= (-1-\underbrace{E(w_{red})}_{\approx -0.027})^2 * \frac{19}{37} + (1-\underbrace{E(w_{red})}_{\approx -0.027})^2 * \frac{18}{37} \\
&\approx 1.0
\end{align}
The variance of $w_{14}$ is:
\begin{align}
var(w_{14}) &= (-1-\underbrace{E(w_{14})}_{\approx -0.027})^2 * \frac{36}{37} + (35-\underbrace{E(w_{14})}_{\approx -0.027})^2 * \frac{1}{37} \\
&\approx 34.1
\end{align}
Notice that a bet on 14 has the same expected payout as a bet on red:
$$E(w_{14}) = E(w_{red}) \approx -0.027$$
but its payout is much more variable:
$$var(w_{14}) \approx 34.1 > 1.0 \approx var(w_{red})$$
:::
The key to understanding the variance is that it is the expected value of
a square $(x-E(x))^2$, and the expected value is just a (weighted) sum.
This has several implications:
1. The variance is always positive (or more precisely, non-negative):
$$var(x) \geq 0$$
The intuition is straightforward. All squares are non-negative, and the
expected value is just a sum. If you add up several non-negative numbers, you
will get a non-negative number.
2. The variance can also be written in the form:
$$var(x) = E(x^2) - E(x)^2$$
The derivation of this is as follows:
\begin{align}
var(x) &= E((x-E(x))^2) \\
&= E( ( x-E(x) ) * (x - E(x) )) \\
&= E( x^2 - 2xE(x) + E(x)^2) \\
&= E(x^2) - 2E(x)E(x) + E(x)^2 \\
&= E(x^2) - E(x)^2
\end{align}
This formula is often an easier way of calculating the variance.
::: example
**Calculating variance using the alternate formula**
We already found that $E(w_{14}) = -0.027$, so we can calculate $var(w_{14})$ by
finding:
\begin{align}
E(w_{14}^2) &= (-1)^2 f_{14}(-1) + 35^2 f_{14}(35) \\
&= 1 * \frac{36}{37} + 1225 * \frac{1}{37} \\
&\approx 34.08
\end{align}
Putting these results together we get:
\begin{align}
var(w_{14}) &= E(w_{14}^2) - E(w_{14})^2 \\
&\approx 34.08 - (-0.027)^2 \\
&\approx 34.1
\end{align}
which is the same result as we found earlier.
:::
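Both variance formulas can be checked side by side. A sketch for $w_{14}$ (assumed code, not from the text):

```{r varianceSketch}
S <- c(-1, 35); f <- c(36/37, 1/37)  # support and PDF of w_14
Ex <- sum(S * f)                     # E(w_14) = -1/37
sum((S - Ex)^2 * f)                  # definition: E((x - E(x))^2)
sum(S^2 * f) - Ex^2                  # shortcut:   E(x^2) - E(x)^2
```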
3. We can also find the variance of any linear function of a random variable. For
any constants $a$ and $b$:
$$var(a + bx) = b^2 var(x)$$
This can be derived as follows:
\begin{align}
var(a+bx) &= E( ( (a+bx) - E(a+bx))^2) \\
&= E( ( a+bx - a-bE(x))^2) \\
&= E( (b(x - E(x)))^2) \\
&= E( b^2(x - E(x))^2) \\
&= b^2 E( (x - E(x))^2) \\
&= b^2 var(x)
\end{align}
::: example
**Calculating the variance of a linear function**
We earlier found that $var(r) \approx 0.25$, so we can find the variance of
$w_{red}$ using our formula for the variance of a linear function:
\begin{align}
var(w_{red}) &= var( 2r - 1) \\
&= 2^2 var(r) \\
&\approx 4*0.25 \\
&\approx 1.0
\end{align}
which is the same result as we found earlier.
:::
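Both properties can be verified numerically. The Python sketch below (illustrative, not part of the text) recomputes $var(w_{14})$ with the shortcut formula and checks that $var(2r - 1) = 2^2 \, var(r)$ using the roulette probabilities from the examples:

```python
# Illustrative sketch: checking the shortcut formula and the
# linear-function rule with the roulette PDFs.

# Bet on 14: shortcut formula var(x) = E(x^2) - E(x)^2
pmf_14 = {-1: 36 / 37, 35: 1 / 37}
mean_14 = sum(a * p for a, p in pmf_14.items())
mean_sq_14 = sum(a ** 2 * p for a, p in pmf_14.items())   # E(x^2)
var_14 = mean_sq_14 - mean_14 ** 2
print(round(var_14, 1))  # 34.1

# Bet on red: w_red = 2r - 1 where r ~ Bernoulli(18/37),
# so var(w_red) should equal 2^2 var(r)
p = 18 / 37
var_r = p * (1 - p)
pmf_red = {-1: 19 / 37, 1: 18 / 37}
mean_red = sum(a * q for a, q in pmf_red.items())
var_red = sum((a - mean_red) ** 2 * q for a, q in pmf_red.items())
assert abs(var_red - 2 ** 2 * var_r) < 1e-12
```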
### Standard deviation {#standard-deviation}
The ***standard deviation*** of a random variable is defined as the (positive)
square root of its variance:
$$\sigma_x = sd(x) = \sqrt{var(x)}$$
The standard deviation is just another way of describing the variability of $x$.
In some sense, the variance and standard deviation are interchangeable since
they are so closely related. The standard deviation has the advantage that it
is expressed in the same units as the underlying random variable, while the
variance is expressed in the square of those units. This makes the standard
deviation somewhat easier to interpret.
::: example
**Standard deviation in roulette**
The standard deviation of $r$ is:
$$sd(r) = \sqrt{var(r)} \approx \sqrt{0.25} \approx 0.5$$
The standard deviation of $w_{red}$ is:
$$sd(w_{red}) = \sqrt{var(w_{red})} \approx \sqrt{1.0} \approx 1.0$$
The standard deviation of $w_{14}$ is
$$sd(w_{14}) = \sqrt{var(w_{14})} \approx \sqrt{34.1} \approx 5.8$$
:::
The standard deviation has analogous properties to the variance:
1. It is always non-negative:
$$sd(x) \geq 0$$
2. For any constants $a$ and $b$:
$$sd(a + bx) = |b| \, sd(x)$$
These properties follow directly from the corresponding properties of the
variance.
### Standardization {#standardization}
It is sometimes useful to ***standardize*** a random variable. This means
constructing a new random variable of the form:
$$z = \frac{x - E(x)}{sd(x)}$$
By construction, the standardized random variable $z$ has expected value
$E(z) = 0$ and variance/standard deviation $var(z) = sd(z) = 1$.
Standardization is commonly used in fields like psychology or educational
testing when there is no natural unit of measurement.
::: example
**A standardized test score**
Suppose that the ECON 233 exam is graded on a scale from 0 to 100, with a
mean score of $E(x) = 70$ and a standard deviation of $sd(x) = 10$. For any
individual student's score $x$, the standardized score is:
$$ z = \frac{x-70}{10} $$
or (equivalently):
$$z = 0.1 x - 7 $$
Applying our results on linear functions of a random variable:
\begin{align}
E(z) &= 0.1 E(x) - 7 \\
&= 0.1 \times 70 - 7 \\
&= 0 \\
sd(z) &= 0.1 sd(x) \\
&= 0.1 \times 10 \\
&= 1
\end{align}
Students with a positive standardized test score $(z > 0)$ did better than
average, while students with a negative standardized score did worse than
average.
:::
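The example above can be sketched in a few lines of Python (the mean of 70 and standard deviation of 10 are the hypothetical ECON 233 numbers from the example):

```python
# Sketch of the ECON 233 example: standardizing exam scores with
# E(x) = 70 and sd(x) = 10 (hypothetical numbers from the text).

def standardize(x, mean=70.0, sd=10.0):
    """Return the standardized score z = (x - mean) / sd."""
    return (x - mean) / sd

print(standardize(85))  # 1.5: 1.5 standard deviations above average
print(standardize(70))  # 0.0: exactly average
print(standardize(60))  # -1.0: one standard deviation below average
```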
## Standard discrete distributions {#standard-distributions}
In principle, there are an infinite number of possible probability
distributions. However, some probability distributions appear so often in
applications that we have given them names. This provides a quick way to
describe a particular distribution without writing out its full PDF, using the
notation:
$$RandomVariable \sim DistributionName(Parameters)$$
where $RandomVariable$ is the name of the random variable whose distribution is
being described, the $\sim$ character can be read as "has the following
probability distribution", $DistributionName$ is the name of the probability
distribution, and $Parameters$ is a list of arguments called ***parameters***
that provide additional information about the probability distribution.
Using a standard distribution also allows us to establish the properties of a
commonly-used distribution once, and use those results every time we use that
distribution. In this section we will describe three standard distributions -
the Bernoulli, the binomial, and the discrete uniform - and their properties.
### Bernoulli {#bernoulli}
The ***Bernoulli*** probability distribution is usually written:
$$x \sim Bernoulli(p)$$
It has discrete support $S_x = \{0,1\}$ and PDF:
\begin{align}
f_x(a) &= \begin{cases}
(1-p) & \textrm{if $a = 0$} \\
p & \textrm{if $a = 1$} \\
0 & \textrm{otherwise}\\
\end{cases}
\end{align}
Note that the "Bernoulli distribution" isn't really a (single) probability
distribution. Instead it is what we call a ***parametric family*** of
distributions. That is, the $Bernoulli(p)$ is a different distribution with a
different PDF for each value of the ***parameter*** $p$.
We typically use Bernoulli random variables to model the probability of some
random event $A$. If we define $x$ as the indicator variable $x=I(A)$, then
$x \sim Bernoulli(p)$ where $p=\Pr(A)$.
::: example
**The Bernoulli distribution in roulette**
The variable $r = I(Red)$ has the $Bernoulli(18/37)$ distribution.
:::
The mean of a $Bernoulli(p)$ random variable is:
\begin{align}
E(x) &= (1-p)*0 + p*1 \\
&= p
\end{align}
and its variance is:
\begin{align}
var(x) &= E(x^2) - E(x)^2 \\
&= (0^2*(1-p) + 1^2 p) - (p)^2 \\
&= p - p^2 \\
&= p(1-p)
\end{align}
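Both formulas can be checked directly from the PDF. The sketch below (illustrative, not part of the text) uses $p = 18/37$, the value from the roulette example:

```python
# Sketch: E(x) = p and var(x) = p(1 - p) for x ~ Bernoulli(p),
# computed directly from the PDF with p = 18/37 (the roulette example).
p = 18 / 37
mean = (1 - p) * 0 + p * 1                           # E(x)
var = (0 ** 2 * (1 - p) + 1 ** 2 * p) - mean ** 2    # E(x^2) - E(x)^2

assert mean == p
assert abs(var - p * (1 - p)) < 1e-12
```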
### Binomial {#binomial}
The ***binomial*** probability distribution is usually written:
$$x \sim Binomial(n,p)$$
It has discrete support $S_x = \{0,1,2,\ldots,n\}$ and its PDF is:
$$f_x(a) =
\begin{cases}
\frac{n!}{a!(n-a)!} p^a(1-p)^{n-a} & \textrm{if $a \in S_x$} \\
0 & \textrm{otherwise} \\
\end{cases}$$
You do not need to memorize or even understand this formula. The Excel function
`BINOM.DIST()` can be used to calculate the PDF or CDF of the binomial
distribution, and the function `BINOM.INV()` can be used to calculate its
quantiles.
The binomial distribution is typically used to model frequencies or counts. We
can show that it is the distribution of how many times a probability-$p$ event
happens in $n$ independent attempts.
For example, the basketball player Stephen Curry makes about 43\% of his
3-point shot attempts. If each shot is independent of the others, then the
number of shots he makes in 10 attempts will have the $Binomial(10,0.43)$
distribution.
::: example
**The binomial distribution in roulette**
Suppose we play 50 (independent) games of roulette, and bet on red in every
game. Since the outcome of a single bet on red is $r \sim Bernoulli(18/37)$,
the number of times we win is:
$$WIN50 \sim Binomial(50,18/37)$$
We can use the Excel formula `=BINOM.DIST(25,50,18/37,FALSE)` to calculate the
probability of winning exactly 25 times:
$$\Pr(WIN50 = 25) \approx 0.11$$
and we can use the Excel formula `= 1 - BINOM.DIST(25,50,18/37,TRUE)` to
calculate the probability of winning more than 25 times:
$$\Pr(WIN50 > 25) = 1 - \Pr(WIN50 \leq 25) \approx 0.37$$
So we have a 37\% chance of making money (winning more often than losing),
an 11\% chance of breaking even, and a 52\% chance of losing money.
:::
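The same probabilities can be computed outside of Excel. The Python sketch below (illustrative, not part of the text) evaluates the binomial PDF formula directly, mirroring the two Excel calls in the example:

```python
# Sketch: Binomial(50, 18/37) probabilities from the PDF formula,
# mirroring BINOM.DIST(25,50,18/37,FALSE) and
# 1 - BINOM.DIST(25,50,18/37,TRUE).
from math import comb

def binom_pdf(a, n, p):
    """Pr(x = a) for x ~ Binomial(n, p)."""
    return comb(n, a) * p ** a * (1 - p) ** (n - a)

n, p = 50, 18 / 37
pr_eq_25 = binom_pdf(25, n, p)                             # Pr(WIN50 = 25)
pr_gt_25 = 1 - sum(binom_pdf(a, n, p) for a in range(26))  # Pr(WIN50 > 25)

print(round(pr_eq_25, 2))  # 0.11
print(round(pr_gt_25, 2))  # 0.37
```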