
Commit 097af5a

committed
updated Beta regression chapter
1 parent 3cfe04e commit 097af5a

20 files changed

Lines changed: 674 additions & 397 deletions

book/02-stats-review.qmd

Lines changed: 5 additions & 5 deletions
@@ -4622,7 +4622,7 @@ ggplot(
  ),
  panel.grid.minor = element_blank(),
  legend.spacing.x = unit(2.2, "cm"),
- legend.box.spacing = unit(2.5, "cm")
+ legend.box.spacing = unit(1.2, "cm")
  ) +
  labs(
  x = "\n Number of chocolate preferences out of 10 children",
@@ -7566,7 +7566,7 @@ These two approaches lead to the same reject-or-fail-to-reject decision when the
  | **Left-tailed** | $\theta < \theta_0$ | Reject $H_0$ if $p\text{-value}\leq \alpha$. | Reject $H_0$ if $z_{\operatorname{obs}} \leq z_{\alpha}$. |
  | **Two-sided** | $\theta \neq \theta_0$ | Reject $H_0$ if $p\text{-value}\leq \alpha$. | Reject $H_0$ if $|z_{\operatorname{obs}}| \geq z_{1-\alpha/2}$. |

- : Decision rules for common Normal-approximation tests. Here, $\theta$ denotes a generic parameter, $\theta_0$ denotes the null value, $z_{\operatorname{obs}}$ denotes the observed value of the standardized test statistic, and $z_q$ denotes the $q$th quantile of the standard Normal distribution. {#tbl-pvalue-critical-rules .striped .hover}
+ : Decision rules for common Normal-approximation tests. Here, $\theta$ denotes a generic parameter, $\theta_0$ denotes the null value, $z_{\operatorname{obs}}$ denotes the observed value of the standardized test statistic, and $z_q$ denotes the $q$-quantile of the standard Normal distribution. {#tbl-pvalue-critical-rules .striped .hover}

  For the ice cream case, we will use the first two rows of @tbl-pvalue-critical-rules. The **demand query** is right-tailed because the alternative is $H_1 \text{: }\pi>0.50$. On the other hand, the **time query** is left-tailed because the alternative is $H_1 \text{: }\mu<12$. The two-sided row will become useful when we discuss confidence intervals and, later in the cookbook, two-sided tests for regression coefficients.

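The decision rules in the table touched by this hunk can be sketched in Python as a quick sanity check. This is a minimal sketch and not the book's own code: the function names `p_value` and `reject_by_critical_value` are hypothetical, the example statistics are made up, and the right-tailed rule (reject when $z_{\operatorname{obs}} \geq z_{1-\alpha}$) is an assumption, since that row falls outside the hunk.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal: mean 0, standard deviation 1


def p_value(z_obs: float, tail: str) -> float:
    """p-value of a standardized statistic under a standard Normal null."""
    if tail == "right":
        return 1.0 - STD_NORMAL.cdf(z_obs)
    if tail == "left":
        return STD_NORMAL.cdf(z_obs)
    if tail == "two-sided":
        return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z_obs)))
    raise ValueError(f"unknown tail: {tail!r}")


def reject_by_critical_value(z_obs: float, alpha: float, tail: str) -> bool:
    """Critical-value rule; z_q is the q-quantile, i.e. inv_cdf(q)."""
    if tail == "right":  # assumed rule, row not shown in the hunk
        return z_obs >= STD_NORMAL.inv_cdf(1.0 - alpha)
    if tail == "left":
        return z_obs <= STD_NORMAL.inv_cdf(alpha)
    if tail == "two-sided":
        return abs(z_obs) >= STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    raise ValueError(f"unknown tail: {tail!r}")


# Hypothetical observed statistics, chosen away from the rejection boundary.
alpha = 0.05
for z_obs, tail in [(1.7, "right"), (-1.7, "left"), (1.7, "two-sided")]:
    by_p = p_value(z_obs, tail) <= alpha
    by_z = reject_by_critical_value(z_obs, alpha, tail)
    print(f"{tail}: z_obs={z_obs}, reject by p-value={by_p}, by critical value={by_z}")
```

For these inputs the two rules agree in every row, which is the point the surrounding text makes about using the same null distribution, alternative, and significance level.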
@@ -7691,7 +7691,7 @@ The above code output provides the numerical details of the test. The plot in @f

  ```{r}
  #| label: fig-demand-null-distribution
- #| fig-cap: "Null distribution for the demand-query test. The shaded right tail represents the p-value for the one-sided alternative that the chocolate-preference probability is larger than one half. Inferential results correspond to the R-based observed."
+ #| fig-cap: "Null distribution for the demand-query test. The shaded right tail represents the p-value for the one-sided alternative that the chocolate-preference probability is larger than one half. Inferential results correspond to the R-based observed sample."
  #| echo: false
  #| message: false
  #| warning: false
@@ -8022,7 +8022,7 @@ z_{0.975}
  },
  $$

- $z_q$ denotes the $q$th quantile of the standard Normal distribution. This interval uses the **estimated standard error** because the goal is uncertainty quantification around the estimate, **not testing a specific null value**. That is why the standard error here uses $\hat{\pi}_{\operatorname{MLE,obs}}$ rather than $\pi_0=0.50$.
+ $z_q$ denotes the $q$-quantile of the standard Normal distribution. This interval uses the **estimated standard error** because the goal is uncertainty quantification around the estimate, **not testing a specific null value**. That is why the standard error here uses $\hat{\pi}_{\operatorname{MLE,obs}}$ rather than $\pi_0=0.50$.

  For the **time query**, the CLT-based 95% confidence interval for $\mu$ is

@@ -8033,7 +8033,7 @@ z_{0.975}
  \frac{s_T}{\sqrt{n_t}},
  $$

- where $z_q$ denotes the $q$th quantile of the standard Normal distribution.
+ where $z_q$ denotes the $q$-quantile of the standard Normal distribution.

  Note that both intervals are two-sided because they measure uncertainty above and below the observed estimates. The following code computes both intervals from the observed samples.

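The two intervals described in these hunks can be sketched in Python with the standard library alone. This is a minimal sketch under stated assumptions, not the chapter's code: the function names `wald_proportion_ci` and `clt_mean_ci` are hypothetical, the inputs are made-up numbers rather than the ice cream samples, and $s_T$ is assumed to be the sample standard deviation with an $n-1$ denominator.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def wald_proportion_ci(successes: int, n: int, level: float = 0.95):
    """CLT-based interval for a proportion using the estimated standard error."""
    pi_hat = successes / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)  # z_{0.975} when level = 0.95
    se = sqrt(pi_hat * (1.0 - pi_hat) / n)  # uses pi_hat, not the null value
    return pi_hat - z * se, pi_hat + z * se


def clt_mean_ci(sample, level: float = 0.95):
    """CLT-based interval for a mean: estimate +/- z * s / sqrt(n)."""
    n = len(sample)
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    se = stdev(sample) / sqrt(n)  # assumption: s_T uses the n - 1 denominator
    centre = mean(sample)
    return centre - z * se, centre + z * se


# Hypothetical inputs, for illustration only.
print(wald_proportion_ci(successes=60, n=100))
print(clt_mean_ci([10, 12, 11, 13, 9]))
```

Both helpers take the quantile at $1-(1-\text{level})/2$, which is why a 95% two-sided interval uses $z_{0.975}$ rather than $z_{0.95}$.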

docs/book/01-intro.html

Lines changed: 10 additions & 10 deletions
Large diffs are not rendered by default.

docs/book/02-stats-review.html

Lines changed: 4 additions & 4 deletions
@@ -7690,7 +7690,7 @@ <h1 class="title"><span id="sec-stats-review" class="quarto-section-identifier">
  <p>These two approaches lead to the same reject-or-fail-to-reject decision when they are based on the same null distribution, alternative hypothesis, and significance level. <a href="#tbl-pvalue-critical-rules" class="quarto-xref">Table&nbsp;<span>2.30</span></a> summarizes the decision rules for common Normal-approximation tests. The important detail is that the rejection region depends on the direction of the alternative hypothesis. A right-tailed test looks for unusually large positive values of <span class="math inline">\(z_{\operatorname{obs}}\)</span>, a left-tailed test looks for unusually small negative values, and a two-sided test looks for values far from zero in either direction.</p>
  <div id="tbl-pvalue-critical-rules" class="striped hover quarto-float quarto-figure quarto-figure-center anchored">
  <figure class="quarto-float quarto-float-tbl figure"><figcaption class="quarto-float-caption-top quarto-float-caption quarto-float-tbl" id="tbl-pvalue-critical-rules-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
- Table&nbsp;2.30: Decision rules for common Normal-approximation tests. Here, <span class="math inline">\(\theta\)</span> denotes a generic parameter, <span class="math inline">\(\theta_0\)</span> denotes the null value, <span class="math inline">\(z_{\operatorname{obs}}\)</span> denotes the observed value of the standardized test statistic, and <span class="math inline">\(z_q\)</span> denotes the <span class="math inline">\(q\)</span>th quantile of the standard Normal distribution.
+ Table&nbsp;2.30: Decision rules for common Normal-approximation tests. Here, <span class="math inline">\(\theta\)</span> denotes a generic parameter, <span class="math inline">\(\theta_0\)</span> denotes the null value, <span class="math inline">\(z_{\operatorname{obs}}\)</span> denotes the observed value of the standardized test statistic, and <span class="math inline">\(z_q\)</span> denotes the <span class="math inline">\(q\)</span>-quantile of the standard Normal distribution.
  </figcaption><div aria-describedby="tbl-pvalue-critical-rules-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
  <table class="table-striped table-hover caption-top table">
  <colgroup>
@@ -7886,7 +7886,7 @@ <h1 class="title"><span id="sec-stats-review" class="quarto-section-identifier">
  <img src="02-stats-review_files/figure-html/fig-demand-null-distribution-1.png" class="img-fluid figure-img" width="1344">
  </div>
  <figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-demand-null-distribution-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
- Figure&nbsp;2.19: Null distribution for the demand-query test. The shaded right tail represents the p-value for the one-sided alternative that the chocolate-preference probability is larger than one half. Inferential results correspond to the R-based observed.
+ Figure&nbsp;2.19: Null distribution for the demand-query test. The shaded right tail represents the p-value for the one-sided alternative that the chocolate-preference probability is larger than one half. Inferential results correspond to the R-based observed sample.
  </figcaption></figure>
  </div>
  </div>
@@ -8081,15 +8081,15 @@ <h1 class="title"><span id="sec-stats-review" class="quarto-section-identifier">
  }{n_d}
  },
  \]</span></p>
- <p><span class="math inline">\(z_q\)</span> denotes the <span class="math inline">\(q\)</span>th quantile of the standard Normal distribution. This interval uses the <strong>estimated standard error</strong> because the goal is uncertainty quantification around the estimate, <strong>not testing a specific null value</strong>. That is why the standard error here uses <span class="math inline">\(\hat{\pi}_{\operatorname{MLE,obs}}\)</span> rather than <span class="math inline">\(\pi_0=0.50\)</span>.</p>
+ <p><span class="math inline">\(z_q\)</span> denotes the <span class="math inline">\(q\)</span>-quantile of the standard Normal distribution. This interval uses the <strong>estimated standard error</strong> because the goal is uncertainty quantification around the estimate, <strong>not testing a specific null value</strong>. That is why the standard error here uses <span class="math inline">\(\hat{\pi}_{\operatorname{MLE,obs}}\)</span> rather than <span class="math inline">\(\pi_0=0.50\)</span>.</p>
  <p>For the <strong>time query</strong>, the CLT-based 95% confidence interval for <span class="math inline">\(\mu\)</span> is</p>
  <p><span class="math display">\[
  \hat{\mu}_{\operatorname{MLE,obs}}
  \pm
  z_{0.975}
  \frac{s_T}{\sqrt{n_t}},
  \]</span></p>
- <p>where <span class="math inline">\(z_q\)</span> denotes the <span class="math inline">\(q\)</span>th quantile of the standard Normal distribution.</p>
+ <p>where <span class="math inline">\(z_q\)</span> denotes the <span class="math inline">\(q\)</span>-quantile of the standard Normal distribution.</p>
  <p>Note that both intervals are two-sided because they measure uncertainty above and below the observed estimates. The following code computes both intervals from the observed samples.</p>
  <div class="tabset-margin-container"></div><div class="panel-tabset">
  <ul class="nav nav-tabs" role="tablist">

docs/book/03-ols.html

Lines changed: 10 additions & 10 deletions
@@ -1867,8 +1867,8 @@ <h1 class="title"><span id="sec-ols" class="quarto-section-identifier"><span cla
  Dep. Variable: Net_Money R-squared: 0.894
  Model: OLS Adj. R-squared: 0.892
  Method: Least Squares F-statistic: 661.0
- Date: Fri, 05 Jun 2026 Prob (F-statistic): 0.00
- Time: 19:16:28 Log-Likelihood: -5430.3
+ Date: Sun, 07 Jun 2026 Prob (F-statistic): 0.00
+ Time: 13:13:47 Log-Likelihood: -5430.3
  No. Observations: 797 AIC: 1.088e+04
  Df Residuals: 786 BIC: 1.093e+04
  Df Model: 10
@@ -3118,8 +3118,8 @@ <h1 class="title"><span id="sec-ols" class="quarto-section-identifier"><span cla
  Dep. Variable: exam_score R-squared: 0.900
  Model: OLS Adj. R-squared: 0.899
  Method: Least Squares F-statistic: 560.3
- Date: Fri, 05 Jun 2026 Prob (F-statistic): 7.35e-93
- Time: 19:16:30 Log-Likelihood: -618.40
+ Date: Sun, 07 Jun 2026 Prob (F-statistic): 7.35e-93
+ Time: 13:13:50 Log-Likelihood: -618.40
  No. Observations: 190 AIC: 1245.
  Df Residuals: 186 BIC: 1258.
  Df Model: 3
@@ -3474,8 +3474,8 @@ <h1 class="title"><span id="sec-ols" class="quarto-section-identifier"><span cla
  Dep. Variable: exam_score R-squared: 0.904
  Model: OLS Adj. R-squared: 0.900
  Method: Least Squares F-statistic: 212.6
- Date: Fri, 05 Jun 2026 Prob (F-statistic): 9.11e-88
- Time: 19:16:31 Log-Likelihood: -615.06
+ Date: Sun, 07 Jun 2026 Prob (F-statistic): 9.11e-88
+ Time: 13:13:50 Log-Likelihood: -615.06
  No. Observations: 190 AIC: 1248.
  Df Residuals: 181 BIC: 1277.
  Df Model: 8
@@ -3517,8 +3517,8 @@ <h1 class="title"><span id="sec-ols" class="quarto-section-identifier"><span cla
  Dep. Variable: exam_score R-squared: 0.973
  Model: OLS Adj. R-squared: 0.973
  Method: Least Squares F-statistic: 2262.
- Date: Fri, 05 Jun 2026 Prob (F-statistic): 4.70e-146
- Time: 19:16:31 Log-Likelihood: -493.24
+ Date: Sun, 07 Jun 2026 Prob (F-statistic): 4.70e-146
+ Time: 13:13:51 Log-Likelihood: -493.24
  No. Observations: 190 AIC: 994.5
  Df Residuals: 186 BIC: 1007.
  Df Model: 3
@@ -3554,8 +3554,8 @@ <h1 class="title"><span id="sec-ols" class="quarto-section-identifier"><span cla
  Dep. Variable: exam_score R-squared: 0.900
  Model: OLS Adj. R-squared: 0.898
  Method: Least Squares F-statistic: 418.1
- Date: Fri, 05 Jun 2026 Prob (F-statistic): 1.85e-91
- Time: 19:16:31 Log-Likelihood: -618.38
+ Date: Sun, 07 Jun 2026 Prob (F-statistic): 1.85e-91
+ Time: 13:13:51 Log-Likelihood: -618.38
  No. Observations: 190 AIC: 1247.
  Df Residuals: 185 BIC: 1263.
  Df Model: 4

docs/book/04-gamma.html

Lines changed: 6 additions & 6 deletions
@@ -1789,25 +1789,25 @@ <h1 class="title"><span id="sec-gamma" class="quarto-section-identifier"><span c
  <span id="cb27-16"><a href="#cb27-16" aria-hidden="true" tabindex="-1"></a> plt.show()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
  <div class="cell-output cell-output-stdout">
  <pre><code>&lt;Axes: xlabel='cdn_usage', ylabel='response_time_ms'&gt;
- &lt;matplotlib.legend.Legend object at 0x15775e190&gt;
+ &lt;matplotlib.legend.Legend object at 0x156010b90&gt;
  Text(0.5, 1.0, 'Response Time by Cdn Usage')
  Text(0.5, 0, 'Cdn Usage')
  Text(0, 0.5, 'Response Time (ms)')
  ([0, 1], [Text(0, 0, 'Yes'), Text(1, 0, 'No')])
  &lt;Axes: xlabel='request_complexity', ylabel='response_time_ms'&gt;
- &lt;matplotlib.legend.Legend object at 0x162009890&gt;
+ &lt;matplotlib.legend.Legend object at 0x15605dc50&gt;
  Text(0.5, 1.0, 'Response Time by Request Complexity')
  Text(0.5, 0, 'Request Complexity')
  Text(0, 0.5, 'Response Time (ms)')
  ([0, 1, 2], [Text(0, 0, 'Moderate'), Text(1, 0, 'Complex'), Text(2, 0, 'Simple')])
  &lt;Axes: xlabel='day_of_week', ylabel='response_time_ms'&gt;
- &lt;matplotlib.legend.Legend object at 0x162074610&gt;
+ &lt;matplotlib.legend.Legend object at 0x1561d4f50&gt;
  Text(0.5, 1.0, 'Response Time by Day Of Week')
  Text(0.5, 0, 'Day Of Week')
  Text(0, 0.5, 'Response Time (ms)')
  ([0, 1, 2, 3, 4, 5, 6], [Text(0, 0, 'Saturday'), Text(1, 0, 'Thursday'), Text(2, 0, 'Monday'), Text(3, 0, 'Friday'), Text(4, 0, 'Tuesday'), Text(5, 0, 'Wednesday'), Text(6, 0, 'Sunday')])
  &lt;Axes: xlabel='geographic_region', ylabel='response_time_ms'&gt;
- &lt;matplotlib.legend.Legend object at 0x1620c0f50&gt;
+ &lt;matplotlib.legend.Legend object at 0x1562c8cd0&gt;
  Text(0.5, 1.0, 'Response Time by Geographic Region')
  Text(0.5, 0, 'Geographic Region')
  Text(0, 0.5, 'Response Time (ms)')
@@ -2189,8 +2189,8 @@ <h1 class="title"><span id="sec-gamma" class="quarto-section-identifier"><span c
  Model Family: Gamma Df Model: 19
  Link Function: log Scale: 0.60074
  Method: IRLS Log-Likelihood: -4996.9
- Date: Fri, 05 Jun 2026 Deviance: 565.82
- Time: 19:16:54 Pearson chi2: 589.
+ Date: Sun, 07 Jun 2026 Deviance: 565.82
+ Time: 13:14:10 Pearson chi2: 589.
  No. Iterations: 17 Pseudo R-squ. (CS): 0.7695
  Covariance Type: nonrobust
  ===========================================================================================================
