UW-Madison-DataScience
diff --git a/notebooks/01-introduction.ipynb b/notebooks/01-introduction.ipynb
Lines changed: 3 additions & 3 deletions
diff --git a/notebooks/02-regression.ipynb b/notebooks/02-regression.ipynb
Lines changed: 40 additions & 40 deletions
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "b29bfe77",
+   "id": "d79f6321",
    "metadata": {},
    "source": [
     "---\n",
@@ -111,7 +111,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "b41e957e",
+   "id": "bc069c6f",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -121,7 +121,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "f6bd1eca",
+   "id": "2a2e5875",
    "metadata": {},
    "source": [
     "### Representation of Data in Scikit-learn\n",
 
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "e53f64ca",
+   "id": "a29779d5",
    "metadata": {},
    "source": [
     "---\n",
@@ -60,7 +60,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "ac76ffb1",
+   "id": "b4b5673e",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -74,7 +74,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "2d62170e",
+   "id": "b5b55fea",
    "metadata": {},
    "source": [
     "We can see that we have seven columns in total: 4 continuous (numerical) columns named `bill_length_mm`, `bill_depth_mm`, `flipper_length_mm`, and `body_mass_g`; and 3 discrete (categorical) columns named `species`, `island`, and `sex`. We can also see from a quick inspection of the first 5 samples that we have some missing data in the form of `NaN` values. Missing data is a fairly common occurrence in real-life data, so let's go ahead and remove any rows that contain `NaN` values:"
@@ -83,7 +83,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "e4fad2f2",
+   "id": "869fd9b9",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -93,7 +93,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "516f6450",
+   "id": "497591f9",
    "metadata": {},
    "source": [
     "In this scenario we will train a linear regression model using `body_mass_g` as our feature data and `bill_depth_mm` as our label data. We will train our model on a subset of the data by slicing the first 146 samples of our cleaned data. \n",
@@ -104,7 +104,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "b8e72ebc",
+   "id": "b2651838",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -123,7 +123,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "2f319a23",
+   "id": "ab6258f7",
    "metadata": {},
    "source": [
     "In this regression example we will create a Linear Regression model that will try to predict `y` values based upon `x` values.\n",
@@ -148,7 +148,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "2c68391f",
+   "id": "2e440925",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -161,7 +161,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "e6d87d33",
+   "id": "418b3305",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -176,7 +176,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "0e467680",
+   "id": "eb3c0e63",
    "metadata": {},
    "source": [
     "Next we’ll define a model, and train it on the pre-processed data. We’ll also inspect the trained model parameters m and c:"
@@ -185,7 +185,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "0d82bbd9",
+   "id": "291e8505",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -205,7 +205,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "56bf0227",
+   "id": "bbb96a1e",
    "metadata": {},
    "source": [
     "Now we can make predictions using our trained model, and calculate the Root Mean Squared Error (RMSE) of our predictions:"
@@ -214,7 +214,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "c3b9346a",
+   "id": "10412803",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -232,7 +232,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "e931b1b1",
+   "id": "80063cb0",
    "metadata": {},
    "source": [
     "Finally, we’ll plot our input data, our linear fit, and our predictions:"
@@ -241,7 +241,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "2c2449db",
+   "id": "19c3008b",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -256,7 +256,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "17cdb92e",
+   "id": "c30792e3",
    "metadata": {},
    "source": [
     "Congratulations! We've now created our first machine-learning model of the lesson and we can now make predictions of `bill_depth_mm` for any `body_mass_g` values that we pass into our model.\n",
@@ -267,7 +267,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "70ea12e1",
+   "id": "83916985",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -291,7 +291,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "3182040f",
+   "id": "d8ff91d4",
    "metadata": {},
    "source": [
     "Our RMSE for predictions on all penguin samples is far larger than before, so let's visually inspect the situation:"
@@ -300,7 +300,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "eb1dc01d",
+   "id": "e52a9e7b",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -316,7 +316,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "acc4b15c",
+   "id": "cdf58529",
    "metadata": {},
    "source": [
     "Oh dear. It looks like our linear regression fits okay for our subset of the penguin data, and a few additional samples, but there appears to be a cluster of points that are poorly predicted by our model. Even if we re-trained our model using all samples it looks unlikely that our model would perform much better due to the two-cluster nature of our dataset.\n",
@@ -344,7 +344,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "303f77b2",
+   "id": "edd5be58",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -361,7 +361,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "6ff788fe",
+   "id": "b3822c3b",
    "metadata": {},
    "source": [
     "### Exercise: Try to re-implement our univariate regression model using these new train/test sets.\n",
@@ -377,7 +377,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "05b2da13",
+   "id": "94b5a534",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -412,7 +412,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "8b324b9d",
+   "id": "c3245252",
    "metadata": {},
    "source": [
     "**Quick follow-up**: Interpret the results of your model. Is it accurate? What does it say about the relationship between body mass and bill depth? Is this a \"good\" model?\n",
@@ -436,7 +436,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "af68b978",
+   "id": "8ee2f060",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -450,7 +450,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "efe5a33a",
+   "id": "eb3cdc4e",
    "metadata": {},
    "source": [
     "::::::::::::::::::::::::::::::::::::: callout\n",
@@ -467,7 +467,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "5fea25bb",
+   "id": "d6c84c57",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -478,7 +478,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "2f970876",
+   "id": "e5de71e8",
    "metadata": {},
    "source": [
     "We can now make predictions on the train/test sets and calculate the RMSE:"
@@ -487,7 +487,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "3cdfe16c",
+   "id": "97ba38c8",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -504,7 +504,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "9c5d6b58",
+   "id": "0fb8087e",
    "metadata": {},
    "source": [
     "Finally, let's visualise our model fit on our training data and full dataset."
@@ -513,7 +513,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "dd01d4da",
+   "id": "08fdd0c4",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -535,7 +535,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "63bfcdb8",
+   "id": "fb14059c",
    "metadata": {},
    "source": [
     "::::::::::::::::::::::::::::::::::::: challenge\n",
@@ -560,7 +560,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "30f94be2",
+   "id": "c9a408e5",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -571,7 +571,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "468d965a",
+   "id": "36844336",
    "metadata": {},
    "source": [
     "Let's try a model that includes penguin species as a predictor."
@@ -580,7 +580,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "10e2f20d",
+   "id": "da7036d4",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -605,7 +605,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "86679459",
+   "id": "af9afb9a",
    "metadata": {},
    "source": [
     "Since the species column is coded as a string, we need to convert it into a numerical format before we can use it in a machine learning model. To do this, we apply dummy coding (also called one-hot encoding), which creates new binary columns for each species category (e.g., species_Adelie, species_Chinstrap, species_Gentoo). Each row gets a 1 in the column that matches its species and 0 in the others.\n",
@@ -616,7 +616,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "969dd4a8",
+   "id": "e0c9f919",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -626,7 +626,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "db3fe19a",
+   "id": "e4c157cc",
    "metadata": {},
    "source": [
     "We can then train/fit and evaluate our model as usual."
@@ -635,7 +635,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "3684bf88",
+   "id": "83c4d36f",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -659,7 +659,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "72701918",
+   "id": "6c2d548e",
    "metadata": {},
    "source": [
     "{% include links.md %}\n",