|
94 | 94 | "source": [ |
95 | 95 | "### 1.2. Train test split\n", |
96 | 96 | "\n", |
97 | | - "Use `train_test_split` to split the data into training and testing sets. Use `random_state=42` for reproducibility." |
| 97 | + "Use `train_test_split` to split the data into training and testing sets. Use `random_state=315` for reproducibility." |
98 | 98 | ] |
99 | 99 | }, |
100 | 100 | { |
|
163 | 163 | "source": [ |
164 | 164 | "### 2.2. Test set evaluation\n", |
165 | 165 | "\n", |
166 | | - "For classification, we use accuracy instead of R². Use the model's `predict` method to get predictions and `score` method to get accuracy.\n", |
167 | | - "\n", |
168 | | - "**Hint:** `model.score(X, y)` returns the accuracy for classifiers." |
|  | 166 | + "For classification, we can use accuracy, F1 score, and/or AUC-ROC (among others) instead of R². Use sklearn's [`metrics`](https://scikit-learn.org/stable/api/sklearn.metrics.html) module." |
169 | 167 | ] |
170 | 168 | }, |
171 | 169 | { |
|
177 | 175 | "source": [ |
178 | 176 | "logistic_predictions = # YOUR CODE HERE\n", |
179 | 177 | "logistic_accuracy = # YOUR CODE HERE\n", |
180 | | - "print(f'Logistic Regression accuracy on test set: {logistic_accuracy:.4f}')" |
| 178 | + "logistic_f1 = # YOUR CODE HERE\n", |
| 179 | + "logistic_auc = # YOUR CODE HERE\n", |
| 180 | + "print(f'Logistic regression accuracy on test set: {logistic_accuracy:.4f}')\n", |
| 181 | + "print(f'Logistic regression F1 score on test set: {logistic_f1:.4f}')\n", |
| 182 | + "print(f'Logistic regression AUC-ROC score on test set: {logistic_auc:.4f}')" |
181 | 183 | ] |
182 | 184 | }, |
183 | 185 | { |
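One hedged sketch of how the placeholder lines in that cell might be completed, assuming a fitted classifier named `model` and a held-out split `X_test`, `y_test` (the toy data and variable names below are illustrative, not from the notebook). Note that `roc_auc_score` expects probability scores, not hard class labels:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Toy stand-in for the notebook's data
X, y = make_classification(n_samples=500, random_state=315)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=315)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

logistic_predictions = model.predict(X_test)
logistic_accuracy = accuracy_score(y_test, logistic_predictions)
logistic_f1 = f1_score(y_test, logistic_predictions)
# AUC-ROC is computed from scores/probabilities, not predicted labels
logistic_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(f'Logistic regression accuracy on test set: {logistic_accuracy:.4f}')
print(f'Logistic regression F1 score on test set: {logistic_f1:.4f}')
print(f'Logistic regression AUC-ROC score on test set: {logistic_auc:.4f}')
```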
|
187 | 189 | "source": [ |
188 | 190 | "### 2.3. Performance analysis\n", |
189 | 191 | "\n", |
190 | | - "For classification, we visualize performance using a confusion matrix." |
| 192 | + "For classification, visualize performance using a confusion matrix." |
191 | 193 | ] |
192 | 194 | }, |
193 | 195 | { |
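A minimal sketch of the confusion-matrix step, again on illustrative toy data (the dataset and model here are assumptions, not the notebook's). `confusion_matrix` returns a 2×2 array for binary problems; `ConfusionMatrixDisplay(cm).plot()` renders it as a heatmap:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=315)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=315)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, model.predict(X_test))
print(cm)
# To visualize: from sklearn.metrics import ConfusionMatrixDisplay
# ConfusionMatrixDisplay(cm).plot()
```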
|
207 | 209 | "source": [ |
208 | 210 | "## 3. Multilayer perceptron (MLP) classifier\n", |
209 | 211 | "\n", |
210 | | - "Now let's build a neural network classifier using `MLPClassifier`.\n", |
| 212 | + "Now let's build a neural network classifier using sklearn's [`MLPClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html).\n", |
211 | 213 | "\n", |
212 | 214 | "### 3.1. Single epoch training function\n", |
213 | 215 | "\n", |
214 | 216 | "Complete the training function below. It should:\n", |
215 | 217 | "1. Split the data into training and validation sets\n", |
216 | 218 | "2. Call `partial_fit` on the model (remember to pass `classes=[0, 1]` on the first call)\n", |
217 | | - "3. Record training and validation accuracy in the history dictionary\n", |
| 219 | + "3. Record training and validation [`log_loss`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html) (aka binary cross-entropy) in the history dictionary\n", |
218 | 220 | "\n", |
219 | 221 | "**Hint:** Use `model.partial_fit(X, y, classes=[0, 1])` for the first epoch. For subsequent epochs, `partial_fit` remembers the classes." |
220 | 222 | ] |
|
228 | 230 | "source": [ |
229 | 231 | "def train(model: MLPClassifier, df: pd.DataFrame, training_history: dict, classes: list = None) -> tuple[MLPClassifier, dict]:\n", |
230 | 232 | " '''Trains sklearn MLP classifier model on given dataframe using validation split.\n", |
231 | | - " Returns the updated model and training history dictionary.'''\n", |
| 233 | + " Returns the updated model and training history dictionary containing training and\n", |
| 234 | + " validation log loss. If classes are not provided, assumes 0 and 1.'''\n", |
| 235 | + "\n", |
| 236 | + " global features, label\n", |
232 | 237 | "\n", |
233 | 238 | " df, val_df = train_test_split(df, random_state=315)\n", |
234 | 239 | " \n", |
235 | 240 | " # YOUR CODE HERE: call partial_fit on the model\n", |
236 | 241 | " # If classes is provided, pass it to partial_fit\n", |
237 | 242 | " \n", |
238 | | - " # YOUR CODE HERE: append training and validation accuracy to history\n", |
| 243 | + " # YOUR CODE HERE: append training and validation log loss to history\n", |
239 | 244 | " \n", |
240 | 245 | " return model, training_history" |
241 | 246 | ] |
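One possible completion of the `train` function, under stated assumptions: the notebook's global `features` (column names) and `label` are stood in for by a toy frame built here, and the hidden-layer size is arbitrary. The key moves are passing `classes` to `partial_fit` only when provided (the first epoch) and recording `log_loss` from `predict_proba` on both splits:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Toy stand-in for the notebook's dataframe and globals
X, y = make_classification(n_samples=400, random_state=315)
features = [f'f{i}' for i in range(X.shape[1])]
label = 'target'
frame = pd.DataFrame(X, columns=features)
frame[label] = y

def train(model, df, training_history, classes=None):
    df, val_df = train_test_split(df, random_state=315)

    # classes is required on the first partial_fit call only
    if classes is not None:
        model.partial_fit(df[features], df[label], classes=classes)
    else:
        model.partial_fit(df[features], df[label])

    # Log loss (binary cross-entropy) on predicted probabilities
    training_history['training_loss'].append(
        log_loss(df[label], model.predict_proba(df[features])))
    training_history['validation_loss'].append(
        log_loss(val_df[label], model.predict_proba(val_df[features])))

    return model, training_history

history = {'training_loss': [], 'validation_loss': []}
mlp = MLPClassifier(hidden_layer_sizes=(32,), random_state=315)
mlp, history = train(mlp, frame, history, classes=[0, 1])
for _ in range(4):
    mlp, history = train(mlp, frame, history)
```

Each call to `partial_fit` runs a single optimization pass, which is what lets the loop above act as an epoch counter.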
|
267 | 272 | "epochs = 10\n", |
268 | 273 | "\n", |
269 | 274 | "training_history = {\n", |
270 | | - " 'training_accuracy': [],\n", |
271 | | - " 'validation_accuracy': []\n", |
| 275 | + " 'training_loss': [],\n", |
| 276 | + " 'validation_loss': []\n", |
272 | 277 | "}\n", |
273 | 278 | "\n", |
274 | 279 | "mlp_model = # YOUR CODE HERE: create MLPClassifier\n", |
|
285 | 290 | "source": [ |
286 | 291 | "### 3.3. Learning curves\n", |
287 | 292 | "\n", |
288 | | - "Plot the training and validation accuracy over epochs to visualize the learning process." |
| 293 | + "Plot the training and validation loss over epochs to visualize the learning process." |
289 | 294 | ] |
290 | 295 | }, |
291 | 296 | { |
|
295 | 300 | "metadata": {}, |
296 | 301 | "outputs": [], |
297 | 302 | "source": [ |
298 | | - "# YOUR CODE HERE: plot training and validation accuracy\n", |
| 303 | + "# YOUR CODE HERE: plot training and validation loss\n", |
299 | 304 | "# Use plt.plot() for each curve\n", |
300 | 305 | "# Add title, xlabel, ylabel, and legend" |
301 | 306 | ] |
|
319 | 324 | "source": [ |
320 | 325 | "mlp_predictions = # YOUR CODE HERE\n", |
321 | 326 | "mlp_accuracy = # YOUR CODE HERE\n", |
322 | | - "print(f'MLP accuracy on test set: {mlp_accuracy:.4f}')" |
| 327 | + "mlp_f1 = # YOUR CODE HERE\n", |
| 328 | + "mlp_auc = # YOUR CODE HERE\n", |
| 329 | + "print(f'MLP accuracy on test set: {mlp_accuracy:.4f}')\n", |
|  | 330 | + "print(f'MLP F1 score on test set: {mlp_f1:.4f}')\n", |
|  | 331 | + "print(f'MLP AUC-ROC score on test set: {mlp_auc:.4f}')" |
323 | 332 | ] |
324 | 333 | }, |
325 | 334 | { |
|