|
94 | 94 | "source": [ |
95 | 95 | "### 1.2. Train test split\n", |
96 | 96 | "\n", |
97 | | - "Use `train_test_split` to split the data into training and testing sets. Use `random_state=42` for reproducibility." |
| 97 | + "Use `train_test_split` to split the data into training and testing sets. Use `random_state=315` for reproducibility." |
98 | 98 | ] |
99 | 99 | }, |
100 | 100 | { |
|
163 | 163 | "source": [ |
164 | 164 | "### 2.2. Test set evaluation\n", |
165 | 165 | "\n", |
166 | | - "For classification, we use accuracy instead of R². Use the model's `predict` method to get predictions and `score` method to get accuracy.\n", |
167 | | - "\n", |
168 | | - "**Hint:** `model.score(X, y)` returns the accuracy for classifiers." |
|  | 166 | + "For classification, we can use accuracy, F1 score, and/or AUC-ROC (among others) instead of R². Use sklearn's [`metrics`](https://scikit-learn.org/stable/api/sklearn.metrics.html) module." |
169 | 167 | ] |
170 | 168 | }, |
171 | 169 | { |
|
177 | 175 | "source": [ |
178 | 176 | "logistic_predictions = # YOUR CODE HERE\n", |
179 | 177 | "logistic_accuracy = # YOUR CODE HERE\n", |
180 | | - "print(f'Logistic Regression accuracy on test set: {logistic_accuracy:.4f}')" |
| 178 | + "logistic_f1 = # YOUR CODE HERE\n", |
| 179 | + "logistic_auc = # YOUR CODE HERE\n", |
| 180 | + "print(f'Logistic regression accuracy on test set: {logistic_accuracy:.4f}')\n", |
| 181 | + "print(f'Logistic regression F1 score on test set: {logistic_f1:.4f}')\n", |
| 182 | + "print(f'Logistic regression AUC-ROC score on test set: {logistic_auc:.4f}')" |
181 | 183 | ] |
182 | 184 | }, |
183 | 185 | { |
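One hedged sketch of how the placeholder lines in that cell might be completed, assuming a fitted classifier named `model` and a held-out split `X_test`, `y_test` (the toy data and variable names below are illustrative, not from the notebook). Note that `roc_auc_score` expects probability scores, not hard class labels:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Toy stand-in for the notebook's data
X, y = make_classification(n_samples=500, random_state=315)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=315)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

logistic_predictions = model.predict(X_test)
logistic_accuracy = accuracy_score(y_test, logistic_predictions)
logistic_f1 = f1_score(y_test, logistic_predictions)
# AUC-ROC is computed from scores/probabilities, not predicted labels
logistic_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(f'Logistic regression accuracy on test set: {logistic_accuracy:.4f}')
print(f'Logistic regression F1 score on test set: {logistic_f1:.4f}')
print(f'Logistic regression AUC-ROC score on test set: {logistic_auc:.4f}')
```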
|
187 | 189 | "source": [ |
188 | 190 | "### 2.3. Performance analysis\n", |
189 | 191 | "\n", |
190 | | - "For classification, we visualize performance using a confusion matrix." |
| 192 | + "For classification, visualize performance using a confusion matrix." |
191 | 193 | ] |
192 | 194 | }, |
193 | 195 | { |
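A minimal sketch of the confusion-matrix step, again on illustrative toy data (the dataset and model here are assumptions, not the notebook's). `confusion_matrix` returns a 2×2 array for binary problems; `ConfusionMatrixDisplay(cm).plot()` renders it as a heatmap:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=315)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=315)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, model.predict(X_test))
print(cm)
# To visualize: from sklearn.metrics import ConfusionMatrixDisplay
# ConfusionMatrixDisplay(cm).plot()
```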
|
207 | 209 | "source": [ |
208 | 210 | "## 3. Multilayer perceptron (MLP) classifier\n", |
209 | 211 | "\n", |
210 | | - "Now let's build a neural network classifier using `MLPClassifier`.\n", |
| 212 | + "Now let's build a neural network classifier using sklearn's [`MLPClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html).\n", |
211 | 213 | "\n", |
212 | 214 | "### 3.1. Single epoch training function\n", |
213 | 215 | "\n", |
214 | 216 | "Complete the training function below. It should:\n", |
215 | 217 | "1. Split the data into training and validation sets\n", |
216 | 218 | "2. Call `partial_fit` on the model (remember to pass `classes=[0, 1]` on the first call)\n", |
217 | | - "3. Record training and validation accuracy in the history dictionary\n", |
| 219 | + "3. Record training and validation [`log_loss`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html) (aka binary cross-entropy) in the history dictionary\n", |
218 | 220 | "\n", |
219 | 221 | "**Hint:** Use `model.partial_fit(X, y, classes=[0, 1])` for the first epoch. For subsequent epochs, `partial_fit` remembers the classes." |
220 | 222 | ] |
|
228 | 230 | "source": [ |
229 | 231 | "def train(model: MLPClassifier, df: pd.DataFrame, training_history: dict, classes: list = None) -> tuple[MLPClassifier, dict]:\n", |
230 | 232 | " '''Trains sklearn MLP classifier model on given dataframe using validation split.\n", |
231 | | - " Returns the updated model and training history dictionary.'''\n", |
| 233 | + " Returns the updated model and training history dictionary containing training and\n", |
| 234 | + " validation log loss. If classes are not provided, assumes 0 and 1.'''\n", |
| 235 | + "\n", |
| 236 | + " global features, label\n", |
232 | 237 | "\n", |
233 | 238 | " df, val_df = train_test_split(df, random_state=315)\n", |
234 | 239 | " \n", |
235 | 240 | " # YOUR CODE HERE: call partial_fit on the model\n", |
236 | 241 | " # If classes is provided, pass it to partial_fit\n", |
237 | 242 | " \n", |
238 | | - " # YOUR CODE HERE: append training and validation accuracy to history\n", |
| 243 | + " # YOUR CODE HERE: append training and validation log loss to history\n", |
239 | 244 | " \n", |
240 | 245 | " return model, training_history" |
241 | 246 | ] |
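One possible completion of the `train` function, under stated assumptions: the notebook's global `features` (column names) and `label` are stood in for by a toy frame built here, and the hidden-layer size is arbitrary. The key moves are passing `classes` to `partial_fit` only when provided (the first epoch) and recording `log_loss` from `predict_proba` on both splits:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Toy stand-in for the notebook's dataframe and globals
X, y = make_classification(n_samples=400, random_state=315)
features = [f'f{i}' for i in range(X.shape[1])]
label = 'target'
frame = pd.DataFrame(X, columns=features)
frame[label] = y

def train(model, df, training_history, classes=None):
    df, val_df = train_test_split(df, random_state=315)

    # classes is required on the first partial_fit call only
    if classes is not None:
        model.partial_fit(df[features], df[label], classes=classes)
    else:
        model.partial_fit(df[features], df[label])

    # Log loss (binary cross-entropy) on predicted probabilities
    training_history['training_loss'].append(
        log_loss(df[label], model.predict_proba(df[features])))
    training_history['validation_loss'].append(
        log_loss(val_df[label], model.predict_proba(val_df[features])))

    return model, training_history

history = {'training_loss': [], 'validation_loss': []}
mlp = MLPClassifier(hidden_layer_sizes=(32,), random_state=315)
mlp, history = train(mlp, frame, history, classes=[0, 1])
for _ in range(4):
    mlp, history = train(mlp, frame, history)
```

Each call to `partial_fit` runs a single optimization pass, which is what lets the loop above act as an epoch counter.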
|
267 | 272 | "epochs = 10\n", |
268 | 273 | "\n", |
269 | 274 | "training_history = {\n", |
270 | | - " 'training_accuracy': [],\n", |
271 | | - " 'validation_accuracy': []\n", |
| 275 | + " 'training_loss': [],\n", |
| 276 | + " 'validation_loss': []\n", |
272 | 277 | "}\n", |
273 | 278 | "\n", |
274 | 279 | "mlp_model = # YOUR CODE HERE: create MLPClassifier\n", |
|
285 | 290 | "source": [ |
286 | 291 | "### 3.3. Learning curves\n", |
287 | 292 | "\n", |
288 | | - "Plot the training and validation accuracy over epochs to visualize the learning process." |
| 293 | + "Plot the training and validation loss over epochs to visualize the learning process." |
289 | 294 | ] |
290 | 295 | }, |
291 | 296 | { |
|
295 | 300 | "metadata": {}, |
296 | 301 | "outputs": [], |
297 | 302 | "source": [ |
298 | | - "# YOUR CODE HERE: plot training and validation accuracy\n", |
| 303 | + "# YOUR CODE HERE: plot training and validation loss\n", |
299 | 304 | "# Use plt.plot() for each curve\n", |
300 | 305 | "# Add title, xlabel, ylabel, and legend" |
301 | 306 | ] |
|
319 | 324 | "source": [ |
320 | 325 | "mlp_predictions = # YOUR CODE HERE\n", |
321 | 326 | "mlp_accuracy = # YOUR CODE HERE\n", |
322 | | - "print(f'MLP accuracy on test set: {mlp_accuracy:.4f}')" |
| 327 | + "mlp_f1 = # YOUR CODE HERE\n", |
| 328 | + "mlp_auc = # YOUR CODE HERE\n", |
| 329 | + "print(f'MLP accuracy on test set: {mlp_accuracy:.4f}')\n", |
|  | 330 | + "print(f'MLP F1 score on test set: {mlp_f1:.4f}')\n", |
|  | 331 | + "print(f'MLP AUC-ROC score on test set: {mlp_auc:.4f}')" |
323 | 332 | ] |
324 | 333 | }, |
325 | 334 | { |
|