|
14 | 14 | "cell_type": "markdown", |
15 | 15 | "metadata": {}, |
16 | 16 | "source": [ |
17 | | - "## Supervised Learning ##\n", |
| 17 | + "# Supervised Learning\n", |
18 | 18 | "Supervised learning is one of the four common branches of machine learning and is concerned with predicting future observations based on labeled data. In a typical application, we will need to choose the relevant descriptors to provide as input to our model (feature selection), select a model, and decide on an optimal training strategy.\n",
19 | 19 | "\n", |
20 | 20 | "<b>feature selection</b>: this is the process of deciding on the inputs that will be used for training our model. In the most general case, we are seeking a model $f(\\mathbf{x})=\\mathbf{y}$, where $\\mathbf{x}$ is the set of features, $\\mathbf{y}$ is the set of labels we are trying to predict, and $f()$ is the model. The choice of $\\mathbf{x}$ may seem obvious if we have relevant domain information (e.g., physical laws that underlie the target phenomena), but in many cases it will be guided by data availability and trial and error. \n",
|
27 | 27 | "\n", |
28 | 28 | "Supervised learning is the most common branch of machine learning and it underlies many commercial uses of ML. Today we will cover two examples <b>support vector machines</b> and <b>neural networks</b>.\n", |
29 | 29 | "\n", |
30 | | - "### Support Vector Machines ###\n", |
| 30 | + "## Support Vector Machines\n", |
31 | 31 | "<b>Support vector machines</b> are a common model for supervised classification tasks. After covering the details you will also see how a variation of the SVM can be used for unsupervised classification, which provides a nice bridge between the supervised and unsupervised learning lectures. \n",
32 | 32 | "\n", |
33 | 33 | "The objective of an SVM is to identify the optimal boundary that separates two or more classes of objects with respect to the descriptors. That is, an SVM, like all supervised classification models, attempts to solve the problem $\\mathbf{y} = f(\\mathbf{x})$ such that, given a future <b>x</b>, we predict the correct class y to apply to the observation. In this case, the SVM draws a <i>linear</i> boundary between the two classes. That is, if you plot the data classes with respect to <b>x</b>, the SVM solution is represented by the \"best\" straight line that divides them. The ability to draw such a boundary will clearly depend on your choice of <b>x</b>! \n",
|
42 | 42 | "\n", |
43 | 43 | "4. <b>What is a support vector?</b> Since all an SVM is trying to do is find a boundary, it turns out the optimal solution only depends on the samples that lie closest to that boundary. The vectorial positions of these boundary data are the \"support vectors\", and hence the name of the model. \n",
44 | 44 | "\n", |
45 | | - "#### Sample Datasets ####\n", |
| 45 | + "### Sample Datasets\n", |
46 | 46 | "We'll use three of the datasets from the unsupervised learning lecture to illustrate the basics of SVMs. Critically, we will keep the labels here, instead of hiding them as we did for the unsupervised demonstrations. `data1` will serve as our linearly separable example, `data2` will be the multiclass example, and `data3` will serve as our non-linear example. \n"
47 | 47 | ] |
48 | 48 | }, |
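The lecture's `data1`–`data3` come from the unsupervised learning notebook and are not reproduced in this excerpt. As a minimal sketch, comparable datasets can be generated with scikit-learn; the generator settings below are assumptions, not the actual lecture data:

```python
from sklearn.datasets import make_blobs, make_circles

# Hypothetical stand-ins for the lecture datasets:
# data1: two linearly separable clusters
X1, y1 = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=0)
# data2: three clusters for the multiclass example
X2, y2 = make_blobs(n_samples=150, centers=3, cluster_std=1.0, random_state=1)
# data3: concentric circles ("bullseye") for the non-linear example
X3, y3 = make_circles(n_samples=100, factor=0.3, noise=0.05, random_state=2)
```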
|
142 | 142 | "cell_type": "markdown", |
143 | 143 | "metadata": {}, |
144 | 144 | "source": [ |
145 | | - "#### scikit-learn example ###\n", |
| 145 | + "### scikit-learn example\n", |
146 | 146 | "The cell below demonstrates how to use the scikit-learn `SVC` object to perform linear classification:" |
147 | 147 | ] |
148 | 148 | }, |
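The demonstration cell itself is elided from this excerpt; a minimal sketch of a linear `SVC` fit, assuming a synthetic two-cluster dataset as a stand-in for `data1`:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two clusters standing in for the linearly separable data1
X1, y1 = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=0)

# Fit a linear support vector classifier
clf = SVC(kernel='linear', C=1.0)
clf.fit(X1, y1)

# Only the samples nearest the boundary define the model
print("support vectors:", clf.support_vectors_.shape)
preds = clf.predict(X1)
```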
|
198 | 198 | "cell_type": "markdown", |
199 | 199 | "metadata": {}, |
200 | 200 | "source": [ |
201 | | - "#### Multitask Example ####\n", |
| 201 | + "### Multiclass Example\n",
202 | 202 | "In scikit-learn, training SVMs for multiclass assignment is almost identical to the two-class case:"
203 | 203 | ] |
204 | 204 | }, |
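As a sketch of that near-identical workflow (the three-cluster dataset here is an assumed stand-in for `data2`), `SVC` handles more than two classes automatically via a one-vs-one scheme:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Three clusters standing in for the multiclass data2
X2, y2 = make_blobs(n_samples=150, centers=3, cluster_std=1.0, random_state=1)

# The fitting call is unchanged from the two-class case
clf = SVC(kernel='linear')
clf.fit(X2, y2)
print(clf.classes_)  # the three class labels the model learned
```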
|
256 | 256 | "source": [ |
257 | 257 | "<b>Note:</b> It is messy to try and draw the boundaries so that they don't overlap. \n", |
258 | 258 | "\n", |
259 | | - "#### Kernel Trick ####\n", |
| 259 | + "### Kernel Trick\n", |
260 | 260 | "The so-called \"kernel trick\" is used to linearize non-linear problems. In simple terms, this lets you determine curved boundaries with SVMs. In principle such boundaries are very expensive to calculate, but by applying the transform implicitly (via the kernel) they can be computed very efficiently. The results will depend on the specific kernel that you use. \n",
261 | 261 | "\n", |
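The effect of the kernel choice can be sketched on a bullseye-style dataset (generated here as an assumed stand-in for `data3`); a linear kernel cannot separate concentric rings, while an `rbf` kernel can:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line separates them
X3, y3 = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=2)

linear = SVC(kernel='linear').fit(X3, y3)  # straight-line boundary
rbf = SVC(kernel='rbf').fit(X3, y3)        # kernel trick: curved boundary

print("linear:", linear.score(X3, y3), "rbf:", rbf.score(X3, y3))
```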
262 | | - "#### Activity 1 - Kernel Trick ####\n", |
| 262 | + "### Activity 1 - Kernel Trick\n", |
263 | 263 | "In the cell below I have written the code for performing a linear classification of our \"bullseye\" dataset. It doesn't work very well, as we expect. \n",
264 | 264 | "\n", |
265 | 265 | "1. Look up the documentation for the ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’ kernel options. \n", |
|
302 | 302 | "cell_type": "markdown", |
303 | 303 | "metadata": {}, |
304 | 304 | "source": [ |
305 | | - "### Artificial Neural Networks ###\n", |
| 305 | + "## Artificial Neural Networks\n", |
306 | 306 | "The fundamental unit in a neural network is the \"artificial neuron\". These artificial neurons are the mathematical analog of their biological counterparts. They link to other neurons and \"fire\" depending on whether their inputs rise above a certain threshold. A neural network can be made up of thousands of these neurons connected to one another in different ways. The basic parameters in these models are the weights of the neural connections (i.e., scalars that scale the connection between a given pair of neurons) and the biases of the activation functions (i.e., the thresholds necessary to switch each neuron). \n",
307 | 307 | "\n", |
308 | 308 | "In neural network models there is a direct mapping from features, via the input layer, to labels, via the output layer. Contrast this with the case of support vector machines, where the model only fits a boundary and predictions are based on which side of the boundary a sample resides. Any additional layers included between the input and output layers are referred to as \"hidden\" (partly because they are not easily interpretable). Models with multiple hidden layers are called \"deep\", and their unique proficiency in mapping complex relationships has given rise to the term \"deep learning\". Although neural networks can in principle be arranged in any network structure, implementation is aided by some common architectures that have emerged (e.g., feed-forward and convolutional networks).\n",
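The weight-and-bias description above can be sketched as a single neuron in plain NumPy (the input values and weights are arbitrary illustrative numbers):

```python
import numpy as np

def neuron(x, w, b):
    """A single artificial neuron: a weighted sum of the inputs plus a
    bias, passed through a sigmoid activation ("firing" strength)."""
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])   # signals from upstream neurons
w = np.array([1.0, 0.5, -0.25])  # connection weights (trainable)
b = -0.1                         # bias shifts the firing threshold

out = neuron(x, w, b)  # a value near 1 means the neuron "fires"
```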
|
311 | 311 | "\n", |
312 | 312 | "Several specialized libraries have emerged to support the development of neural network based models. Here we will be using the `Keras` python API with a `tensorflow` backend.\n", |
313 | 313 | "\n", |
314 | | - "#### Accuracy vs Loss ####\n", |
| 314 | + "### Accuracy vs Loss\n", |
315 | 315 | "As objective functions become more complex, we will want to distinguish between the value of the objective function and the performance on the desired prediction task. Specifically, we have the following definitions:\n",
316 | 316 | "\n", |
317 | 317 | "<b>loss</b>: this is the value of the objective function that our training algorithm is trying to minimize.\n", |
|
320 | 320 | "\n", |
321 | 321 | "It can be counterintuitive at first, but these two aren't as tightly coupled as you might expect. For instance, even in regression we added regularization terms to the objective function, which made the loss value harder to interpret as a direct measure of predictive performance. \n",
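The loose coupling between loss and accuracy can be made concrete with a small numeric sketch: two models that classify the same three samples equally accurately can have very different crossentropy losses (the probability values are arbitrary illustrative numbers):

```python
import numpy as np

y_true = np.array([0, 1, 1])  # true binary labels

def cross_entropy(p, y):
    # mean negative log-likelihood of the true class
    return -np.mean(np.log(np.where(y == 1, p, 1 - p)))

def accuracy(p, y):
    # fraction of thresholded predictions that match the labels
    return np.mean((p > 0.5).astype(int) == y)

p_confident = np.array([0.05, 0.95, 0.90])  # confidently correct
p_hesitant  = np.array([0.45, 0.55, 0.60])  # barely correct

# Identical accuracy, but the hesitant model has a much larger loss
print(accuracy(p_confident, y_true), accuracy(p_hesitant, y_true))
print(cross_entropy(p_confident, y_true), cross_entropy(p_hesitant, y_true))
```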
322 | 322 | "\n", |
323 | | - "#### Problem Context ####\n", |
| 323 | + "### Problem Context\n", |
324 | 324 | "You work as a quality engineer for a company that builds pumps for industrial applications. As part of your job, you perform failure testing to determine whether the pumps that are produced are in or out of specification. However, this testing is a destructive process, and in an effort to reduce costs you wonder if there is a better method for quality control.\n",
325 | 325 | "\n", |
326 | 326 | "At your disposal is a dataset consisting of measurements taken on 150 different pumps as part of quality control testing, as well as the result of the testing process. Is it possible to utilize machine learning to leverage this data and provide accurate failure predictions?\n", |
|
507 | 507 | "source": [ |
508 | 508 | "We expect our pump to operate within a certain set of tolerances. If the inlet diameter is too small or too large, it can adversely affect the pump capacity. Within the acceptable inlet diameter range, we can accept some fluctuation in capacity. If the maximum head (analogous to maximum pressure) is very high but the minimum thickness of the pump is too small, the pump will fail during testing.\n",
509 | 509 | "\n", |
510 | | - "#### Activity 2 - Calculate Statistics ####\n", |
| 510 | + "### Activity 2 - Calculate Statistics\n", |
511 | 511 | "1. Use the cell below to calculate the mean and standard deviation of each feature. \n", |
512 | 512 | "2. Make plots of the diameter vs capacity and head vs thickness." |
513 | 513 | ] |
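One way the activity might be sketched; the pump dataset itself is not included in this excerpt, so the frame below uses synthetic numbers and assumed column names (`diameter`, `capacity`, `head`, `thickness`):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 150-pump dataset; the column names
# and value ranges here are assumptions for illustration only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'diameter':  rng.normal(50, 5, 150),
    'capacity':  rng.normal(200, 20, 150),
    'head':      rng.normal(30, 3, 150),
    'thickness': rng.normal(5, 0.5, 150),
})

means = df.mean()  # per-feature mean
stds = df.std()    # per-feature standard deviation
print(means, stds, sep="\n")

# The requested plots (require matplotlib):
# df.plot.scatter(x='diameter', y='capacity')
# df.plot.scatter(x='head', y='thickness')
```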
|
529 | 529 | "\n", |
530 | 530 | "<b>Q2</b>: Do the features have the same scale? If not, what should we do before building our model?\n", |
531 | 531 | "\n", |
532 | | - "#### Activity 3 - Standardize Data ####\n", |
| 532 | + "### Activity 3 - Standardize Data\n", |
533 | 533 | "In the cell below, standardize the features and replot the results."
534 | 534 | ] |
535 | 535 | }, |
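A minimal sketch of the standardization step, using scikit-learn's `StandardScaler` on a synthetic stand-in for two of the pump features (subtract each column's mean, divide by its standard deviation):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two hypothetical columns with very different scales
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(50, 5, 150),    # e.g., diameter
    rng.normal(200, 20, 150),  # e.g., capacity
])

# After standardization every column has mean 0 and unit variance
scaler = StandardScaler()
X_std = scaler.fit_transform(X)
```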
|
546 | 546 | "cell_type": "markdown", |
547 | 547 | "metadata": {}, |
548 | 548 | "source": [ |
549 | | - "#### Training and Testing Split ####\n", |
| 549 | + "### Training and Testing Split\n", |
550 | 550 | "\n", |
551 | 551 | "When training a neural network, we would like to use as much data as possible so that our model can learn on as representative a dataset as possible. However, it is also vital to ensure that the model is capable of accurate predictions on data that it has not seen before, as this matches how the model will be applied in practice. \n", |
552 | 552 | "\n", |
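The holdout idea above is typically implemented with scikit-learn's `train_test_split`; a sketch with hypothetical feature and label arrays of the same size as the pump dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins: 150 pumps, 4 features, 3 outcome classes
X = np.random.rand(150, 4)
y = np.random.randint(0, 3, 150)

# Hold out 20% of the samples for evaluating generalization
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```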
|
567 | 567 | "cell_type": "markdown", |
568 | 568 | "metadata": {}, |
569 | 569 | "source": [ |
570 | | - "#### Single Layer Neural Network ####\n", |
| 570 | + "### Single Layer Neural Network\n", |
571 | 571 | "For this model, we will utilize a single, fully connected layer of 3 units. That is, each input parameter is fed to each of the 3 units. The units do not communicate with each other. Each forms a linear combination of its inputs multiplied by weighting factors and mediates its output through a nonlinear activation function. Here, because we are considering a classification task, we utilize the 'softmax' activation function. This outputs a set of normalized probability values (e.g., 0.3, 0.5, 0.2) corresponding to the probability that a given class has been observed.\n",
572 | 572 | "\n", |
573 | 573 | "Such models are trained through a process called backpropagation, whereby the error in the network output is used to calculate derivatives of the error with respect to the layer weights. These derivatives are used to update the weights with stochastic gradient descent (SGD) in an iterative fashion to reduce the error. Because we are dealing with a classification task with three possible outputs (Pass, Rework, Fail), we utilize the categorical crossentropy loss function, which penalizes output distributions that diverge from the target distribution. We use an 'accuracy' metric to see how accurately the model predicts the input class. \n",
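A sketch of how such a single-layer model might be assembled in Keras; the actual notebook cell is elided from this excerpt, so treat this as an assumed reconstruction (4 input features, 3 output classes):

```python
from tensorflow import keras
from tensorflow.keras import layers

# A single fully connected layer of 3 softmax units mapping the 4
# pump features to probabilities over the 3 classes (Pass, Rework, Fail)
model = keras.Sequential([
    keras.Input(shape=(4,)),
    layers.Dense(3, activation='softmax'),
])

# Categorical crossentropy loss, SGD optimizer, accuracy metric
model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```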
|
640 | 640 | "<b>Q1</b>: We see the validation loss start to increase in this example. What does this mean?\n", |
641 | 641 | "<b>Q2</b>: What is the difference between loss and accuracy?\n", |
642 | 642 | "\n", |
643 | | - "#### Multi-Layer Perceptron Model ####\n", |
| 643 | + "### Multi-Layer Perceptron Model\n", |
644 | 644 | "What if we add an additional layer between input and output? This \"hidden\" layer allows a network, in principle, to approximate any continuous function. By adding more than one hidden layer, we create a 'deep' neural network, one capable of learning latent features within a dataset and performing highly nonlinear classification/regression tasks. \n",
645 | 645 | "\n", |
646 | | - "#### Activity 4 - Compile a Multilayer Perceptron Model ####\n", |
| 646 | + "### Activity 4 - Compile a Multilayer Perceptron Model\n", |
647 | 647 | "Based on the earlier example, try to compile a multilayer perceptron model in the cell below with the same input and output layers as before, but add two dense hidden layers with 8 units each and a 'relu' activation function. \n",
648 | 648 | "\n", |
649 | 649 | "I have copied the earlier example into the cell as a starting point:" |
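For reference, one way the completed activity model might look; this is a sketch of a plausible solution, not necessarily the intended one, and it assumes the same 4-feature input and 3-class output as the single-layer example:

```python
from tensorflow import keras
from tensorflow.keras import layers

# MLP: two hidden relu layers of 8 units between input and output
model = keras.Sequential([
    keras.Input(shape=(4,)),
    layers.Dense(8, activation='relu'),    # hidden layer 1
    layers.Dense(8, activation='relu'),    # hidden layer 2
    layers.Dense(3, activation='softmax'), # output over the 3 classes
])

model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```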
|