---
layout: post
title: ML in Python Part 2 - Neural Networks with TensorFlow
date: 2023-10-23 13:00:00
description: Building your first neural network for image classification

---

In this second part of our machine learning series, we'll implement the same MNIST classification task using [TensorFlow](https://www.tensorflow.org/). While Scikit-learn excels at classical machine learning, TensorFlow shines when building neural networks. We'll see how deep learning approaches differ from traditional methods and learn the basic concepts of neural network architecture.

## Why Neural Networks?

While our Scikit-learn models performed well in Part 1, neural networks offer several key advantages for image classification:
- **Automatic feature learning**: No need to manually engineer features
- **Scalability**: Can handle much larger datasets efficiently
- **Complex pattern recognition**: Especially good at finding hierarchical patterns in data
- **State-of-the-art performance**: Currently the best approach for many computer vision tasks

Let's see these advantages in action by building our own neural network for digit classification.

Let's start by importing the necessary packages:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
```

## 1. Load and Prepare Dataset

Unlike Scikit-learn, TensorFlow's MNIST dataset comes in a slightly different format. We'll keep the images in their original 2D shape (28x28 pixels) since neural networks can work directly with this structure - another advantage over traditional methods.

```python
# Model and data parameters
num_classes = 10  # One class for each digit (0-9)
input_shape = (28, 28, 1)  # Height, width, and channels (1 for grayscale)

# Load dataset, already pre-split into train and test set
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale pixel values to range [0,1] - this helps with training stability
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Add channel dimension required by Conv2D layers
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)

print("x_train shape:", x_train.shape)
print("x_test shape:", x_test.shape)
```

    x_train shape: (60000, 28, 28, 1)
    x_test shape: (10000, 28, 28, 1)

The final dimension (1) represents the color channel. Since MNIST contains grayscale images, we only need one channel, unlike RGB images which would have 3 channels.

Now that the data is loaded and scaled to an appropriate range, we can go ahead and create the neural network model. Given that our inputs are images, we'll train a convolutional neural network. There are multiple ways to set this up.

## 2. Create Neural Network Model

For image classification, we'll use a Convolutional Neural Network (CNN). CNNs are specifically designed to work with image data through specialized layers:

- **Convolutional layers**: Detect patterns like edges, textures, and shapes
- **Pooling layers**: Reduce dimensionality while preserving important features
- **Dense layers**: Combine detected features for final classification
- **Dropout layers**: Prevent overfitting by randomly deactivating neurons

There are multiple ways to define a model in TensorFlow. We'll start with the most straightforward approach, which is a sequential model:

```python
# Compact and sequential
model = keras.Sequential(
    [
        layers.Conv2D(32, kernel_size=(3, 3), activation='relu',
                      input_shape=input_shape),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(32, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation='softmax'),
    ]
)
```

To have a bit more control over the individual steps, we can also separate each part and define the network architecture as follows.

```python
# More precise and sequential
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3)),
        layers.ReLU(),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3)),
        layers.ReLU(),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(32),
        layers.ReLU(),
        layers.Dropout(0.5),
        layers.Dense(num_classes),
        layers.Softmax(),
    ]
)
```

The two models are functionally identical, but this second version:
- Allows finer control over layer placement
- Makes it easier to insert additional layers like BatchNormalization
- Provides more explicit activation functions
- Makes the data flow more transparent

Next to this sequential API, there's also a functional one. We will cover it properly in a later, more advanced TensorFlow example, but here's a quick preview.
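
The following is just a minimal sketch of what the same architecture would look like with the functional API; nothing here is needed for the rest of this tutorial:

```python
# Functional API sketch: each layer is called on the output of the previous one
inputs = keras.Input(shape=input_shape)
x = layers.Conv2D(32, kernel_size=(3, 3), activation='relu')(inputs)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Conv2D(64, kernel_size=(3, 3), activation='relu')(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Flatten()(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(32, activation='relu')(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(num_classes, activation='softmax')(x)
functional_model = keras.Model(inputs=inputs, outputs=outputs)
```

Because every layer's input and output is an explicit Python variable, this style makes branching, multi-input, and multi-output architectures straightforward - which is exactly why we'll return to it later.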

Once the model is created, you can use the `summary()` method to get an overview of the network's architecture and the number of trainable and non-trainable parameters.

```python
model.summary()
```

    Model: "sequential_1"
    _________________________________________________________________
     Layer (type)                 Output Shape              Param #
    =================================================================
     conv2d_2 (Conv2D)            (None, 26, 26, 32)        320
     re_lu (ReLU)                 (None, 26, 26, 32)        0
     max_pooling2d_2 (MaxPooling  (None, 13, 13, 32)        0
     2D)
     conv2d_3 (Conv2D)            (None, 11, 11, 64)        18496
     re_lu_1 (ReLU)               (None, 11, 11, 64)        0
     max_pooling2d_3 (MaxPooling  (None, 5, 5, 64)          0
     2D)
     flatten_1 (Flatten)          (None, 1600)              0
     dropout_2 (Dropout)          (None, 1600)              0
     dense_2 (Dense)              (None, 32)                51232
     re_lu_2 (ReLU)               (None, 32)                0
     dropout_3 (Dropout)          (None, 32)                0
     dense_3 (Dense)              (None, 10)                330
     softmax (Softmax)            (None, 10)                0
    =================================================================
    Total params: 70,378
    Trainable params: 70,378
    Non-trainable params: 0
    _________________________________________________________________


This summary tells us several important things:
1. Our model has 70,378 trainable parameters - relatively small by modern standards
2. The input image (28x28x1) is progressively reduced in size through pooling (see the Output Shape column)
3. The final dense layer has 10 outputs - one for each digit class
4. Most parameters are in the dense layers, not the convolutional layers (the quick check below confirms this)
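
You can verify these parameter counts by hand: a Conv2D layer has `kernel_height * kernel_width * input_channels * filters` weights plus one bias per filter, and a Dense layer has `inputs * outputs` weights plus one bias per output.

```python
# Parameter counts from the summary, computed by hand
print(3 * 3 * 1 * 32 + 32)        # conv2d_2: 320
print(3 * 3 * 32 * 64 + 64)       # conv2d_3: 18496
print(1600 * 32 + 32)             # dense_2: 51232
print(32 * 10 + 10)               # dense_3: 330
print(320 + 18496 + 51232 + 330)  # total: 70378
```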

## 3. Train TensorFlow model

Before we can train the model, we need to provide a few additional pieces of information:

- `batch_size`: How many samples the model should look at before performing a gradient descent step.
- `epochs`: How many times the model should go through the full dataset.
- `loss`: Which loss function the model should optimize for.
- `metrics`: Which performance metrics the model should keep track of. By default this includes the loss metric.
- `optimizer`: Which optimization strategy the model should use. This could involve additional optimization parameters, such as the learning rate.
- `validation_split` or `validation_data`: Automatically split the training set into a training and validation set (with `validation_split`), or provide a specific validation set (with `validation_data`) - see the short sketch after this list.
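
For illustration, a minimal sketch of the `validation_data` variant; `x_val` and `y_val` are hypothetical arrays you would have split off yourself, and the model is assumed to be compiled already:

```python
# Sketch: pass an explicit validation set instead of using validation_split
# (x_val and y_val are placeholders for your own split)
model.fit(x_train, y_train, batch_size=128, epochs=10,
          validation_data=(x_val, y_val))
```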

Finding the right values for all of these parameters, as well as establishing the right model architecture, is the black art of the deep learning practitioner. For this example, let's just go with some proven default parameters.

```python
# Model parameters
batch_size = 128
epochs = 10

# Compile model with appropriate loss, optimizer, and metrics
model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)
```

Now everything is ready to train our model.

```python
# Model training
history = model.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    epochs=epochs,
    validation_split=0.1
)
```

    Epoch 1/10
    422/422 [==============================] - 4s 9ms/step - loss: 0.5902 - accuracy: 0.8117 - val_loss: 0.1014 - val_accuracy: 0.9700
    Epoch 2/10
    422/422 [==============================] - 4s 8ms/step - loss: 0.2183 - accuracy: 0.9364 - val_loss: 0.0674 - val_accuracy: 0.9808
    Epoch 3/10
    422/422 [==============================] - 4s 8ms/step - loss: 0.1663 - accuracy: 0.9512 - val_loss: 0.0499 - val_accuracy: 0.9860
    Epoch 4/10
    422/422 [==============================] - 4s 8ms/step - loss: 0.1390 - accuracy: 0.9599 - val_loss: 0.0462 - val_accuracy: 0.9875
    Epoch 5/10
    422/422 [==============================] - 4s 8ms/step - loss: 0.1166 - accuracy: 0.9674 - val_loss: 0.0433 - val_accuracy: 0.9888
    Epoch 6/10
    422/422 [==============================] - 4s 8ms/step - loss: 0.1046 - accuracy: 0.9693 - val_loss: 0.0370 - val_accuracy: 0.9902
    Epoch 7/10
    422/422 [==============================] - 4s 8ms/step - loss: 0.0950 - accuracy: 0.9722 - val_loss: 0.0394 - val_accuracy: 0.9892
    Epoch 8/10
    422/422 [==============================] - 4s 8ms/step - loss: 0.0891 - accuracy: 0.9742 - val_loss: 0.0400 - val_accuracy: 0.9895
    Epoch 9/10
    422/422 [==============================] - 4s 8ms/step - loss: 0.0865 - accuracy: 0.9750 - val_loss: 0.0342 - val_accuracy: 0.9907
    Epoch 10/10
    422/422 [==============================] - 4s 8ms/step - loss: 0.0775 - accuracy: 0.9773 - val_loss: 0.0355 - val_accuracy: 0.9905

## 4. Model investigation

Since we stored the `model.fit()` output in a `history` variable, we can easily access and visualize the different model metrics during training. As an aside: the validation accuracy sits above the training accuracy here - this is expected, since dropout is only active during training.

```python
# Store history in a dataframe
df_history = pd.DataFrame(history.history)

# Visualize training history
fig, axs = plt.subplots(1, 2, figsize=(15, 4))
df_history.iloc[:, df_history.columns.str.contains('loss')].plot(
    title="Loss during training", ax=axs[0])
df_history.iloc[:, df_history.columns.str.contains('accuracy')].plot(
    title="Accuracy during training", ax=axs[1])
axs[0].set_xlabel("Epoch [#]")
axs[1].set_xlabel("Epoch [#]")
axs[0].set_ylabel("Loss")
axs[1].set_ylabel("Accuracy")
plt.show()
```

<img class="img-fluid rounded z-depth-1" src="{{ site.baseurl }}/assets/ex_plots/ex_03_tensorflow_simple_output_16_0.png" data-zoomable width=800px style="padding-top: 20px; padding-right: 20px; padding-bottom: 20px; padding-left: 20px">
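
These curves are also where you would spot overfitting: if the validation loss starts climbing while the training loss keeps falling, it's time to stop training. Keras can automate this with callbacks; the following is just a sketch of the idea and was not used for the training run above:

```python
# Sketch: stop training once val_loss stops improving (not used above)
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',        # metric to watch
    patience=3,                # epochs without improvement before stopping
    restore_best_weights=True  # roll back to the weights of the best epoch
)
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=50,
                    validation_split=0.1, callbacks=[early_stop])
```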

Once the model is trained, we can also compute its score on the test set. For this we can use the `evaluate()` method.

```python
score = model.evaluate(x_test, y_test, verbose=0)
print(f"Test loss: {score[0]:.3f}")
print(f"Test accuracy: {score[1]*100:.2f}%")
```

    Test loss: 0.032
    Test accuracy: 98.93%

And if you're interested in the individual predictions, you can use the `predict()` method.

```python
y_pred = model.predict(x_test, verbose=0)
y_pred.shape
```

    (10000, 10)

Given that our last layer uses a softmax activation, we don't get just the class labels back, but a probability score for each class. To get to the class predictions, we therefore need to apply an argmax over the class dimension.

```python
# Transform class probabilities to prediction labels
predictions = np.argmax(y_pred, axis=1)

# Create confusion matrix
cm = tf.math.confusion_matrix(y_test, predictions)

# Visualize confusion matrix
plt.figure(figsize=(6, 6))
sns.heatmap(cm, square=True, annot=True, fmt='d', cbar=False)
plt.title("Confusion matrix")
plt.show()
```

<img class="img-fluid rounded z-depth-1" src="{{ site.baseurl }}/assets/ex_plots/ex_03_tensorflow_simple_output_22_0.png" data-zoomable width=800px style="padding-top: 20px; padding-right: 20px; padding-bottom: 20px; padding-left: 20px">
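
If you want to go beyond these aggregate counts, it's often instructive to look at the misclassified images themselves. A short sketch, reusing the `predictions` array from above:

```python
# Sketch: visualize a few test images the model got wrong
wrong_idx = np.where(predictions != y_test)[0]

fig, axs = plt.subplots(1, 8, figsize=(14, 2))
for ax, idx in zip(axs, wrong_idx[:8]):
    ax.imshow(x_test[idx, ..., 0], cmap='binary')
    ax.set_title(f"true {y_test[idx]} / pred {predictions[idx]}")
    ax.axis('off')
plt.show()
```

Misclassified samples like these are often ambiguous even for humans, which puts the remaining ~1% error rate into perspective.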

## 5. Model parameters

And if you're interested in the model parameters of the trained neural network, you can directly access them via `model.layers`. One advantage of neural networks is their ability to learn hierarchical features. Let's examine what our first convolutional layer learned:

```python
# Extract the first convolutional layer
conv_layer = model.layers[0]

# Transform the layer weights to a numpy array; shape: (3, 3, 1, 32)
weights = conv_layer.weights[0].numpy()

# Visualize the 32 kernels from the first convolutional layer
fig, axs = plt.subplots(4, 8, figsize=(10, 5))
axs = np.ravel(axs)

for idx, ax in enumerate(axs):
    ax.set_title(f"Kernel {idx}")
    # Select kernel idx and drop the channel axis so imshow gets a 2D array
    ax.imshow(weights[:, :, 0, idx], cmap='binary')
    ax.axis('off')
plt.tight_layout()
plt.show()
```

<img class="img-fluid rounded z-depth-1" src="{{ site.baseurl }}/assets/ex_plots/ex_03_tensorflow_simple_output_24_0.png" data-zoomable width=800px style="padding-top: 20px; padding-right: 20px; padding-bottom: 20px; padding-left: 20px">
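
The raw 3x3 kernels can be hard to interpret on their own. Often more telling are the feature maps - what each kernel produces when applied to an actual digit. A minimal sketch, using a sub-model that stops after the first convolution:

```python
# Sketch: feature maps of the first conv layer for a single test image
feature_model = keras.Model(inputs=model.inputs,
                            outputs=model.layers[0].output)
feature_maps = feature_model(x_test[:1]).numpy()  # shape: (1, 26, 26, 32)

fig, axs = plt.subplots(4, 8, figsize=(10, 5))
for idx, ax in enumerate(np.ravel(axs)):
    ax.imshow(feature_maps[0, ..., idx], cmap='binary')
    ax.axis('off')
plt.show()
```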

## Summary and Next Steps

In this tutorial, we've introduced neural networks using TensorFlow:
- Building a CNN architecture
- Training with backpropagation
- Monitoring learning progress
- Visualizing learned features

Our neural network achieved comparable accuracy (~99%) to our Scikit-learn models, but this time on higher-resolution images, and with the potential for even better performance through further optimization.

Key takeaways:
1. Neural networks can work directly with structured data like images
2. Architecture design is crucial for good performance
3. Training requires careful parameter selection
4. Monitoring training helps detect problems early
5. Visualizing learned features provides insights into model behavior

In Part 3, we'll explore more advanced machine learning concepts using Scikit-learn, focusing on regression problems and complex preprocessing pipelines.

[← Back to Part 1: Getting Started with Scikit-learn]({{ site.baseurl }}/blog/2023/01_scikit_simple)
[Continue to Part 3: Advanced Machine Learning with Scikit-learn →]({{ site.baseurl }}/blog/2023/03_scikit_advanced)