---
layout: post
title: ML in Python Part 2 - Neural Networks with TensorFlow
date: 2023-10-23 13:00:00
description: Building your first neural network for image classification
---

In this second part of our machine learning series, we'll implement the same MNIST classification task using [TensorFlow](https://www.tensorflow.org/). While Scikit-learn excels at classical machine learning, TensorFlow shines when building neural networks. We'll see how deep learning approaches differ from traditional methods and learn the basic concepts of neural network architecture.

## Why Neural Networks?

While our Scikit-learn models performed well in Part 1, neural networks offer several key advantages for image classification:

- **Automatic feature learning**: No need to manually engineer features
- **Scalability**: Can handle much larger datasets efficiently
- **Complex pattern recognition**: Especially good at finding hierarchical patterns in data
- **State-of-the-art performance**: Currently the best approach for many computer vision tasks

Let's see these advantages in action by building our own neural network for digit classification.

First, we import the necessary packages:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
```

## 1. Load and Prepare Dataset

Unlike Scikit-learn, TensorFlow's MNIST dataset comes in a slightly different format. We'll keep the images in their original 2D shape (28x28 pixels) since neural networks can work directly with this structure - another advantage over traditional methods.

```python
# Model and data parameters
num_classes = 10          # One class for each digit (0-9)
input_shape = (28, 28, 1) # Height, width, and channels (1 for grayscale)

# Load dataset, already pre-split into train and test set
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale pixel values to range [0,1] - this helps with training stability
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Add channel dimension required by Conv2D layers
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)

print("x_train shape:", x_train.shape)
print("x_test shape:", x_test.shape)
```

    x_train shape: (60000, 28, 28, 1)
    x_test shape: (10000, 28, 28, 1)

The final dimension (1) represents the color channel. Since MNIST contains grayscale images, we only need one channel, unlike RGB images which would have 3 channels.
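
As a quick sanity check - this plotting snippet is a small sketch, not part of the original post - we can look at a few training images together with their labels:

```python
# Show the first 10 training digits with their labels
fig, axs = plt.subplots(1, 10, figsize=(12, 2))
for ax, img, label in zip(axs, x_train, y_train):
    ax.imshow(img.squeeze(), cmap='binary')
    ax.set_title(label)
    ax.axis('off')
plt.show()
```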

Now that the data is loaded and scaled to an appropriate range, we can create the neural network model. Given that our inputs are images, a convolutional neural network is a natural choice. There are multiple ways to set this up.

## 2. Create Neural Network Model

For image classification, we'll use a Convolutional Neural Network (CNN). CNNs are specifically designed to work with image data through specialized layers:

- **Convolutional layers**: Detect patterns like edges, textures, and shapes
- **Pooling layers**: Reduce dimensionality while preserving important features
- **Dense layers**: Combine detected features for final classification
- **Dropout layers**: Prevent overfitting by randomly deactivating neurons

There are multiple ways to define a model in TensorFlow. We'll start with the most straightforward approach, which is a sequential model:

```python
# Compact and sequential
model = keras.Sequential(
    [
        layers.Conv2D(32, kernel_size=(3, 3), activation='relu',
                      input_shape=input_shape),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(32, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation='softmax'),
    ]
)
```

To have a bit more control over the individual steps, we can also separate each individual part and define the network architecture as follows:

```python
# More precise and sequential
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3)),
        layers.ReLU(),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3)),
        layers.ReLU(),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(32),
        layers.ReLU(),
        layers.Dropout(0.5),
        layers.Dense(num_classes),
        layers.Softmax(),
    ]
)
```

The two models are functionally identical, but this second version:

- Allows finer control over layer placement
- Makes it easier to insert additional layers like BatchNormalization (see the sketch below)
- Provides more explicit activation functions
- Makes the data flow more transparent
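
For example, adding a `BatchNormalization` layer after each convolution only takes one extra line per block. The following is a minimal sketch of this variant, not part of the original tutorial:

```python
# Variant of the model with batch normalization after each convolution
model_bn = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3)),
        layers.BatchNormalization(),  # stabilizes training by normalizing activations
        layers.ReLU(),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3)),
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(32),
        layers.ReLU(),
        layers.Dropout(0.5),
        layers.Dense(num_classes),
        layers.Softmax(),
    ]
)
```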

Next to this sequential API, there's also a functional one. We will cover that in a later, more advanced TensorFlow example.

Once the model is created, you can use the `summary()` method to get an overview of the network's architecture and the number of trainable and non-trainable parameters.

```python
model.summary()
```

    Model: "sequential_1"
    _________________________________________________________________
     Layer (type)                    Output Shape             Param #
    =================================================================
     conv2d_2 (Conv2D)               (None, 26, 26, 32)       320
     re_lu (ReLU)                    (None, 26, 26, 32)       0
     max_pooling2d_2 (MaxPooling2D)  (None, 13, 13, 32)       0
     conv2d_3 (Conv2D)               (None, 11, 11, 64)       18496
     re_lu_1 (ReLU)                  (None, 11, 11, 64)       0
     max_pooling2d_3 (MaxPooling2D)  (None, 5, 5, 64)         0
     flatten_1 (Flatten)             (None, 1600)             0
     dropout_2 (Dropout)             (None, 1600)             0
     dense_2 (Dense)                 (None, 32)               51232
     re_lu_2 (ReLU)                  (None, 32)               0
     dropout_3 (Dropout)             (None, 32)               0
     dense_3 (Dense)                 (None, 10)               330
     softmax (Softmax)               (None, 10)               0
    =================================================================
    Total params: 70,378
    Trainable params: 70,378
    Non-trainable params: 0
    _________________________________________________________________

This summary tells us several important things:

1. Our model has 70,378 trainable parameters - relatively small by modern standards
2. The input image (28x28x1) is progressively reduced in size through pooling (see the Output Shape column)
3. The final dense layer has 10 outputs - one for each digit class
4. Most parameters are in the dense layers, not the convolutional layers (see the quick check below)
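
Points 1 and 4 are easy to verify by hand: a `Conv2D` layer has `kernel_height * kernel_width * input_channels * filters + filters` parameters, and a `Dense` layer has `inputs * outputs + outputs`. A quick sketch of the arithmetic:

```python
# Recompute the parameter counts from the layer shapes by hand
conv1 = 3 * 3 * 1 * 32 + 32    # 320
conv2 = 3 * 3 * 32 * 64 + 64   # 18496
dense1 = 1600 * 32 + 32        # 51232
dense2 = 32 * 10 + 10          # 330
print(conv1 + conv2 + dense1 + dense2)  # 70378, matching the summary
```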

## 3. Train TensorFlow model

Before we can train the model, we need to provide a few additional pieces of information:

- `batch_size`: How many samples the model should look at before performing a gradient descent step.
- `epochs`: How many times the model should go through the full dataset.
- `loss`: Which loss function the model should optimize for.
- `metrics`: Which performance metrics the model should keep track of. By default this includes the loss metric.
- `optimizer`: Which optimizer strategy the model should use. This could involve additional optimization parameters, such as the learning rate (see the sketch after the compile step below).
- `validation_split` or `validation_data`: These parameters let you automatically split the training set into a training and validation set (with `validation_split`) or provide a specific validation set (with `validation_data`).

Finding the right values for all of these, as well as establishing the right model architecture, is the black art of any deep learning practitioner. For this example, let's just go with some proven default parameters.

```python
# Model parameters
batch_size = 128
epochs = 10

# Compile model with appropriate metrics and optimizers
model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)
```
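
The string shortcut `optimizer='adam'` uses Adam with its default settings. If we wanted to set the learning rate explicitly, we could pass an optimizer object instead - a small sketch with Adam's default rate spelled out:

```python
# Equivalent to optimizer='adam', but with the learning rate made explicit
optimizer = keras.optimizers.Adam(learning_rate=0.001)

model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=optimizer,
    metrics=['accuracy']
)
```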

Now everything is ready to train our model.

```python
# Model training
history = model.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    epochs=epochs,
    validation_split=0.1
)
```

    Epoch 1/10
    422/422 [==============================] - 4s 9ms/step - loss: 0.5902 - accuracy: 0.8117 - val_loss: 0.1014 - val_accuracy: 0.9700
    Epoch 2/10
    422/422 [==============================] - 4s 8ms/step - loss: 0.2183 - accuracy: 0.9364 - val_loss: 0.0674 - val_accuracy: 0.9808
    Epoch 3/10
    422/422 [==============================] - 4s 8ms/step - loss: 0.1663 - accuracy: 0.9512 - val_loss: 0.0499 - val_accuracy: 0.9860
    Epoch 4/10
    422/422 [==============================] - 4s 8ms/step - loss: 0.1390 - accuracy: 0.9599 - val_loss: 0.0462 - val_accuracy: 0.9875
    Epoch 5/10
    422/422 [==============================] - 4s 8ms/step - loss: 0.1166 - accuracy: 0.9674 - val_loss: 0.0433 - val_accuracy: 0.9888
    Epoch 6/10
    422/422 [==============================] - 4s 8ms/step - loss: 0.1046 - accuracy: 0.9693 - val_loss: 0.0370 - val_accuracy: 0.9902
    Epoch 7/10
    422/422 [==============================] - 4s 8ms/step - loss: 0.0950 - accuracy: 0.9722 - val_loss: 0.0394 - val_accuracy: 0.9892
    Epoch 8/10
    422/422 [==============================] - 4s 8ms/step - loss: 0.0891 - accuracy: 0.9742 - val_loss: 0.0400 - val_accuracy: 0.9895
    Epoch 9/10
    422/422 [==============================] - 4s 8ms/step - loss: 0.0865 - accuracy: 0.9750 - val_loss: 0.0342 - val_accuracy: 0.9907
    Epoch 10/10
    422/422 [==============================] - 4s 8ms/step - loss: 0.0775 - accuracy: 0.9773 - val_loss: 0.0355 - val_accuracy: 0.9905
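
A quick sanity check on the log: with `validation_split=0.1`, only 54,000 of the 60,000 training images are used for the gradient updates, which at a batch size of 128 gives the 422 steps per epoch shown above.

```python
import math

# 90% of the 60,000 training images remain after the validation split
train_samples = int(60000 * 0.9)       # 54000
print(math.ceil(train_samples / 128))  # 422 steps per epoch
```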

## 4. Model investigation

Since we stored the `model.fit()` output in the `history` variable, we can easily access and visualize the different model metrics during training.

```python
# Store history in a dataframe
df_history = pd.DataFrame(history.history)

# Visualize training history
fig, axs = plt.subplots(1, 2, figsize=(15, 4))
df_history.iloc[:, df_history.columns.str.contains('loss')].plot(
    title="Loss during training", ax=axs[0])
df_history.iloc[:, df_history.columns.str.contains('accuracy')].plot(
    title="Accuracy during training", ax=axs[1])
axs[0].set_xlabel("Epoch [#]")
axs[1].set_xlabel("Epoch [#]")
axs[0].set_ylabel("Loss")
axs[1].set_ylabel("Accuracy")
plt.show()
```

<img class="img-fluid rounded z-depth-1" src="{{ site.baseurl }}/assets/ex_plots/ex_03_tensorflow_simple_output_16_0.png" data-zoomable width=800px style="padding-top: 20px; padding-right: 20px; padding-bottom: 20px; padding-left: 20px">
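
In our run the validation loss keeps decreasing, so ten epochs look reasonable. If it had started to rise again - a typical sign of overfitting - we could stop training automatically with a callback. A minimal sketch, assuming the compiled model from above:

```python
# Stop training once the validation loss stops improving
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',        # metric to watch
    patience=3,                # epochs without improvement before stopping
    restore_best_weights=True  # roll back to the best epoch seen
)

history = model.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    epochs=50,  # upper bound; the callback usually stops much earlier
    validation_split=0.1,
    callbacks=[early_stop]
)
```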

Once the model is trained, we can also compute its score on the test set. For this we can use the `evaluate()` method.

```python
score = model.evaluate(x_test, y_test, verbose=0)
print(f"Test loss: {score[0]:.3f}")
print(f"Test accuracy: {score[1]*100:.2f}%")
```

    Test loss: 0.032
    Test accuracy: 98.93%

And if you're interested in the individual predictions, you can use the `predict()` method.

```python
y_pred = model.predict(x_test, verbose=0)
y_pred.shape
```

    (10000, 10)
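
Each of the 10,000 rows holds ten values, one per class. A quick peek at the first row - a small sketch, not in the original post - makes this concrete:

```python
# Class scores for the first test image, rounded for readability
print(y_pred[0].round(3))
print("Sum:", y_pred[0].sum())  # softmax outputs sum to 1
```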

Given that our last layer uses a softmax activation, we don't just get the class label back, but the probability score for each class. To get to the class prediction, we therefore need to apply an argmax routine.

```python
# Transform class probabilities to prediction labels
predictions = np.argmax(y_pred, 1)

# Create confusion matrix
cm = tf.math.confusion_matrix(y_test, predictions)

# Visualize confusion matrix
plt.figure(figsize=(6, 6))
sns.heatmap(cm, square=True, annot=True, fmt='d', cbar=False)
plt.title("Confusion matrix")
plt.show()
```

<img class="img-fluid rounded z-depth-1" src="{{ site.baseurl }}/assets/ex_plots/ex_03_tensorflow_simple_output_22_0.png" data-zoomable width=800px style="padding-top: 20px; padding-right: 20px; padding-bottom: 20px; padding-left: 20px">
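
The confusion matrix shows which digits get mixed up, but it can also be instructive to look at the actual failure cases. A short sketch - not part of the original post - that pulls out a few misclassified test images:

```python
# Find and display a few misclassified test images
wrong = np.where(predictions != y_test)[0]

fig, axs = plt.subplots(1, 8, figsize=(12, 2))
for ax, idx in zip(axs, wrong[:8]):
    ax.imshow(x_test[idx].squeeze(), cmap='binary')
    ax.set_title(f"{y_test[idx]} as {predictions[idx]}")
    ax.axis('off')
plt.show()
```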

## 5. Model parameters

And if you're interested in the model parameters of the trained neural network, you can access them directly via `model.layers`. One advantage of neural networks is their ability to learn hierarchical features. Let's examine what our first convolutional layer learned:

```python
# Extract the first hidden convolutional layer
conv_layer = model.layers[0]

# Transform the layer weights to a numpy array
weights = conv_layer.weights[0].numpy()

# Visualize the 32 kernels from the first convolutional layer
fig, axs = plt.subplots(4, 8, figsize=(10, 5))
axs = np.ravel(axs)

for idx, ax in enumerate(axs):
    ax.set_title(f"Kernel {idx}")
    ax.imshow(weights[..., idx], cmap='binary')
    ax.axis('off')
plt.tight_layout()
plt.show()
```

<img class="img-fluid rounded z-depth-1" src="{{ site.baseurl }}/assets/ex_plots/ex_03_tensorflow_simple_output_24_0.png" data-zoomable width=800px style="padding-top: 20px; padding-right: 20px; padding-bottom: 20px; padding-left: 20px">
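
Raw 3x3 kernels are hard to interpret on their own. One way to make them more tangible - a sketch that assumes the trained model from above is still in memory - is to pass a single test image through the first layer and inspect the resulting feature maps:

```python
# Build a sub-model that returns the activations of the first layer
feature_extractor = keras.Model(
    inputs=model.inputs, outputs=model.layers[0].output)

# Compute the 32 feature maps for a single test image
feature_maps = feature_extractor(x_test[:1]).numpy()

fig, axs = plt.subplots(4, 8, figsize=(10, 5))
for idx, ax in enumerate(np.ravel(axs)):
    ax.imshow(feature_maps[0, ..., idx], cmap='binary')
    ax.axis('off')
plt.tight_layout()
plt.show()
```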

## Summary and Next Steps

In this tutorial, we've introduced neural networks using TensorFlow:

- Building a CNN architecture
- Training with backpropagation
- Monitoring learning progress
- Visualizing learned features

Our neural network achieved accuracy comparable to our Scikit-learn models (~99%), but this time on higher-resolution images, and with the potential for even better performance through further optimization.

Key takeaways:

1. Neural networks can work directly with structured data like images
2. Architecture design is crucial for good performance
3. Training requires careful parameter selection
4. Monitoring training helps detect problems early
5. Visualizing learned features provides insights into model behavior

In Part 3, we'll explore more advanced machine learning concepts using Scikit-learn, focusing on regression problems and complex preprocessing pipelines.

[← Back to Part 1: Getting Started with Scikit-learn]({{ site.baseurl }}/blog/2023/01_scikit_simple)
[Continue to Part 3: Advanced Machine Learning with Scikit-learn →]({{ site.baseurl }}/blog/2023/03_scikit_advanced)
