
Commit 17696fa
Author: M.Notter
Commit message: Updates 4 blog posts
Parent: 5d2fe5f

File tree: 6 files changed (+950 / -136 lines)


_posts/2023-10-23-01_scikit_simple.md

Lines changed: 202 additions & 40 deletions
Large diffs are not rendered by default.

_posts/2023-10-23-02_tensorflow_simple.md

Lines changed: 126 additions & 35 deletions
@@ -1,6 +1,6 @@
 ---
 layout: post
-title: ML in Python Part 2 - Neural Networks with TensorFlow
+title: Deep Learning Fundamentals - Building Neural Networks with TensorFlow
 date: 2023-10-23 13:00:00
 description: Building your first neural network for image classification

@@ -36,7 +36,7 @@ from tensorflow.keras import layers
 Unlike Scikit-learn, TensorFlow's MNIST dataset comes in a slightly different format. We'll keep the images in their original 2D shape (28x28 pixels) since neural networks can work directly with this structure - another advantage over traditional methods.
 
 ```python
-# Model and data parameters
+# Model parameters
 num_classes = 10  # One class for each digit (0-9)
 input_shape = (28, 28, 1)  # Height, width, and channels (1 for grayscale)

@@ -48,6 +48,7 @@ x_train = x_train.astype('float32') / 255.0
 x_test = x_test.astype('float32') / 255.0
 
 # Add channel dimension required by Conv2D layers
+# Shape changes from (samples, height, width) to (samples, height, width, channels)
 x_train = np.expand_dims(x_train, -1)
 x_test = np.expand_dims(x_test, -1)

@@ -58,6 +59,12 @@ print("x_test shape:", x_test.shape)
 x_train shape: (60000, 28, 28, 1)
 x_test shape: (10000, 28, 28, 1)
 
+Our dataset dimensions represent:
+- **60,000 training samples**: Much larger than scikit-learn's version for better learning
+- **28x28 pixels**: Higher resolution images than Part 1's 8x8 grid
+- **1 channel**: Grayscale images (RGB would be 3 channels)
+- **10,000 test samples**: Large test set for robust evaluation
+
 The final dimension (1) represents the color channel. Since MNIST contains grayscale images, we only need one channel, unlike RGB images which would have 3 channels.
 
 Now that the data is loaded and scaled to an appropriate range, we can go ahead and create the neural network
@@ -68,36 +75,47 @@ multiple ways how we can set this up.
 
 For image classification, we'll use a Convolutional Neural Network (CNN). CNNs are specifically designed to work with image data through specialized layers:
 
-- **Convolutional layers**: Detect patterns like edges, textures, and shapes
-- **Pooling layers**: Reduce dimensionality while preserving important features
-- **Dense layers**: Combine detected features for final classification
-- **Dropout layers**: Prevent overfitting by randomly deactivating neurons
+- **Convolutional layers**: Extract spatial features like edges, textures, and shapes
+- **Pooling layers**: Reduce spatial dimensions while preserving important features
+- **Dense layers**: Combine extracted features for final classification
+- **Dropout layers**: Prevent overfitting by randomly deactivating neurons during training
+
+There are multiple ways to define a model in TensorFlow. Let's explore two common approaches:
 
-There are multiple ways to define a model in TensorFlow. We'll start with the most straightforward approach, which is a sequential model:
+### 1. Sequential API
+The Sequential API is the simplest way to build neural networks - layers are stacked linearly, one after another:
 
 ```python
-# Compact and sequential
+# Define model architecture using Sequential API
 model = keras.Sequential(
     [
-        layers.Conv2D(32, kernel_size=(3, 3), activation='relu',
+        # First Convolutional Block
+        layers.Conv2D(32, kernel_size=(3, 3), activation='relu',  # 32 filters, each 3x3 in size, detect basic patterns
                       input_shape=input_shape),
+        layers.MaxPooling2D(pool_size=(2, 2)),  # Reduces spatial dimensions by half while preserving features
+
+        # Second Convolutional Block
+        layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),  # 64 filters detect more complex patterns
         layers.MaxPooling2D(pool_size=(2, 2)),
-        layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
-        layers.MaxPooling2D(pool_size=(2, 2)),
+
+        # Flatten 3D feature maps to 1D feature vector
         layers.Flatten(),
-        layers.Dropout(0.5),
-        layers.Dense(32, activation='relu'),
-        layers.Dropout(0.5),
+
+        # Dense layers for final classification
+        layers.Dropout(0.5),  # Prevents overfitting by randomly dropping 50% of connections
+        layers.Dense(32, activation='relu'),  # Hidden layer combines features
+
+        # Output layer for classification
         layers.Dense(num_classes, activation='softmax'),
     ]
 )
 ```
 
-To have a bit more control about the individual steps, we can also separate each individual part, and define
-the network architecture as follows.
+### 2. Layer-by-Layer Sequential API
+For more explicit control, we can separate each layer and activation:
 
 ```python
-# More precise and sequential
+# More precise and sequential approach
 model = keras.Sequential(
     [
         keras.Input(shape=input_shape),
@@ -118,13 +136,17 @@ model = keras.Sequential(
 )
 ```
 
-The two models are functionally identical, but this second version:
-- Allows finer control over layer placement
+The two models are functionally identical, but the layer-by-layer approach offers several advantages:
 - Makes it easier to insert additional layers like BatchNormalization
 - Provides more explicit activation functions
 - Makes the data flow more transparent
+- Allows finer control over layer parameters
 
-Next to this sequential API, there's also a functional one. We will cover that in the later, more advanced, TensorFlow example.
+Next to this sequential API, there's also a functional API. We'll explore this more flexible approach in our advanced TensorFlow tutorial; it allows for:
+- Multiple inputs and outputs
+- Layer sharing
+- Non-sequential layer connections
+- Complex architectures like residual networks
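As a brief preview (an illustrative sketch, not the advanced tutorial's code), the same CNN can be written with the functional API by wiring each layer to its input explicitly:

```python
# Functional API sketch: equivalent to the Sequential model above
inputs = keras.Input(shape=input_shape)
x = layers.Conv2D(32, kernel_size=(3, 3), activation='relu')(inputs)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Conv2D(64, kernel_size=(3, 3), activation='relu')(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Flatten()(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(32, activation='relu')(x)
outputs = layers.Dense(num_classes, activation='softmax')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
```

Because every layer call names its input, branches, merges, and skip connections follow naturally.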

 Once the model is created, you can use the `summary()` method to get an overview of the network's architecture
 and the number of trainable and non-trainable parameters.
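The call itself is a one-liner:

```python
# Print the architecture table: layer types, output shapes, and parameter counts
model.summary()
```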
@@ -183,28 +205,24 @@ black arts of any deep learning practitioners. For this example, let's just go w
 parameters.
 
 ```python
-# Model parameters
-batch_size = 128
-epochs = 10
+# Training configuration
+batch_size = 128  # Number of samples processed before model update
+epochs = 10  # Number of complete passes through the dataset
 
-# Compile model with appropriate metrics and optimizers
+# Compile model with appropriate loss function and optimizer
 model.compile(
-    loss='sparse_categorical_crossentropy',
-    optimizer='adam',
-    metrics=['accuracy']
+    loss='sparse_categorical_crossentropy',  # Appropriate for integer labels
+    optimizer='adam',  # Adaptive learning rate optimizer
+    metrics=['accuracy']  # Track accuracy during training
 )
-```
 
-Now everything is ready that we can train our model.
-
-```python
-# Model training
+# Train the model
 history = model.fit(
     x_train,
     y_train,
     batch_size=batch_size,
     epochs=epochs,
-    validation_split=0.1
+    validation_split=0.1  # Use 10% of training data for validation
 )
 ```

@@ -229,6 +247,23 @@ history = model.fit(
 Epoch 10/10
 422/422 [==============================] - 4s 8ms/step - loss: 0.0775 - accuracy: 0.9773 - val_loss: 0.0355 - val_accuracy: 0.9905
 
+Let's analyze the training progression:
+- **Initial Performance (Epoch 1)**:
+  - Training: 81.17% accuracy, loss of 0.5902
+  - Validation: 97.00% accuracy, loss of 0.1014
+  - Shows rapid initial learning
+
+- **Final Performance (Epoch 10)**:
+  - Training: 97.73% accuracy, loss of 0.0775
+  - Validation: 99.05% accuracy, loss of 0.0355
+  - Excellent convergence with validation outperforming training
+
+- **Key Observations**:
+  - Consistent improvement across epochs
+  - Lower validation loss than training loss suggests good generalization
+  - Final accuracy exceeds our Scikit-learn model from Part 1
+  - No signs of overfitting as validation metrics remain stable
+
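These observations can also be read straight off the `history` object; a small sketch (the 1.5x threshold is an arbitrary illustration, not a standard rule):

```python
# Pull final-epoch metrics from the History object returned by model.fit()
final = {k: v[-1] for k, v in history.history.items()}
print(f"train loss: {final['loss']:.4f} | train acc: {final['accuracy']:.4f}")
print(f"val loss:   {final['val_loss']:.4f} | val acc:   {final['val_accuracy']:.4f}")

# Rough overfitting check: validation loss should not sit far above training loss
if final['val_loss'] > 1.5 * final['loss']:
    print("Possible overfitting: validation loss well above training loss")
```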
 ## 4. Model investigation
 
 If we stored the `model.fit()` output in a `history` variable, we can easily access and visualize the different
@@ -252,6 +287,9 @@ plt.show()
 ```
 
 <img class="img-fluid rounded z-depth-1" src="{{ site.baseurl }}/assets/ex_plots/ex_03_tensorflow_simple_output_16_0.png" data-zoomable width=800px style="padding-top: 20px; padding-right: 20px; padding-bottom: 20px; padding-left: 20px">
+<div class="caption">
+Figure 1: Training metrics over time showing model loss (left) and accuracy (right) for both training and validation sets. The logarithmic scale helps visualize improvement across different magnitudes.
+</div>
 
 Once the model is trained we can also compute its score on the test set. For this we can use the `evaluate()`
 method.
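A minimal sketch of that call (it returns the loss followed by each metric from the `compile()` step):

```python
# Evaluate loss and accuracy on the held-out test set
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"Test loss: {test_loss:.4f} | Test accuracy: {test_acc:.4f}")
```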
@@ -321,6 +359,58 @@ plt.show()
 
 <img class="img-fluid rounded z-depth-1" src="{{ site.baseurl }}/assets/ex_plots/ex_03_tensorflow_simple_output_24_0.png" data-zoomable width=800px style="padding-top: 20px; padding-right: 20px; padding-bottom: 20px; padding-left: 20px">
 
+### Common Deep Learning Pitfalls
+When starting with TensorFlow and neural networks, watch out for these common issues:
+
+1. **Data Preparation**
+   - (Almost) always scale input data (like we did with `/255.0`)
+   - Check for missing or invalid values
+   - Ensure consistent data types
+   ```python
+   # Example of proper data preparation
+   x_train = x_train.astype('float32') / 255.0
+   x_test = x_test.astype('float32') / 255.0
+   ```
+
+2. **Model Architecture**
+   - Start simple, add complexity only if needed
+   - Match output layer to your task (softmax for classification)
+   - Use appropriate layer sizes
+   ```python
+   # Example of clear, progressive architecture
+   model = keras.Sequential([
+       layers.Conv2D(32, kernel_size=(3, 3), activation='relu'),
+       layers.MaxPooling2D(pool_size=(2, 2)),
+       layers.Flatten(),
+       layers.Dense(10, activation='softmax')  # 10 classes
+   ])
+   ```
+
+3. **Training Issues**
+   - Monitor training metrics (loss not decreasing)
+   - Watch for overfitting (validation loss increasing; see the `EarlyStopping` sketch after this list)
+   - Use appropriate batch sizes
+   ```python
+   # Add validation monitoring during training
+   history = model.fit(
+       x_train, y_train,
+       validation_split=0.1,
+       batch_size=128,
+       epochs=10
+   )
+   ```
+
+4. **Memory Management**
+   - Clear unnecessary variables
+   - Use appropriate data types
+   - Watch batch sizes on limited hardware
+   ```python
+   # Free memory after training
+   import gc
+   gc.collect()
+   keras.backend.clear_session()
+   ```
+
413+
324414
## Summary and Next Steps
325415

326416
In this tutorial, we've introduced neural networks using TensorFlow:
@@ -329,7 +419,7 @@ In this tutorial, we've introduced neural networks using TensorFlow:
 - Monitoring learning progress
 - Visualizing learned features
 
-Our neural network achieved comparable accuracy to our Scikit-learn models (~99%), but this time on images with a higher resoltuion with the potential for even better performance through further optimization.
+Our neural network achieved comparable accuracy to our Scikit-learn models (~99%), but this time on higher-resolution images, with the potential for even better performance through further optimization.
 
 **Key takeaways:**
 1. Neural networks can work directly with structured data like images
@@ -340,4 +430,5 @@ Our neural network achieved comparable accuracy to our Scikit-learn models (~99%
 
 In Part 3, we'll explore more advanced machine learning concepts using Scikit-learn, focusing on regression problems and complex preprocessing pipelines.
 
-[← Back to Part 1]({{ site.baseurl }}/blog/2023/01_scikit_simple) or [Continue to Part 3 →]({{ site.baseurl }}/blog/2023/03_scikit_advanced)
+[← Previous: Getting Started]({{ site.baseurl }}/blog/2023/01_scikit_simple) or
+[Next: Advanced Machine Learning →]({{ site.baseurl }}/blog/2023/03_scikit_advanced)
