The goals / steps of this project are the following:
- Load the data set (see below for links to the project data set)
- Explore, summarize and visualize the data set
- Design, train and test a model architecture
- Use the model to make predictions on new images
- Analyze the softmax probabilities of the new images
- Summarize the results with a written report
Note: All the code is in the 'Traffic_Sign_Classifier.ipynb' notebook; an HTML export of the notebook is available as 'Traffic_Sign_Classifier.html'.
The data is not committed to GitHub; it was downloaded from here: https://s3-us-west-1.amazonaws.com/udacity-selfdrivingcar/traffic-signs-data.zip
I used numpy and basic Python functionality to calculate summary statistics of the traffic signs data set (see the sketch after this list):
- The size of the training set is 34,799
- The size of the validation set is 4,410
- The size of the test set is 12,630
- The shape of a traffic sign image is 32x32x3
- The number of unique classes/labels in the data set is 43
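A minimal sketch of how these statistics can be computed; the file names and the 'features'/'labels' dictionary keys are assumptions based on the layout of the dataset archive above:

```python
import pickle
import numpy as np

def load(path):
    """Load one pickled split; keys assumed from the dataset archive."""
    with open(path, 'rb') as f:
        data = pickle.load(f)
    return data['features'], data['labels']

X_train, y_train = load('train.p')
X_valid, y_valid = load('valid.p')
X_test, y_test = load('test.p')

print('Training set size:', len(X_train))          # 34,799
print('Validation set size:', len(X_valid))        # 4,410
print('Test set size:', len(X_test))               # 12,630
print('Image shape:', X_train[0].shape)            # (32, 32, 3)
print('Unique classes:', len(np.unique(y_train)))  # 43
```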
Here is an exploratory visualization of the data set. It is a plot showing the number of samples of each traffic sign type in the training, validation and test datasets:
Some classes clearly have more samples than others; however, the overall distributions look similar across the datasets (whenever one class has more samples than another in the training data, the same relationship holds in the validation and test sets).
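The per-class counts in such a plot can be reproduced with numpy's `bincount`, roughly as follows (variable names follow the loading sketch above):

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(3, 1, sharex=True, figsize=(10, 8))
for ax, (name, labels) in zip(axes, [('training', y_train),
                                     ('validation', y_valid),
                                     ('test', y_test)]):
    # Number of samples per class id (43 classes total)
    ax.bar(np.arange(43), np.bincount(labels, minlength=43))
    ax.set_ylabel(name)
axes[-1].set_xlabel('class id')
plt.show()
```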
As a first step, each image is converted to the YCrCb color space. As a second step, the Y channel is extracted, so the image is represented as a grayscale image with a single color channel.
This decision was made experimentally. Originally I used all three channels of the RGB color space, then tried the Y channel of YCrCb as was done in this paper: http://yann.lecun.com/exdb/publis/pdf/sermanet-ijcnn-11.pdf, and the network's performance with the Y channel was satisfactory. I could potentially try other color spaces and other channels, but the performance using the Y channel is good enough.
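A sketch of this conversion with OpenCV (in `cv2.COLOR_RGB2YCrCb` output, the Y channel is channel 0); the notebook's exact helper may differ:

```python
import cv2

def to_y_channel(rgb_image):
    """Extract the Y (luma) channel, returning shape (32, 32, 1)."""
    ycrcb = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2YCrCb)
    return ycrcb[:, :, :1]  # keep an explicit single-channel axis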
Here is an example of a traffic sign image before and after grayscaling.
As a last step, I normalized each image by subtracting the mean pixel value and dividing by the standard deviation, bringing all pixel values to the same scale with zero mean and unit standard deviation.
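The write-up does not say whether the statistics are computed per image or over the whole training set; a per-image version looks like this:

```python
import numpy as np

def normalize(image):
    image = image.astype(np.float32)
    return (image - image.mean()) / image.std()  # zero mean, unit std
```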
Additional data was generated by shifting, rotating and zooming the images.
Here is an example of an original image and transforms applied to it.
Original image:
Two scale transforms (1.1 and 0.9), two rotation transforms (15 and -15 degrees) and two shift transforms ((2, 2) and (2, -2)) were applied. As a result, the training dataset grew by a factor of 7, to 243,593 training examples after augmentation.
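Illustrative versions of the three transforms using OpenCV affine warps (the notebook's actual implementation may differ):

```python
import cv2
import numpy as np

def shift(image, dx, dy):
    m = np.float32([[1, 0, dx], [0, 1, dy]])
    return cv2.warpAffine(image, m, (image.shape[1], image.shape[0]))

def rotate(image, angle_deg, scale=1.0):
    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, scale)
    return cv2.warpAffine(image, m, (w, h))

def augment(image):
    """The six extra copies generated per training image."""
    return [rotate(image, 0, 1.1), rotate(image, 0, 0.9),  # scale
            rotate(image, 15), rotate(image, -15),         # rotate
            shift(image, 2, 2), shift(image, 2, -2)]       # shift
```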
The final model consists of the following layers (a Keras-style sketch follows the table):
Layer | Description |
---|---|
Input | 32x32x1 Grayscale image |
Convolution 5x5 | 1x1 stride, valid padding, outputs 28x28x100 |
RELU | |
DROPOUT | 0.8 keep probability |
Max pooling | 2x2 stride, outputs 14x14x100 |
Convolution 5x5 | 1x1 stride, valid padding, outputs 10x10x150 |
RELU | |
DROPOUT | 0.8 keep probability |
Convolution 3x3 | 1x1 stride, valid padding, outputs 8x8x200 |
RELU | |
DROPOUT | 0.8 keep probability |
Max pooling | 2x2 stride, outputs 4x4x200 |
Fully connected | Input 3200 after flattening. Output 200 |
RELU | |
DROPOUT | 0.8 keep probability |
Fully connected | Input 200. Output 84 |
RELU | |
DROPOUT | 0.8 keep probability |
Fully connected | Input 84. Output 43 |
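A compact Keras sketch of the table above (the notebook itself builds the network in TensorFlow directly; a Dropout rate of 0.2 corresponds to the 0.8 keep probability, and Keras disables dropout at inference automatically):

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(100, 5, padding='valid', activation='relu'),  # 28x28x100
    layers.Dropout(0.2),
    layers.MaxPooling2D(2),                                     # 14x14x100
    layers.Conv2D(150, 5, padding='valid', activation='relu'),  # 10x10x150
    layers.Dropout(0.2),
    layers.Conv2D(200, 3, padding='valid', activation='relu'),  # 8x8x200
    layers.Dropout(0.2),
    layers.MaxPooling2D(2),                                     # 4x4x200
    layers.Flatten(),                                           # 3200
    layers.Dense(200, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(84, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(43),                                           # logits
])
```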
The Adam optimizer is used for weight updates.
Batch Size: 100
Learning Rate: 0.001
Number of Epochs: 20 normally; 5 for the model trained on the augmented dataset.
Dropout keep probability: 0.8 for training, 1.0 for validation and testing.
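A hypothetical Keras equivalent of these settings (the notebook's own training loop may look different; `X_train` etc. are assumed to hold the preprocessed single-channel images):

```python
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=100, epochs=20,
          validation_data=(X_valid, y_valid))
```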
My final model results were:
- training set accuracy of 99.8%
- validation set accuracy of 99.0%
- test set accuracy of 96.9%
The final model was obtained after a series of experiments. Below is a brief description of each experiment's setup, its accuracy, and the transition to the next experiment:
- I started with the LeNet model. The reasoning was that it classifies digits well, so there was a chance it would also be able to classify German traffic signs, which are 32x32 images as well. The only difference was that LeNet's inputs were grayscale while the German traffic sign images have 3 channels by default, so the model was updated to accept 32x32x3 images.
This model did not give good results. I lost the exact accuracy numbers, but validation and test accuracy were around 70%. I also tried running LeNet on grayscale images (converting RGB to grayscale first), which did not produce good results either. It is worth noting that originally I used a very simple normalization: subtracting 128 and dividing by 255.
- After reading this paper, http://yann.lecun.com/exdb/publis/pdf/sermanet-ijcnn-11.pdf, I realized it might be worth increasing the number of filters in the convolutional layers. This model was exactly the same as the previous one, but the first conv layer now had 100 filters and the second 200. Validation set accuracy was 97.4%; test set accuracy was 84.5%. It is worth noting that at this point I had downloaded only the training and test datasets and was splitting the training dataset with the sklearn module to obtain a validation set. It is hard to explain why, but I observed much better performance on the validation data than on the test data; as a result, I was not satisfied with this model.
- Converting images to YCrCb first and applying the same model as above produced a validation accuracy of 90.1% and a test set accuracy of 69.2%.
- Same model as above, but using only the Y channel of YCrCb.
Validation accuracy: 95.8%
Test accuracy: 84.8%
- Added a third convolutional layer and updated the number of filters to 100, 150 and 200 for the first, second and third conv layers respectively, as in the final model.
Validation accuracy: 97.5%
Test accuracy: 87.7%
- Added dropout layers as in the final model.
Validation accuracy: 98.5%
Test accuracy: 90.9%
- Improved the image normalization to subtract the mean and divide by the standard deviation. I also started using the dataset provided by the project description. This is the final model architecture.
Validation accuracy: 97.9%
Test accuracy: 95.3%
- Augmented the training dataset with image scaling, rotation and shifting.
Validation accuracy: 99.0%
Test accuracy: 96.9%
Some thoughts on the final model:
- A convolutional layer needs more filters to capture the different properties of an image. Since there are 43 classes, the network must capture a diverse set of features, and the 6 and 16 filters used in LeNet are probably not enough.
- Dropout layers make training more robust by creating redundant connections that activate under similar circumstances; this is achieved by randomly turning off a percentage of connections during training. Dropout layers should also reduce overfitting.
- Ideally, if I were building a production-ready model, I would experiment much more with the model parameters and potentially find a simplified model that still produces great results, with the goal of reducing both training and prediction time.
Here are five German traffic signs that I found on the web:
The images are resized to 32x32x3 and preprocessed with the same logic as the training/validation/test images.
Such images should generally be simple to classify; however, they have interesting properties, such as watermarks from the websites that distribute them and background objects.
The accuracy on these five images is 100%.
Image | Prediction |
---|---|
No passing | No passing |
Road work | Road work |
Children crossing | Children crossing |
End of no passing | End of no passing |
Wild animals crossing | Wild animals crossing |
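The top-5 listings below can be produced with a softmax followed by a top-k query, roughly as follows (`new_images` and `sign_names`, a class-id-to-name mapping such as the one in the dataset's signnames.csv, are assumed names):

```python
# Softmax over the logits, then the five most likely classes per image
probs = tf.nn.softmax(model(new_images), axis=-1)
top5 = tf.math.top_k(probs, k=5)
for i, (values, ids) in enumerate(zip(top5.values.numpy(),
                                      top5.indices.numpy())):
    print(f'For image # {i + 1} the top 5 answers are:')
    for p, class_id in zip(values, ids):
        print((sign_names[class_id], p))
```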
In general, the model outputs very high probabilities for the correct class, which is very good. Here is the output of the top 5 probabilities for each test image:
For image # 1 the top 5 answers are:
('No passing', 1.0)
('End of no passing', 6.2027712e-09)
('Speed limit (120km/h)', 9.8718078e-10)
('Slippery road', 4.6131909e-10)
('End of all speed and passing limits', 3.2826991e-10)
Correct answer is: No passing
For image # 2 the top 5 answers are:
('Road work', 1.0)
('Dangerous curve to the right', 7.0088174e-10)
('Slippery road', 2.6380976e-12)
('Beware of ice/snow', 5.7867475e-13)
('Double curve', 3.8999544e-13)
Correct answer is: Road work
For image # 3 the top 5 answers are:
('Children crossing', 0.99859339)
('Beware of ice/snow', 0.00048915728)
('Dangerous curve to the right', 0.00031584653)
('Slippery road', 0.00028684994)
('Traffic signals', 8.5214539e-05)
Correct answer is: Children crossing
For image # 4 the top 5 answers are:
('End of no passing', 0.65861064)
('End of all speed and passing limits', 0.30596185)
('Priority road', 0.020915516)
('End of no passing by vehicles over 3.5 metric tons', 0.0036699278)
('End of speed limit (80km/h)', 0.0032210965)
Correct answer is: End of no passing
For image # 5 the top 5 answers are:
('Wild animals crossing', 1.0)
('Double curve', 1.9002666e-08)
('Road work', 3.1246654e-09)
('Speed limit (50km/h)', 3.2554637e-10)
('No passing for vehicles over 3.5 metric tons', 1.7079377e-10)
Correct answer is: Wild animals crossing
All 5 images downloaded from the internet were predicted correctly. Remarkably, 4 of the 5 predictions have a softmax probability of 1.0, while the fifth, 'End of no passing', has a probability of about 66%, with the second-highest probability of about 31% for 'End of all speed and passing limits'.
Some layers of the trained neural network were visualized based on the activations for one test image, to understand what kind of features the network captures. Only layers before the first pooling layer were visualized, since after the first pooling layer the activations are not very representative: they encode relationships between activations of the first convolutional layer (and the subsequent ReLU, dropout and pooling layers).
- Visualization of the first convolutional layer (only 49 out of 100 filters are shown because the plotting library cannot visualize more):
As we can see, the network seems to have learned filters that transform the input image with various kinds of 'edge detection', each with different properties.
- Visualization of the ReLU activation layer after the first convolutional layer:
The ReLU layer removes a lot of pixel values (negative ones become 0), which contributes to filtering that looks like real edge detection with varying gradient direction and magnitude.
- Visualization of the max pooling layer after the first activation layer:
The pooling layer downsamples the activations of the previous layer, preserving the detected edges and features while reducing the number of pixels. This reduces the amount of information passed to subsequent layers, essentially letting them focus on the important 'features' while discarding redundant data.
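A sketch of how such feature maps can be plotted with a Keras-style model; `layer_name` and the 7x7 grid (mirroring the 49-of-100 limitation mentioned above) are illustrative:

```python
import matplotlib.pyplot as plt

def show_feature_maps(model, layer_name, image, grid=7):
    """Tile one layer's activations for a single input image."""
    probe = tf.keras.Model(model.inputs,
                           model.get_layer(layer_name).output)
    maps = probe(image[None, ...]).numpy()[0]  # shape (H, W, n_filters)
    fig, axes = plt.subplots(grid, grid, figsize=(10, 10))
    for i, ax in enumerate(axes.flat):
        if i < maps.shape[-1]:
            ax.imshow(maps[:, :, i], cmap='gray')
        ax.axis('off')
    plt.show()
```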