Goals:
- Implement a Fully Convolutional Network (FCN).
- Use it to perform semantic segmentation on images, detecting which pixels belong to the road.
Make sure you have the following installed:
Download the KITTI Road dataset from here. Extract the dataset in the data folder. This will create the folder data_road with all the training and test images.
Rubric Points
It does; see main.py lines 20-42. In particular, we run:
tf.saved_model.loader.load(sess, [vgg_tag], vgg_path)
graph = tf.get_default_graph()
From the graph we then obtain the pretrained tensors by name.
It does; see main.py lines 46-63. It builds on the pretrained VGG model and adds upsampling layers, following the architecture described in the paper "Fully Convolutional Networks for Semantic Segmentation".
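To make the upsampling step more concrete, here is a minimal shape-arithmetic sketch. It assumes the FCN-8s variant from the paper (two 2x upsampling steps with skip connections, then one 8x step), a 160x576 input, and image dimensions divisible by 32. None of these are confirmed by this writeup, and the function names are hypothetical rather than taken from main.py.

```python
def encoder_shapes(height, width):
    """Spatial sizes of the VGG16 feature maps an FCN-8s decoder would use.

    Assumes height and width are divisible by 32, so every pooling step
    halves the resolution exactly.
    """
    return {
        "pool3": (height // 8, width // 8),
        "pool4": (height // 16, width // 16),
        "layer7": (height // 32, width // 32),
    }


def upsample(shape, stride):
    """Output size of a transposed convolution with 'same' padding."""
    return (shape[0] * stride, shape[1] * stride)


def decoder_shapes(height, width):
    """Trace the feature-map sizes through an assumed FCN-8s decoder."""
    enc = encoder_shapes(height, width)

    up1 = upsample(enc["layer7"], 2)
    # The skip connection adds pool4 element-wise, so the shapes must match.
    assert up1 == enc["pool4"]

    up2 = upsample(up1, 2)
    # Same requirement for the pool3 skip connection.
    assert up2 == enc["pool3"]

    out = upsample(up2, 8)
    return [up1, up2, out]


# Assumed input size of 160x576; the writeup does not state the real one.
shapes = decoder_shapes(160, 576)
print(shapes)
```

The last entry matches the input resolution, which is what lets the network assign a class to every pixel.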
It does; see main.py lines 67-81. We compute the softmax cross entropy between logits and labels and use the Adam optimizer to minimize the cross entropy loss.
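As a rough illustration of what the loss measures, the following standard-library sketch computes the mean softmax cross entropy over a few pixels. It is not the TensorFlow code in main.py; the function names and the two-pixel example values are assumptions made for illustration only.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one pixel's class scores."""
    peak = max(logits)
    log_total = math.log(sum(math.exp(v - peak) for v in logits))
    return [v - peak - log_total for v in logits]


def softmax_cross_entropy(logits, one_hot_label):
    """Cross entropy between a one-hot label and softmax(logits)."""
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(one_hot_label, log_probs))


def mean_loss(pixel_logits, pixel_labels):
    """Average the per-pixel cross entropy, giving one scalar to minimize."""
    losses = [
        softmax_cross_entropy(logits, label)
        for logits, label in zip(pixel_logits, pixel_labels)
    ]
    return sum(losses) / len(losses)


# Two pixels, two classes (not-road, road); values are illustrative only.
pixel_logits = [[2.0, 0.0], [0.0, 0.0]]
pixel_labels = [[1.0, 0.0], [0.0, 1.0]]
loss = mean_loss(pixel_logits, pixel_labels)
print(round(loss, 4))
```

The first pixel is classified confidently and correctly, so it contributes a small loss; the second is undecided and contributes log(2). An optimizer such as Adam then adjusts the weights to push this average down.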
Yes. See main.py lines 85-109. It runs for the specified number of epochs, using the batch_size parameter to obtain batched sets of training data. The loss of the network is printed during training, after each batch is processed.
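The overall shape of such a training loop can be sketched as follows. This is a simplified stand-in, not the code in main.py: get_batches, train and step_fn are hypothetical names, and step_fn replaces the real session call that runs the optimizer and returns the loss.

```python
import random


def get_batches(samples, batch_size, rng):
    """Yield shuffled batches; the last one may be smaller than batch_size."""
    order = list(samples)
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


def train(samples, epochs, batch_size, step_fn, seed=0):
    """Run step_fn on every batch of every epoch and log the loss each time."""
    rng = random.Random(seed)
    history = []
    for epoch in range(epochs):
        for batch in get_batches(samples, batch_size, rng):
            loss = step_fn(batch)  # stand-in for one optimizer step
            history.append(loss)
            print(f"epoch {epoch + 1}/{epochs} loss {loss:.4f}")
    return history


# 30 dummy samples with a batch size of 12 give 3 batches per epoch.
history = train(
    list(range(30)),
    epochs=2,
    batch_size=12,
    step_fn=lambda batch: 1.0 / len(batch),  # dummy loss, illustration only
)
```

Printing after every batch, as here, gives a fine-grained view of the loss curve at the cost of noisier output than a per-epoch average.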
On average, the model's loss decreases over time. The gains are large early in training but level off by around epoch 200.
I found that the cross entropy loss doesn't decrease much after 200 or so epochs, once it reaches a value between 0.05 and 0.1. As seen in main.py lines 119-120, I trained for 400 epochs with a batch size of 12. This was a good batch size given the memory constraints of my GPU.
The network does appear to correctly identify the road in all pictures, with very little bleeding into non-road areas, as seen in the following examples.
Example 1:
Example 2:
Example 3:


