Automated image captioning using Keras and the Flickr8k dataset
-
Loading Data : Extract the captions and store them in a dictionary.
-
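A minimal sketch of this step, assuming each line of the captions file has the form `<image file>#<index><TAB><caption>` (the layout commonly seen in `Flickr8k.token.txt`). The function name `load_captions` and the sample lines are made up for illustration and are not taken from the notebook. Keying the dictionary by image file name keeps all captions of one image together.

```python
from collections import defaultdict


def load_captions(lines):
    """Group captions by image file name.

    Assumes each line looks like '<image file>#<index><TAB><caption>'.
    """
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        image_ref, _, caption = line.partition("\t")
        if not caption:
            continue  # skip lines without a caption part
        image_id = image_ref.split("#")[0]  # drop the '#<index>' suffix
        captions[image_id].append(caption.strip())
    return dict(captions)


# Illustrative sample lines (not real dataset entries).
sample = [
    "img_001.jpg#0\tA dog runs across the grass .",
    "img_001.jpg#1\tA brown dog is running outside .",
    "img_002.jpg#0\tTwo children play on a beach .",
]
captions = load_captions(sample)
print(captions["img_001.jpg"])
```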
Preparing Sequences for training : Wrap each caption with a start tag at the beginning and an end tag at the end.
-
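The tagging step could look like the sketch below. The tag strings `startseq` and `endseq` are an assumption (the notebook may use different tokens), and `add_tags` is a hypothetical helper name. The tags give the model a fixed word to start generating from and a word that signals when to stop.

```python
START_TAG = "startseq"  # assumed tag name, not confirmed by the notebook
END_TAG = "endseq"      # assumed tag name, not confirmed by the notebook


def add_tags(captions):
    """Return a new dict where every caption is wrapped in start/end tags."""
    return {
        image_id: [f"{START_TAG} {caption.strip()} {END_TAG}" for caption in caps]
        for image_id, caps in captions.items()
    }


tagged = add_tags({"img_001.jpg": ["a dog runs across the grass"]})
print(tagged["img_001.jpg"][0])
```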
Processing captions : Build a list of unique words and convert each word into a fixed-size vector. Zero padding is applied so that all caption sequences have the same length.
-
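As a rough illustration, the vocabulary and padding can be built in plain Python as shown here. The helper names are hypothetical. Index 0 is reserved for padding, and zeros are appended at the end of each sequence; padding at the front is an equally valid choice. The integer indices are what an embedding layer would later turn into fixed-size vectors.

```python
def build_vocab(tagged_captions):
    """Map every unique word to an integer index, starting at 1 (0 = padding)."""
    words = sorted(
        {word for caps in tagged_captions.values() for cap in caps for word in cap.split()}
    )
    return {word: index for index, word in enumerate(words, start=1)}


def encode_and_pad(caption, word_to_index, max_length):
    """Convert a caption to word indices and zero-pad (or truncate) to max_length."""
    ids = [word_to_index[word] for word in caption.split()][:max_length]
    return ids + [0] * (max_length - len(ids))


tagged = {"img_001.jpg": ["startseq a dog endseq", "startseq a cat runs endseq"]}
vocab = build_vocab(tagged)
max_length = max(len(cap.split()) for caps in tagged.values() for cap in caps)
padded = [encode_and_pad(cap, vocab, max_length) for cap in tagged["img_001.jpg"]]
print(padded)
```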
Feature Extractor : Pass every image through a Residual Network (ResNet) model to get the corresponding 2048-length feature vector, and save these encoded images in train_encoded_images.p.
-
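The sketch below shows only the bookkeeping around this step: collecting one 2048-length vector per image and writing the result to disk. It assumes that `train_encoded_images.p` is a pickled dictionary, which the `.p` extension suggests but the notebook does not confirm. The `encoder` argument is a hypothetical callable that turns an image path into a feature vector. In Keras, one plausible choice is `ResNet50` with `include_top=False` and `pooling="avg"`, whose output has 2048 values per image.

```python
import pickle
from pathlib import Path

FEATURE_DIM = 2048  # feature vector length stated above


def encode_images(image_paths, encoder):
    """Map each image file name to the feature vector returned by `encoder`."""
    encoded = {}
    for path in image_paths:
        features = list(encoder(path))
        if len(features) != FEATURE_DIM:
            raise ValueError(f"expected {FEATURE_DIM} features, got {len(features)}")
        encoded[Path(path).name] = features
    return encoded


def save_encoded(encoded, out_path="train_encoded_images.p"):
    """Pickle the encoded images (assumed file format)."""
    with open(out_path, "wb") as handle:
        pickle.dump(encoded, handle)


def load_encoded(in_path="train_encoded_images.p"):
    """Load a dictionary previously written by save_encoded."""
    with open(in_path, "rb") as handle:
        return pickle.load(handle)
```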
Sequence Processor : This is a word embedding layer for handling the text input, followed by a Long Short-Term Memory (LSTM) recurrent neural network layer.
-
Fitting the Model : We fit the model on the training dataset and finally save it.
-
Prediction : The outputs of the feature extractor and the sequence processor are merged and processed by a dense layer to make the final prediction.
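
To make the merge step concrete, here is a toy, framework-free sketch of what happens numerically: the two branch outputs are combined, passed through a dense layer, and a softmax picks the most likely next word. It assumes the branches are merged by element-wise addition (concatenation is another common option) and that both vectors have the same length. The weights, vectors, and three-word vocabulary are invented for illustration. In Keras, the same role would typically be played by a `Dense` layer with a softmax activation over the vocabulary.

```python
import math


def dense(inputs, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def predict_next_word(image_vec, text_vec, weights, bias, index_to_word):
    """Merge both branch outputs and return the most probable next word."""
    merged = [a + b for a, b in zip(image_vec, text_vec)]  # assumed merge: addition
    probs = softmax(dense(merged, weights, bias))
    best = max(range(len(probs)), key=probs.__getitem__)
    return index_to_word[best], probs


# Toy values, chosen only to make the example easy to follow.
index_to_word = ["dog", "cat", "endseq"]
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
bias = [0.0, 0.0, 0.0]
word, probs = predict_next_word([0.5, 0.1], [1.5, 0.2], weights, bias, index_to_word)
print(word)
```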
https://www.kaggle.com/gayathri9082/automated-image-captioning-flickr8/edit