This project uses the Flickr8k dataset (about 1 GB). Five descriptions are available for each photo.
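As a rough illustration of how the five descriptions per photo can be grouped, here is a minimal sketch. It assumes the caption file uses one `<image name>#<index><TAB><caption>` entry per line (the layout of `Flickr8k.token.txt`). The function name `load_captions`, the `<start>`/`<end>` marker strings, and the sample lines are illustrative assumptions, not taken from this repository.

```python
from collections import defaultdict


def load_captions(lines):
    """Group captions by image file name.

    Assumes each non-empty line looks like: <image name>#<index><TAB><caption>.
    """
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        image_tag, caption = line.split("\t", 1)
        image_name = image_tag.split("#", 1)[0]  # drop the "#<index>" suffix
        # Wrap each caption in start/end markers (hypothetical token names)
        # so a decoder knows where a sentence begins and ends.
        captions[image_name].append("<start> " + caption.lower() + " <end>")
    return dict(captions)


# Made-up sample lines in the assumed format.
sample = [
    "img_001.jpg#0\tA dog runs across the grass .",
    "img_001.jpg#1\tA brown dog is running outside .",
    "img_002.jpg#0\tTwo children play on a beach .",
]
captions = load_captions(sample)
print(len(captions["img_001.jpg"]))  # 2
```

With the real file, the same function could be called on an open file handle, since it only iterates over lines.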
The code is written in Keras with the TensorFlow backend. VGG is used to extract the image features.
Beam search is not implemented yet.
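For readers who want to experiment, a beam search decoder could look roughly like the sketch below. It is not part of this repository. The `step_fn` callback, the toy probability table, and all names here are hypothetical stand-ins for a trained caption model that returns next-word probabilities.

```python
import math


def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=20):
    """Return the highest-scoring token sequence found by beam search.

    step_fn(tokens) must return a dict mapping each candidate next token
    to its probability given the tokens generated so far.
    """
    beams = [([start_token], 0.0)]  # (tokens, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for token, prob in step_fn(tokens).items():
                if prob > 0.0:
                    candidates.append((tokens + [token], score + math.log(prob)))
        if not candidates:
            break
        # Keep only the beam_width best partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end_token:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished sequences if needed
    return max(finished, key=lambda c: c[1])[0]


# Toy stand-in for a caption model: greedy decoding picks "a" first,
# but the sequence starting with "the" is more probable overall.
TOY_MODEL = {
    ("<start>",): {"a": 0.6, "the": 0.4},
    ("<start>", "a"): {"dog": 0.5, "cat": 0.5},
    ("<start>", "the"): {"dog": 0.9, "cat": 0.1},
}


def toy_step(tokens):
    return TOY_MODEL.get(tuple(tokens), {"<end>": 1.0})


greedy = beam_search(toy_step, "<start>", "<end>", beam_width=1)
wider = beam_search(toy_step, "<start>", "<end>", beam_width=2)
print(greedy)  # ['<start>', 'a', 'dog', '<end>']
print(wider)   # ['<start>', 'the', 'dog', '<end>']
```

A beam width of 1 is equivalent to greedy decoding. This sketch scores sequences by raw log-probability, which tends to favor shorter captions; length normalization is a common refinement that is left out here.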
You can download the weights here
- Keras 1.2.2
- TensorFlow 0.12.1
- numpy
- matplotlib
[1] Vinyals, Oriol, et al. "Show and tell: A neural image caption generator." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
[2] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).