This project implements an AI-based image captioning system using deep learning. It uses the Flickr30k dataset and a hybrid architecture that pairs a Convolutional Neural Network (CNN) encoder with a Transformer or LSTM decoder to generate descriptive captions for images.
- CNN (e.g., ResNet-50 or InceptionV3) as an image feature extractor
- Transformer or LSTM decoder for sequence generation
- Beam search decoding for improved caption quality
- Evaluation using BLEU, METEOR, and CIDEr metrics
- Modular and extensible codebase
- Clean, readable output with caption visualization
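
To illustrate the beam search decoding listed above, here is a minimal sketch in plain Python. It is not the project's actual decoder: `beam_search`, `toy_step`, and `TOY_MODEL` are hypothetical names, and the toy probability table stands in for the trained decoder's next-word distribution.

```python
import math

def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=20):
    """Return the highest-scoring token sequence found by beam search.

    step_fn(sequence) must return a dict mapping each candidate next token
    to its probability given the sequence so far (hypothetical interface).
    """
    # Each beam entry is (cumulative log-probability, token sequence).
    beams = [(0.0, [start_token])]
    finished = []

    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            for token, prob in step_fn(seq).items():
                if prob <= 0.0:
                    continue  # log(0) is undefined; skip impossible tokens
                candidates.append((score + math.log(prob), seq + [token]))
        if not candidates:
            break
        # Keep only the beam_width best partial captions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_width]:
            if seq[-1] == end_token:
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break

    # Fall back to unfinished beams if nothing reached end_token.
    pool = finished or beams
    return max(pool, key=lambda c: c[0])[1]

# Toy stand-in for a trained decoder (made-up probabilities).
TOY_MODEL = {
    ("<s>",): {"a": 0.6, "the": 0.4},
    ("<s>", "a"): {"dog": 0.55, "cat": 0.45},
    ("<s>", "the"): {"dog": 0.9, "cat": 0.1},
}

def toy_step(seq):
    # Any sequence not in the table simply ends the caption.
    return TOY_MODEL.get(tuple(seq), {"</s>": 1.0})

greedy = beam_search(toy_step, "<s>", "</s>", beam_width=1)
wider = beam_search(toy_step, "<s>", "</s>", beam_width=2)
print(greedy)  # ['<s>', 'a', 'dog', '</s>']
print(wider)   # ['<s>', 'the', 'dog', '</s>']
```

With a beam width of 1 the search is greedy and commits to "a" (probability 0.6), ending at a total probability of 0.33. With a width of 2 it keeps "the" alive and finds the more probable caption (0.36). This sketch does not apply length normalization, which real captioning decoders often add so that short captions are not favored.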
- Python 3.x
- PyTorch or TensorFlow (depending on the implementation)
- TorchVision or Keras
- NumPy, Pillow, Matplotlib
- NLTK for text processing
- Flickr30k Dataset (with captions)
git clone https://github.com/your-username/image-captioning-flickr30k.git
cd image-captioning-flickr30k
python -m venv venv
source venv/bin/activate  # For Windows: venv\Scripts\activate
pip install -r requirements.txt

Download the Flickr30k dataset used for this project. Unzip the folder and place it in the Code directory of this project, and you should be ready to go.
Run ResNet-Transformer-Tune.ipynb to compare model performance across hyperparameter combinations on a smaller portion of the dataset. You can modify the notebook to change which hyperparameters are searched or to use more data (I used a smaller portion because of limited GPU capacity). The notebook saves a model for every hyperparameter combination, which can take up a lot of disk space, and it keeps track of the best-performing hyperparameters.
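
The tuning loop described above can be sketched roughly as follows. This is an illustration, not the notebook's code: the search space, `grid_search`, and `fake_train_and_score` are hypothetical, and the scoring function is a stand-in for training the model and measuring validation performance.

```python
import itertools

def grid_search(train_and_score, grid):
    """Try every combination in `grid`; return (best_params, best_score, history)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    history = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_score(params)  # higher is assumed to be better
        history.append((params, score))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, history

# Hypothetical search space; the notebook's actual hyperparameters may differ.
grid = {"learning_rate": [1e-3, 1e-4], "num_layers": [2, 4]}

def fake_train_and_score(params):
    # Stand-in for real training: returns a made-up validation score.
    return params["num_layers"] - 1000 * params["learning_rate"]

best_params, best_score, history = grid_search(fake_train_and_score, grid)
print(best_params)
print(len(history), "combinations tried")
```

If disk space is a concern, one option is to write a checkpoint only when `score > best_score` instead of after every combination.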
Run ResNet-Transformer.ipynb to train the model on the larger dataset (80% of the entire set). The notebook saves the trained model under the name BestModel.
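
As a rough sketch of how an 80% training portion could be carved out, the snippet below splits at the image level with a fixed seed. How the notebook actually splits the data is not stated here, so `split_image_ids`, the seed, and the file names are assumptions for illustration.

```python
import random

def split_image_ids(image_ids, train_fraction=0.8, seed=42):
    """Shuffle image IDs reproducibly and split them into (train, held_out)."""
    ids = sorted(image_ids)            # sort first so the result ignores input order
    random.Random(seed).shuffle(ids)   # private RNG; global random state is untouched
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]

# Hypothetical file names standing in for Flickr30k images.
image_ids = [f"image_{i}.jpg" for i in range(10)]
train_ids, held_out_ids = split_image_ids(image_ids)
print(len(train_ids), len(held_out_ids))  # 8 2
```

Splitting by image rather than by caption keeps all five Flickr30k captions of an image in the same partition, so no held-out image is seen during training.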
Use the model saved in the previous step to evaluate model performance or to start generating captions.
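
To give a feel for what the evaluation measures, here is a small sketch of clipped (modified) n-gram precision, the building block of BLEU. It is not the full BLEU score (there is no brevity penalty and no averaging over n-gram orders), and it is not the project's evaluation code; the function names and example sentences are illustrative.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n=1):
    """Clipped n-gram precision of one candidate against several references."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the most times it appears in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    # Credit each candidate n-gram at most as often as a reference contains it.
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())

references = [
    "the cat is on the mat".split(),
    "there is a cat on the mat".split(),
]
degenerate = ["the"] * 7
print(modified_precision(degenerate, references, n=1))  # 2/7, about 0.2857
```

Clipping is what stops a degenerate caption that repeats a common word from scoring well: "the" appears at most twice in any single reference, so only two of the seven occurrences are credited.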
Here is a sample output caption generated by the model for the image below:

This project is open-source and available under the terms of the MIT License.
Vrushali Ranadive Github | LinkedIn