This project implements an image captioning model using a CNN-LSTM architecture. The model takes an image as input and generates a descriptive caption using natural language processing techniques. It is trained on a dataset containing images and their corresponding textual descriptions.
- The model is trained on the Flickr8k dataset.
- Flickr8k consists of 8,000 images, each paired with multiple reference captions.
To augment the training data and improve generalization, images were horizontally flipped (a minimal sketch is shown below).
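The exact augmentation code lives in the training pipeline; the snippet below is only a sketch of random horizontal flipping, assuming a TensorFlow/Keras pipeline (consistent with the Xception backbone used later).

```python
# Sketch of the flip augmentation (assumption: TensorFlow/Keras pipeline).
import tensorflow as tf

def augment(image: tf.Tensor) -> tf.Tensor:
    """Randomly mirror an image left-right with 50% probability."""
    return tf.image.random_flip_left_right(image)

# Example: applied on-the-fly inside a tf.data input pipeline.
# dataset = dataset.map(lambda img, caption: (augment(img), caption))
```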
The model consists of three main components (a minimal architecture sketch follows this list):
- Image Feature Extractor (CNN)
  - Uses Xception to extract features from images.
- Sequence Processor (LSTM)
  - An embedding layer processes input text sequences.
  - An LSTM network learns dependencies between words in a sentence.
- Decoder (Dense Layer with Softmax)
  - Combines image features and text sequences.
  - Generates the next word in the caption.
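The saved model file is the authoritative reference for the exact layers; the sketch below shows one common way to wire such a merge-style CNN-LSTM captioner in Keras. The 256-unit sizes, `vocab_size`, and `max_length` values are assumptions, not the project's actual hyperparameters.

```python
# Sketch of a merge-style CNN-LSTM decoder (layer sizes are assumptions;
# inspect the saved model in Netron for the real architecture).
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size = 7577   # hypothetical vocabulary size
max_length = 34     # hypothetical maximum caption length

# Image branch: 2048-dim Xception feature vector -> dense projection.
img_input = Input(shape=(2048,))
img_features = Dropout(0.5)(img_input)
img_features = Dense(256, activation="relu")(img_features)

# Text branch: word indices -> embedding -> LSTM.
txt_input = Input(shape=(max_length,))
txt_features = Embedding(vocab_size, 256, mask_zero=True)(txt_input)
txt_features = Dropout(0.5)(txt_features)
txt_features = LSTM(256)(txt_features)

# Decoder: merge both branches and predict the next word.
decoder = add([img_features, txt_features])
decoder = Dense(256, activation="relu")(decoder)
output = Dense(vocab_size, activation="softmax")(decoder)

model = Model(inputs=[img_input, txt_input], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```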
To inspect the model architecture in detail, upload the saved model file to Netron.
The model is evaluated using the following metrics:
📌 BLEU-1: 0.6131
📌 BLEU-2: 0.5453
📌 BLEU-3: 0.4483
📌 BLEU-4: 0.3635
📌 ROUGE-L: 0.3314
📌 CIDEr: 0.0497
📌 SPICE: 0.0451
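The reported scores come from the scripts in `evaluation/`; for reference, a minimal sketch of computing corpus-level BLEU with NLTK is shown below (the `references` and `hypotheses` values are hypothetical placeholders).

```python
# Illustration only: corpus-level BLEU with NLTK on hypothetical data.
from nltk.translate.bleu_score import corpus_bleu

# One list of reference token lists per image, one generated token list per image.
references = [[["a", "man", "swims", "in", "the", "water"]]]
hypotheses = [["man", "in", "the", "water"]]

bleu1 = corpus_bleu(references, hypotheses, weights=(1.0, 0, 0, 0))
bleu4 = corpus_bleu(references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25))
print(f"BLEU-1: {bleu1:.4f}  BLEU-4: {bleu4:.4f}")
```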
```bash
git clone https://github.com/yourusername/image-captioning.git
cd image-captioning
pip install -r requirements.txt
mkdir data
```
```bash
python utils/preprocess.py
python utils/feature_extract.py
python utils/data_loader.py
```

You can also use pretrained weights.
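What `utils/feature_extract.py` does internally is project-specific; a minimal sketch of extracting a single 2048-dim Xception feature vector per image (an assumption consistent with the architecture above) looks like this. The file path is hypothetical.

```python
# Sketch of Xception feature extraction (assumed behaviour of
# utils/feature_extract.py; the image path is hypothetical).
import numpy as np
from tensorflow.keras.applications.xception import Xception, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# pooling="avg" yields one 2048-dim vector per image instead of a feature map.
extractor = Xception(include_top=False, pooling="avg")

def extract_features(image_path: str) -> np.ndarray:
    image = load_img(image_path, target_size=(299, 299))  # Xception input size
    array = preprocess_input(img_to_array(image))
    return extractor.predict(array[np.newaxis, ...], verbose=0)[0]

# features = extract_features("data/example.jpg")
```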
Train the model:

```bash
python train.py
```

To test the model with your own images:

```bash
python test.py --image_path path/to/image.jpg
```
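`test.py` handles loading the trained model and tokenizer; the core decoding step in this kind of captioner is typically a greedy loop like the sketch below. The `model`, `tokenizer`, `photo_features`, `max_length`, and the `startseq`/`endseq` tokens are assumptions, not verified names from the repository.

```python
# Greedy decoding sketch (hypothetical placeholders for the project's objects).
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate_caption(model, tokenizer, photo_features, max_length):
    """Generate a caption word by word, feeding each prediction back in.

    photo_features is expected to have shape (1, 2048).
    """
    caption = "startseq"
    for _ in range(max_length):
        seq = tokenizer.texts_to_sequences([caption])[0]
        seq = pad_sequences([seq], maxlen=max_length)
        probs = model.predict([photo_features, seq], verbose=0)[0]
        word = tokenizer.index_word.get(int(np.argmax(probs)))
        if word is None or word == "endseq":
            break
        caption += " " + word
    return caption.replace("startseq", "").strip()
```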
Run the Streamlit interface for uploading images and generating captions:

```bash
streamlit run Streamlit.py
```
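`Streamlit.py` in the repository is the actual app; the snippet below is only a sketch of how such an interface is typically wired, where `extract_features` and `generate_caption` are the hypothetical helpers sketched earlier.

```python
# Minimal Streamlit sketch (Streamlit.py in the repo is the real app;
# extract_features and generate_caption are hypothetical helpers).
import streamlit as st
from PIL import Image

st.title("Image Captioning Demo")

uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    image = Image.open(uploaded)
    st.image(image, caption="Uploaded image")
    # features = extract_features_from_pil(image)
    # st.write(generate_caption(model, tokenizer, features, max_length))
```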
Evaluate the model with NLP metrics commonly used for image captioning:

```bash
python evaluation/test_cap.py
python evaluation/evaluation.py
```

Example output from the model:
| Input Image | ![]() |
|---|---|
| Generated Caption | "man in the water" |
🔹 Train on a larger dataset for improved generalization.
🔹 Experiment with Transformer-based models (e.g., ViT + GPT-2, BLIP).
👤 Aditya Nikam, student at IIT Kanpur. Contact: [email protected] / [email protected]
