🧠 AI Image Captioning using Flickr30k

This project implements an AI-based image captioning system using deep learning. It uses the Flickr30k dataset and a hybrid architecture that pairs a Convolutional Neural Network (CNN) encoder with a Transformer or LSTM decoder to generate descriptive captions for images.
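
For orientation, here is a minimal PyTorch sketch of how such an encoder-decoder pair can be wired together; the class names, dimensions, and layer counts are illustrative rather than the exact configuration used in the notebooks:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class ImageEncoder(nn.Module):
    """Wraps a pretrained ResNet-50, dropping its classification head."""
    def __init__(self, embed_dim=512):
        super().__init__()
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        # Keep everything up to (but not including) avgpool and fc.
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])
        self.proj = nn.Linear(2048, embed_dim)

    def forward(self, images):
        feats = self.backbone(images)              # (B, 2048, 7, 7)
        feats = feats.flatten(2).permute(0, 2, 1)  # (B, 49, 2048): 49 spatial tokens
        return self.proj(feats)                    # (B, 49, embed_dim)

class CaptionDecoder(nn.Module):
    """Transformer decoder attending over the image tokens.
    (Positional encodings are omitted here for brevity.)"""
    def __init__(self, vocab_size, embed_dim=512, nhead=8, num_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        layer = nn.TransformerDecoderLayer(embed_dim, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.out = nn.Linear(embed_dim, vocab_size)

    def forward(self, tokens, memory):
        t = tokens.size(1)
        # Causal mask so each position only attends to earlier tokens.
        mask = torch.triu(torch.full((t, t), float("-inf"),
                                     device=tokens.device), diagonal=1)
        hidden = self.decoder(self.embed(tokens), memory, tgt_mask=mask)
        return self.out(hidden)                    # (B, T, vocab_size)
```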


📌 Features

  • CNN (e.g., ResNet-50 or InceptionV3) as an image feature extractor
  • Transformer or LSTM decoder for sequence generation
  • Beam search decoding for improved caption quality (see the sketch after this list)
  • Evaluation using BLEU, METEOR, and CIDEr metrics
  • Modular and extensible codebase
  • Clean, readable output with caption visualization
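
To illustrate the beam search item above: instead of greedily taking the single most likely token at each step, the decoder keeps the `beam_width` best partial captions ranked by cumulative log-probability. A compact sketch, assuming a decoder with the interface from the previous snippet (`bos_id`/`eos_id` are the tokenizer's start and end token ids):

```python
import torch

def beam_search(decoder, memory, bos_id, eos_id, beam_width=3, max_len=20):
    """Beam search over an autoregressive caption decoder."""
    beams = [(torch.tensor([[bos_id]]), 0.0)]  # (token ids, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[0, -1].item() == eos_id:     # finished captions pass through
                candidates.append((tokens, score))
                continue
            logits = decoder(tokens, memory)       # (1, T, vocab)
            log_probs = logits[0, -1].log_softmax(-1)
            top_lp, top_ids = log_probs.topk(beam_width)
            for lp, idx in zip(top_lp, top_ids):
                nxt = torch.cat([tokens, idx.view(1, 1)], dim=1)
                candidates.append((nxt, score + lp.item()))
        # Keep only the beam_width highest-scoring partial captions.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(t[0, -1].item() == eos_id for t, _ in beams):
            break
    return beams[0][0].squeeze(0)                  # best token sequence
```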

🧰 Technologies Used

  • Python 3.x
  • PyTorch / TensorFlow (depending on the implementation)
  • TorchVision or Keras
  • NumPy, Pillow, Matplotlib
  • NLTK for text processing
  • Flickr30k Dataset (with captions)

🏁 How to Run

1. Clone this Repository

git clone https://github.com/devanshu-777/Flickr30k-Captioner.git
cd Flickr30k-Captioner

2. Create a virtual environment (Optional)

python -m venv venv
source venv/bin/activate  # For Windows: venv\Scripts\activate

3. Install Dependencies

pip install -r requirements.txt

4. Download and Organize Data

Download the Flickr30k dataset used for this project. Unzip the archive and place the extracted folder in the Code directory of this project, and you should be ready to go.
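
For reference, one common Flickr30k release ships captions as a tab-separated token file with lines like `1000092795.jpg#0<TAB>Two young guys ...`; the exact filename and layout depend on which distribution you download, so the path below is an assumption. A small parsing sketch under that assumption:

```python
from collections import defaultdict

def load_captions(token_file):
    """Parse a Flickr30k-style token file.
    Each line: `<image_name>.jpg#<caption_index>\t<caption text>`.
    Returns a dict mapping image filename -> list of its captions."""
    captions = defaultdict(list)
    with open(token_file, encoding="utf-8") as f:
        for line in f:
            key, caption = line.rstrip("\n").split("\t", 1)
            image_name = key.split("#")[0]  # drop the caption index
            captions[image_name].append(caption)
    return captions

# Assumed location; adjust to wherever you unzipped the dataset.
captions = load_captions("Code/results_20130124.token")
```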

5. Perform Hyperparameter Tuning (Optional)

Run the notebook ResNet-Transformer-Tune.ipynb to evaluate model performance across different hyperparameter combinations on a smaller portion of the dataset. The code can be modified to change which parameters are searched or to increase the dataset size (a smaller subset was used due to limited GPU power). Note that this notebook saves a model for each hyperparameter combination, which can consume significant disk space; it also keeps track of the best hyperparameters found.
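
The search follows the usual grid-search pattern, roughly like the sketch below; the grid values and the `build_and_train`/`validate` helpers are placeholders, not the notebook's actual names:

```python
import itertools
import torch

# Hypothetical grid; the notebook's actual search space may differ.
grid = {
    "lr": [1e-4, 3e-4],
    "num_layers": [2, 3],
    "embed_dim": [256, 512],
}

best_score, best_params = float("-inf"), None
for lr, num_layers, embed_dim in itertools.product(*grid.values()):
    params = {"lr": lr, "num_layers": num_layers, "embed_dim": embed_dim}
    model = build_and_train(params, subset_fraction=0.1)  # assumed helper
    score = validate(model)                               # e.g. validation BLEU
    # One checkpoint per combination -- this is what eats disk space.
    torch.save(model.state_dict(), f"model_lr{lr}_L{num_layers}_d{embed_dim}.pt")
    if score > best_score:
        best_score, best_params = score, params

print("Best hyperparameters:", best_params, "score:", best_score)
```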

6. Training the Model

Run the notebook ResNet-Transformer.ipynb to train the model on the larger split (80% of the full dataset). This notebook saves the trained model under the name BestModel.
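
The checkpointing pattern typically looks like the following; the helper functions and the checkpoint layout here are assumptions, and only the `BestModel` filename comes from the notebook:

```python
import torch

num_epochs = 20  # illustrative
best_val_loss = float("inf")
for epoch in range(num_epochs):
    train_one_epoch(encoder, decoder, train_loader, optimizer)  # assumed helper
    val_loss = evaluate(encoder, decoder, val_loader)           # assumed helper
    if val_loss < best_val_loss:  # keep only the best epoch seen so far
        best_val_loss = val_loss
        torch.save({"encoder": encoder.state_dict(),
                    "decoder": decoder.state_dict()}, "BestModel")
```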

7. Generate the Captions

Use the model saved in the previous step to evaluate model performance or to start generating captions.
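
A sketch of what inference and BLEU evaluation can look like, reusing the encoder/decoder and `beam_search` from the earlier snippets; the checkpoint field names, the `stoi`/`itos` vocabulary maps, and the example image path are assumptions:

```python
import torch
import torchvision.transforms as T
from PIL import Image
from nltk.translate.bleu_score import corpus_bleu

# Load the checkpoint saved by the training notebook (field names assumed).
ckpt = torch.load("BestModel", map_location="cpu")
encoder.load_state_dict(ckpt["encoder"])
decoder.load_state_dict(ckpt["decoder"])
encoder.eval(); decoder.eval()

# Standard ImageNet preprocessing to match the ResNet-50 encoder.
preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def caption_image(path, stoi, itos):
    """`stoi`/`itos` are the hypothetical token<->id maps built at training time."""
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    memory = encoder(image)
    ids = beam_search(decoder, memory, stoi["<bos>"], stoi["<eos>"])
    return " ".join(itos[i] for i in ids.tolist()[1:-1])  # strip <bos>/<eos>

# BLEU-4 with NLTK: references are the tokenized ground-truth captions
# (from the dict built in step 4), the hypothesis is the generated caption.
hyp = caption_image("Code/flickr30k_images/example.jpg", stoi, itos).split()
refs = [c.split() for c in captions["example.jpg"]]
print("BLEU-4:", corpus_bleu([refs], [hyp]))
```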


🧪 Output

Here is a sample caption generated by the model for an example image; see the Caption Example figure in the repository.


📜 License

This project is open-source and available under the terms of the MIT License.


🌟 Authors

Shail Patel GitHub | LinkedIn

Vrushali Ranadive GitHub | LinkedIn

Yash Chaudhary GitHub | LinkedIn

Devanshu Shah GitHub | LinkedIn
