Image Caption Generation Model

Overview

This repository provides an Image Caption Generation Model that combines computer vision and natural language processing to generate captions for images. A pre-trained Inception-ResNet-V2 network extracts image features, and a GRU-based decoder with an Attention mechanism generates a caption from those features.

Key Features

  • Pre-trained Feature Extractor: Utilizes Inception-ResNet-V2 for extracting rich image features.
  • Attention Mechanism: A GRU-based decoder with an Attention mechanism to improve caption accuracy by focusing on relevant parts of the image.
  • Text Processing: Captions are preprocessed by tokenizing them and wrapping each one in special <start> and <end> tokens that mark where a sentence begins and ends (see the sketch after this list).
  • Trained on COCO: The model is trained on the COCO Captions dataset, one of the most widely used datasets for image captioning tasks.
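
The repository's exact preprocessing code is not reproduced here, but the tokenization step described above typically looks like the following sketch. The vocabulary limit and variable names are assumptions, not values taken from this codebase.

```python
import tensorflow as tf

# Wrap every caption with the special <start> and <end> tokens so the
# decoder knows where a sentence begins and ends.
captions = [
    "a dog running across a field",
    "two people riding bicycles down a street",
]
captions = ["<start> " + c + " <end>" for c in captions]

# Map each word to an integer id. The filter string deliberately keeps
# '<' and '>' so the special tokens survive; 5000 is an assumed limit.
tokenizer = tf.keras.preprocessing.text.Tokenizer(
    num_words=5000, oov_token="<unk>",
    filters='!"#$%&()*+.,-/:;=?@[\\]^_`{|}~')
tokenizer.fit_on_texts(captions)
sequences = tokenizer.texts_to_sequences(captions)

# Pad to a common length so captions can be batched during training.
padded = tf.keras.preprocessing.sequence.pad_sequences(sequences, padding="post")
```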

How It Works

  1. Image Feature Extraction: The model first extracts high-level features from images using Inception-ResNet-V2 (pre-trained on ImageNet). This step allows the model to understand the content of the image.
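
A minimal sketch of this step, assuming the common TensorFlow/Keras setup (the repository's exact code may differ):

```python
import tensorflow as tf

# Inception-ResNet-V2 without its classification head, pre-trained on
# ImageNet, acts as the feature extractor.
feature_extractor = tf.keras.applications.InceptionResNetV2(
    include_top=False, weights="imagenet")

def extract_features(image_path):
    # Load and resize the image to the 299x299 input the network expects.
    img = tf.io.read_file(image_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, (299, 299))
    img = tf.keras.applications.inception_resnet_v2.preprocess_input(img)
    # The last conv feature map is (1, 8, 8, 1536); flatten the spatial
    # grid into 64 locations the attention mechanism can attend over.
    features = feature_extractor(tf.expand_dims(img, 0))
    return tf.reshape(features, (1, -1, features.shape[3]))
```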

  2. Caption Generation: The model uses a GRU-based decoder, which generates a sequence of words based on the image features. The Attention mechanism helps the model focus on specific parts of the image to make the generated captions more relevant.
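
A decoder like the one described above is usually built along these lines. This sketch follows the standard TensorFlow captioning recipe (Bahdanau-style attention); the layer sizes are assumptions, not this repository's exact values.

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, features, hidden):
        # features: (batch, 64, feat_dim); hidden: (batch, units)
        hidden_with_time = tf.expand_dims(hidden, 1)
        scores = self.V(tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time)))
        weights = tf.nn.softmax(scores, axis=1)  # which image locations to focus on
        context = tf.reduce_sum(weights * features, axis=1)
        return context, weights

class Decoder(tf.keras.Model):
    def __init__(self, embedding_dim=256, units=512, vocab_size=5001):  # assumed sizes
        super().__init__()
        self.attention = BahdanauAttention(units)
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(units, return_sequences=True,
                                       return_state=True)
        self.fc1 = tf.keras.layers.Dense(units)
        self.fc2 = tf.keras.layers.Dense(vocab_size)

    def call(self, word_ids, features, hidden):
        # Attend over the 64 image locations given the current GRU state.
        context, weights = self.attention(features, hidden)
        x = self.embedding(word_ids)                     # (batch, 1, embedding_dim)
        x = tf.concat([tf.expand_dims(context, 1), x], axis=-1)
        output, state = self.gru(x)                      # (batch, 1, units)
        logits = self.fc2(self.fc1(tf.reshape(output, (-1, output.shape[2]))))
        return logits, state, weights
```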

  3. Training Process: The model is trained on the COCO Captions dataset, which pairs each image with several human-written captions. Training adjusts the model's weights so that, given an image's features, the decoder assigns high probability to the words of the reference captions.
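
One way the training step could look, assuming teacher forcing and a cross-entropy loss that ignores padding (the optimizer choice and helper names are assumptions):

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction="none")

def loss_function(real, pred):
    # Ignore padded positions (token id 0) when averaging the loss.
    mask = tf.cast(tf.math.not_equal(real, 0), tf.float32)
    return tf.reduce_mean(loss_object(real, pred) * mask)

def train_step(features, target, decoder, start_id):
    # target: (batch, max_caption_length) of token ids, starting with <start>.
    hidden = tf.zeros((target.shape[0], decoder.gru.units))
    dec_input = tf.fill((target.shape[0], 1), start_id)
    loss = 0.0
    with tf.GradientTape() as tape:
        for t in range(1, target.shape[1]):
            predictions, hidden, _ = decoder(dec_input, features, hidden)
            loss += loss_function(target[:, t], predictions)
            # Teacher forcing: feed the ground-truth word, not the prediction.
            dec_input = tf.expand_dims(target[:, t], 1)
    grads = tape.gradient(loss, decoder.trainable_variables)
    optimizer.apply_gradients(zip(grads, decoder.trainable_variables))
    return loss
```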

  4. Generating Captions: Once trained, the model can generate captions for any input image. The model uses the <start> token to begin caption generation and stops when it reaches the <end> token.
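
Greedy decoding makes this concrete; `tokenizer`, `decoder`, and `extract_features` below are the assumed objects from the earlier sketches:

```python
import tensorflow as tf

def generate_caption(image_path, max_length=40):
    features = extract_features(image_path)
    hidden = tf.zeros((1, decoder.gru.units))
    dec_input = tf.constant([[tokenizer.word_index["<start>"]]])
    words = []
    for _ in range(max_length):
        predictions, hidden, _ = decoder(dec_input, features, hidden)
        predicted_id = int(tf.argmax(predictions[0]))
        word = tokenizer.index_word.get(predicted_id, "<unk>")
        if word == "<end>":  # stop once the model emits the <end> token
            break
        words.append(word)
        dec_input = tf.constant([[predicted_id]])
    return " ".join(words)
```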

Usage

  1. Download the COCO Captions dataset for training (if not already downloaded).
  2. Run the model.py script to train the model. It will save the trained model to a file named image_caption_model.h5.
  3. Use the predict_caption() function to generate captions for new images. Simply provide the path to the image you'd like to caption.
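
Assuming predict_caption() takes an image path (check model.py for its actual signature), usage might look like:

```python
# Assumed usage; verify the signature of predict_caption() in model.py.
from model import predict_caption

caption = predict_caption("path/to/your_image.jpg")
print(caption)
```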

Future Improvements

  • Streamlit Integration: The model currently runs as a plain Python script. A Streamlit UI could let users upload images and generate captions through a web interface; a minimal sketch appears after this list.
  • Fine-tuning: The model can be fine-tuned on a different dataset to improve performance for specific image domains, such as medical images or product images.
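
A possible starting point for that UI, assuming predict_caption() accepts an image file path as in the Usage section:

```python
import streamlit as st
from model import predict_caption  # assumed import, as in the Usage section

st.title("Image Caption Generator")
uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    st.image(uploaded)
    # Write the upload to disk so predict_caption can read it from a path.
    with open("uploaded_image.jpg", "wb") as f:
        f.write(uploaded.getbuffer())
    st.write(predict_caption("uploaded_image.jpg"))
```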

Contributing

Feel free to fork this repository, make changes, and open pull requests! Contributions to improve the model or the documentation are always welcome.
