Flickr8k Image Captioning Project

This project generates descriptive captions for images from the Flickr8k dataset. It pairs a convolutional image encoder with a recurrent language model that produces the caption text.

Project Structure

  • data/: Contains the Flickr8k dataset images and captions.
  • notebooks/: Jupyter notebooks used for data preprocessing, model training, and evaluation.
  • models/: Trained models and checkpoints.
  • src/: Source code for data loading, preprocessing, and model definition.

Requirements

To run this project, you will need the following packages installed:

  • Python 3.7+
  • TensorFlow
  • Keras
  • NumPy
  • Pandas
  • Matplotlib
  • scikit-learn
  • NLTK
  • tqdm
  • OpenCV

You can install the required packages using the following command:

pip install -r requirements.txt

Dataset

The project uses the Flickr8k dataset, which contains roughly 8,000 images. Each image is paired with five captions written by different annotators, for about 40,000 captions in total.
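
As an illustration of the five-captions-per-image structure, the sketch below groups captions by image. It assumes the tab-separated layout of the commonly distributed Flickr8k token file (`image.jpg#index<TAB>caption`); the copy in data/ may use a different layout (some distributions ship a CSV instead), and `group_captions` and the sample lines are hypothetical, not taken from this repository.

```python
from collections import defaultdict


def group_captions(lines):
    """Group captions by image name from 'image.jpg#idx<TAB>caption' lines."""
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        image_id, _, caption = line.partition("\t")
        if not caption:
            continue  # skip malformed lines that have no tab separator
        image_name = image_id.split("#")[0]  # drop the '#0'..'#4' suffix
        captions[image_name].append(caption.strip())
    return dict(captions)


# Made-up sample lines in the assumed format.
sample = [
    "img_001.jpg#0\tA dog runs across the grass .",
    "img_001.jpg#1\tA brown dog is running outside .",
    "img_002.jpg#0\tTwo children play on a beach .",
]
captions_by_image = group_captions(sample)
```

With the real file, `group_captions(open(path, encoding="utf-8"))` would return one list of captions per image name.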

Data Preprocessing

The data preprocessing involves:

  1. Loading Images: Images are loaded and resized to a fixed size.
  2. Loading Captions: Captions are loaded and tokenized.
  3. Data Augmentation: Images are augmented using random flips, rotations, and contrast adjustments.
  4. Text Vectorization: Captions are vectorized using a custom standardization function.
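
The text side of these steps can be sketched in plain Python. This is a minimal illustration, not the notebook's code: the standardization rules (lowercasing, stripping punctuation), the special tokens `<start>`, `<end>`, `<pad>`, and `<unk>`, and the helper names are all assumptions. The notebook may instead use a Keras text-vectorization layer with its own standardization callback.

```python
import re
import string
from collections import Counter


def standardize(caption):
    """Lowercase, strip punctuation, and wrap the words in start/end tokens."""
    caption = caption.lower()
    caption = re.sub(f"[{re.escape(string.punctuation)}]", "", caption)
    # Tokens are added after punctuation removal so '<' and '>' survive.
    return ["<start>"] + caption.split() + ["<end>"]


def build_vocab(token_lists, min_count=1):
    """Map each sufficiently frequent token to an integer id."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": 0, "<unk>": 1}  # reserved ids
    for tok, n in sorted(counts.items()):
        if n >= min_count and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def vectorize(tokens, vocab, max_len):
    """Convert tokens to ids, truncating or right-padding to max_len."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


tokens = standardize("A dog runs!")
vocab = build_vocab([tokens])
vector = vectorize(tokens, vocab, max_len=7)
```

Reserving id 0 for padding lets downstream layers mask padded positions, which is why it is fixed before any real token is added.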

Model Training

The model is trained using a combination of Convolutional Neural Networks (CNN) for image features and Recurrent Neural Networks (RNN) for text generation. Key steps include:

  1. Image Feature Extraction: Using a pre-trained CNN (e.g., InceptionV3) to extract features from images.
  2. Sequence Modeling: Using an RNN (e.g., LSTM) to generate captions based on the image features.
  3. Training: The model is trained with a custom loss function that combines categorical cross-entropy and BLEU scores.
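
A common way to train a CNN plus LSTM captioner is next-word prediction: each caption is expanded into (prefix, next token) pairs, and the model learns to predict the next token from the image features and the prefix. The README does not say whether the notebook builds its batches this way, so treat the sketch below as an assumption. `make_training_pairs` is a hypothetical helper, and the token ids are made up.

```python
def make_training_pairs(token_ids, max_len, pad_id=0):
    """Expand one caption into (left-padded prefix, next-token) pairs."""
    pairs = []
    for i in range(1, len(token_ids)):
        prefix = token_ids[:i][-max_len:]  # keep at most max_len tokens
        padded = [pad_id] * (max_len - len(prefix)) + prefix
        pairs.append((padded, token_ids[i]))
    return pairs


# Made-up ids for: <start> a dog <end>
caption_ids = [3, 4, 5, 2]
pairs = make_training_pairs(caption_ids, max_len=4)
for prefix, target in pairs:
    print(prefix, "->", target)
```

Each pair would be combined with the image's feature vector to form one training example, with categorical cross-entropy applied to the predicted next token.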

Evaluation

The model's performance is evaluated using BLEU scores. BLEU (Bilingual Evaluation Understudy) measures how closely a generated caption matches the reference captions by comparing their overlapping n-grams, with a penalty for captions that are too short.
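
To make the metric concrete, here is a minimal sentence-level BLEU in plain Python: clipped n-gram precisions, a uniform geometric mean, and a brevity penalty, with no smoothing. It is a sketch for illustration only. The notebook presumably relies on a library implementation such as the one in NLTK, whose smoothing and corpus-level options can give different numbers. The function names are hypothetical, and at least one reference is assumed.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate's.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda n: (abs(n - c), n))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


refs = [
    "a dog runs across the grass".split(),
    "a brown dog is running outside".split(),
]
score = bleu("a dog runs across the grass".split(), refs)
```

Because each image has five reference captions, a generated caption is rewarded for matching any of them, which is why clipping takes the maximum count over all references.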

Usage

To run the notebook and train the model, execute the following command in the notebooks directory:

jupyter notebook flickr.ipynb

Ensure that the dataset is placed in the data/ directory, and the notebook has access to the required resources.

Results

The model achieves competitive BLEU scores on the validation and test sets, demonstrating its ability to generate coherent and relevant captions for images.

Acknowledgements

This project is based on various research papers and open-source projects in the field of image captioning and deep learning. Special thanks to the authors of these works for their valuable contributions.

License

This project is licensed under the MIT License. See the LICENSE file for more details.