Flickr8k Image Captioning Project

This project generates descriptive captions for images from the Flickr8k dataset. It pairs a convolutional neural network that encodes each image with a recurrent language model that produces the caption word by word.

Project Structure

data/: Contains the Flickr8k dataset images and captions.
notebooks/: Jupyter notebooks used for data preprocessing, model training, and evaluation.
models/: Trained models and checkpoints.
src/: Source code for data loading, preprocessing, and model definition.

Requirements

To run this project, you will need the following packages installed:

Python 3.7+
TensorFlow
Keras
NumPy
Pandas
Matplotlib
scikit-learn
NLTK
tqdm
OpenCV

You can install the required packages using the following command:

pip install -r requirements.txt

Dataset

The project uses the Flickr8k dataset. It contains roughly 8,000 images (8,091 in the standard release). Each image has five human-written captions, for about 40,000 captions in total, so every scene comes with several different descriptions.
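The sketch below loads the captions into a per-image dictionary. It assumes the common `Flickr8k.token.txt` layout, in which each line reads `<image>.jpg#<n>`, then a tab, then the caption. Some redistributions ship a CSV (`image,caption`) instead, and that format would need a different parser. The function name `load_captions` and the sample file names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def load_captions(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group captions by image file name (hypothetical helper).

    Assumes the Flickr8k.token.txt layout: "<image>.jpg#<n>\t<caption>".
    """
    captions: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        image_id, caption = line.split("\t", 1)
        image_name = image_id.split("#", 1)[0]  # drop the "#n" caption index
        captions[image_name].append(caption.strip())
    return dict(captions)


if __name__ == "__main__":
    # Made-up sample lines in the assumed format.
    sample = [
        "example_0001.jpg#0\tA dog runs across the grass .",
        "example_0001.jpg#1\tA brown dog is running outside .",
    ]
    print(load_captions(sample))
```

For a real file, pass an open file handle. For example: `with open(path, encoding="utf-8") as f: captions = load_captions(f)`.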

Data Preprocessing

The data preprocessing involves:

Loading Images: Images are loaded and resized to a fixed size.
Loading Captions: Captions are loaded and tokenized.
Data Augmentation: Images are augmented using random flips, rotations, and contrast adjustments.
Text Vectorization: Captions are vectorized using a custom standardization function.
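The image side of these steps can be sketched with TensorFlow 2.x and its Keras preprocessing layers. The following choices are illustrative, not values taken from the notebook:

- the 299x299 target size (InceptionV3's default input);
- the rotation and contrast factors;
- the names `load_image`, `augmentation` and `IMAGE_SIZE`.

```python
import tensorflow as tf

IMAGE_SIZE = (299, 299)  # assumed size; matches InceptionV3's default input


def load_image(path: tf.Tensor) -> tf.Tensor:
    """Read a JPEG from disk and resize it to IMAGE_SIZE (float32, values 0-255)."""
    raw = tf.io.read_file(path)
    image = tf.image.decode_jpeg(raw, channels=3)
    return tf.image.resize(image, IMAGE_SIZE)


# Random flips, rotations and contrast changes. These layers only transform
# images when called with training=True; otherwise they pass them through.
augmentation = tf.keras.Sequential(
    [
        tf.keras.layers.RandomFlip("horizontal"),
        tf.keras.layers.RandomRotation(0.1),  # up to +/-10% of a full turn (36 degrees)
        tf.keras.layers.RandomContrast(0.3),
    ],
    name="image_augmentation",
)
```

In a `tf.data` pipeline, `load_image` can be mapped over file paths. `augmentation(images, training=True)` is then applied to batches during training only. InceptionV3 expects inputs scaled to [-1, 1], so `tf.keras.applications.inception_v3.preprocess_input` would be applied after augmentation. Horizontal flips can contradict captions that mention "left" or "right", which is one reason to keep augmentation mild.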
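The text side can be sketched without a framework to show what a custom standardization step typically does. It lowercases the caption, strips punctuation and wraps the result in start/end markers, and the words are then mapped to integer ids. In Keras this role is usually filled by `tf.keras.layers.TextVectorization` with its `standardize` argument set to a custom function.

The following are assumptions for illustration:

- the `<start>`/`<end>` markers;
- padding id 0 and unknown-word id 1;
- the default sizes;
- the function names.

```python
import re
import string
from collections import Counter
from typing import Dict, List

# Matches any ASCII punctuation character (string.punctuation).
PUNCT_RE = re.compile(f"[{re.escape(string.punctuation)}]")


def standardize(caption: str) -> str:
    """Hypothetical custom standardization: lowercase, drop punctuation,
    collapse whitespace, then wrap in <start>/<end> markers."""
    text = PUNCT_RE.sub("", caption.lower())
    text = " ".join(text.split())
    return f"<start> {text} <end>"


def build_vocab(captions: List[str], max_tokens: int = 5000) -> Dict[str, int]:
    """Assign ids to the most frequent words; id 0 is padding, id 1 is unknown."""
    counts = Counter(w for c in captions for w in standardize(c).split())
    vocab = {"<pad>": 0, "<unk>": 1}
    for word, _ in counts.most_common(max_tokens - len(vocab)):
        vocab[word] = len(vocab)
    return vocab


def vectorize(caption: str, vocab: Dict[str, int], seq_len: int = 20) -> List[int]:
    """Map a caption to exactly seq_len ids (truncated or right-padded with 0)."""
    ids = [vocab.get(w, vocab["<unk>"]) for w in standardize(caption).split()]
    ids = ids[:seq_len]  # note: truncation can cut off the <end> marker
    return ids + [vocab["<pad>"]] * (seq_len - len(ids))
```

Reserving id 0 for padding lets a downstream embedding layer mask padded positions (for example with `mask_zero=True` in Keras).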

Model Training

The model is trained using a combination of Convolutional Neural Networks (CNN) for image features and Recurrent Neural Networks (RNN) for text generation. Key steps include:

Image Feature Extraction: Using a pre-trained CNN (e.g., InceptionV3) to extract features from images.
Sequence Modeling: Using an RNN (e.g., LSTM) to generate captions based on the image features.
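Training: The model is trained with a custom loss function that combines categorical cross-entropy with a BLEU-based term. BLEU is not differentiable, so that term cannot be backpropagated directly. It can only shape training indirectly, for example as a reward signal or a checkpoint-selection criterion.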
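The Keras sketch below shows one common way to wire these steps together, the so-called merge architecture. Pooled InceptionV3 features and an LSTM encoding of the partial caption are added together, then fed to a softmax over the vocabulary that predicts the next word. This is an assumption about the design, not the notebook's actual model.

Other assumptions in the sketch:

- the constants `VOCAB_SIZE`, `MAX_LEN` and `UNITS`, and the function names, are hypothetical;
- plain sparse categorical cross-entropy, with integer word ids as targets, stands in for the project's custom loss.

```python
import tensorflow as tf

VOCAB_SIZE = 5000   # assumed vocabulary size
MAX_LEN = 20        # assumed caption length in tokens
FEATURE_DIM = 2048  # size of InceptionV3's average-pooled feature vector
UNITS = 256         # assumed hidden size


def build_feature_extractor() -> tf.keras.Model:
    """InceptionV3 without its classifier head; one 2048-d vector per image.

    Downloads ImageNet weights on first use. Inputs should first go through
    tf.keras.applications.inception_v3.preprocess_input.
    """
    return tf.keras.applications.InceptionV3(
        include_top=False, weights="imagenet", pooling="avg"
    )


def build_caption_model() -> tf.keras.Model:
    """Merge-style decoder: predicts the next word from image features + partial caption."""
    image_features = tf.keras.Input(shape=(FEATURE_DIM,), name="image_features")
    x_img = tf.keras.layers.Dropout(0.5)(image_features)
    x_img = tf.keras.layers.Dense(UNITS, activation="relu")(x_img)

    partial_caption = tf.keras.Input(shape=(MAX_LEN,), dtype="int32", name="partial_caption")
    # mask_zero=True makes the LSTM ignore padding id 0.
    x_txt = tf.keras.layers.Embedding(VOCAB_SIZE, UNITS, mask_zero=True)(partial_caption)
    x_txt = tf.keras.layers.Dropout(0.5)(x_txt)
    x_txt = tf.keras.layers.LSTM(UNITS)(x_txt)

    merged = tf.keras.layers.add([x_img, x_txt])
    merged = tf.keras.layers.Dense(UNITS, activation="relu")(merged)
    next_word = tf.keras.layers.Dense(VOCAB_SIZE, activation="softmax")(merged)

    model = tf.keras.Model([image_features, partial_caption], next_word)
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    return model
```

Training examples are formed by splitting each caption into (prefix, next word) pairs. At inference time, the caption is generated one word at a time from the start marker until the end marker or `MAX_LEN` is reached.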

Evaluation

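The model's performance is evaluated with BLEU (Bilingual Evaluation Understudy). BLEU scores a generated caption by its n-gram overlap with the reference captions (five per image in Flickr8k). It also applies a brevity penalty to captions that are shorter than the references.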
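The from-scratch sentence-level BLEU below makes the metric concrete. It computes clipped n-gram precisions for n = 1 to 4 and takes their geometric mean with uniform weights. It then multiplies by the brevity penalty exp(1 - r/c), where c is the candidate length and r is the closest reference length; the penalty applies only when the candidate is not longer than that reference.

It is an illustrative sketch only: it uses no smoothing, and `bleu_score` is a hypothetical name. Because the project lists NLTK, the `sentence_bleu` and `corpus_bleu` functions in `nltk.translate.bleu_score` are the practical choice. Their smoothed variants may give different numbers.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_score(references: List[List[str]], hypothesis: List[str], max_n: int = 4) -> float:
    """Sentence BLEU against several references (uniform weights, no smoothing)."""
    if not hypothesis:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref: Counter = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty against the closest reference length (ties -> shorter).
    hyp_len = len(hypothesis)
    ref_len = min((abs(len(r) - hyp_len), len(r)) for r in references)[1]
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(sum(log_precisions) / max_n)
```

Reported BLEU scores are usually corpus-level. Corpus BLEU pools the n-gram counts and lengths over all captions before combining them, rather than averaging per-sentence scores.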

Usage

To run the notebook and train the model, execute the following command in the notebooks directory:

jupyter notebook flickr.ipynb

Ensure that the dataset is placed in the data/ directory, and the notebook has access to the required resources.

Results

The model achieves competitive BLEU scores on the validation and test sets, demonstrating its ability to generate coherent and relevant captions for images.

Acknowledgements

This project is based on various research papers and open-source projects in the field of image captioning and deep learning. Special thanks to the authors of these works for their valuable contributions.

License

This project is licensed under the MIT License. See the LICENSE file for more details.
