This is a project for the Neural Networks course at the University of Siena.
The project is about image captioning, the task of generating a textual description of an image.
The project is structured as follows:
data/
: Dataset, preprocessing scripts, processed data
raw_data/
: contains the raw data
processed/
: contains the processed data
data_set/
: contains the dataset class for loading the data
pre_processing.py
: contains the code for preprocessing the data
inference/
: contains the code for prediction and evaluation
caption_predictor.py
: contains the code for predicting captions; uses the model checkpoints to predict captions on the test set
evaluation.py
: contains the code for evaluating the model; uses the prediction results to evaluate the model based on the BLEU score
models/
: contains the code for the models
base_model.py
: contains the code for the base model: just an LSTM model and ResNet-50
attention_model.py
: contains the code for the attention model: an LSTM model with an attention mechanism
text/
: contains the code for text processing and the tokenizer
tokenizer.py
: contains the code for the tokenizer, which builds the vocabulary and the word-to-index mapping
utils/
: contains just the config file
scripts/
: contains the setup and data preparation scripts
start.sh
: contains the code for creating the environment and installing the dependencies
prepare_data.sh
: contains the code for downloading and preprocessing the dataset
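
As a rough illustration of what the tokenizer step involves (building a vocabulary and a word-to-index mapping), here is a minimal sketch. It is not the code in tokenizer.py: the function names `build_vocab` and `encode`, the special tokens, and the whitespace splitting are all assumptions made for this example.

```python
from collections import Counter

# Special tokens are assumptions for this sketch, not taken from tokenizer.py.
PAD, START, END, UNK = "<pad>", "<start>", "<end>", "<unk>"


def build_vocab(captions, min_freq=1):
    """Map each word seen at least min_freq times to an integer index."""
    counts = Counter(
        word for caption in captions for word in caption.lower().split()
    )
    word_to_index = {token: i for i, token in enumerate([PAD, START, END, UNK])}
    # Sort for a deterministic mapping: most frequent first, ties alphabetical.
    for word, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq:
            word_to_index[word] = len(word_to_index)
    return word_to_index


def encode(caption, word_to_index):
    """Turn a caption into indices, wrapped in start/end markers."""
    unk = word_to_index[UNK]
    body = [word_to_index.get(word, unk) for word in caption.lower().split()]
    return [word_to_index[START], *body, word_to_index[END]]


captions = ["a dog runs", "a cat sleeps"]
vocab = build_vocab(captions)
# "quietly" is not in the vocabulary, so it maps to the <unk> index.
encoded = encode("a dog sleeps quietly", vocab)
print(encoded)  # [1, 4, 6, 8, 3, 2]
```

Raising `min_freq` drops rare words from the vocabulary, so they are encoded as `<unk>`; this keeps the output layer of the captioning model smaller.
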
- From the project root, run
./scripts/start.sh
to create the environment and install the dependencies.
- Python 3.11 is required (pytorch==2.5.1 is not supported on Python 3.12 in this setup).
- From the project root, run
run.py
to run the project.
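
For readers unfamiliar with the BLEU score used in the evaluation step, the following is a minimal, self-contained sketch of sentence-level BLEU (clipped n-gram precision with a brevity penalty, uniform weights, no smoothing). It is only an illustration under those assumptions; evaluation.py may use a library implementation, corpus-level scoring, or smoothing, and the function names here are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(
            min(count, max_ref[gram]) for gram, count in cand_counts.items()
        )
        # Without smoothing, a missing n-gram order gives a score of zero.
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "a dog runs on the grass".split()
perfect = bleu("a dog runs on the grass".split(), [reference])
partial = bleu("a dog runs on grass".split(), [reference])
unrelated = bleu("two cats sleep indoors".split(), [reference])
print(perfect, round(partial, 4), unrelated)
```

Because no smoothing is applied, short captions that miss every 4-gram score exactly zero; library implementations usually offer smoothing options for that case.
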