Skip to content

DragosUnguru/VisualStorytelling

Repository files navigation

VisualStorytelling

My bachelor's thesis -- Visual Storytelling -- an encoder-decoder model consisting of both convolutional and attentional recurrent networks that transform a sequence of 5 images into a context-generative story in natural language (english).

Base model architecture

image

Results

From the above-illustrated architecture, two models were trained and compared:

  • Encoding the whole album's into one embedding vector (CS)
  • Encoding each photo separately to its own embedding vector and transmitting them to the decoder sequentially (CC)

image

image

image

image

How overfitting, even though it increases METEOR and ROGUE score, harm the results

image

About

My bachelor's thesis -- Visual Storytelling -- an encoder-decoder model consisting of both convolutional and attentional recurrent networks that transform a sequence of 5 images into a context-generative story in natural language (english).

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors