Recurrent Neural Networks

Recurrent Neural Network Basics

  1. Why we need RNNs:
    • Variable-length sequences
    • Long-term dependencies
    • Stateful representation
    • Memory

Types of RNNs:

Single Output at Single Input Step:

Single fixed-length vector to a series of outputs:

Bi-Directional RNNs:

A bidirectional RNN captures information both from left to right and from right to left, using two RNNs, one for each direction. For example, in speech recognition, to recognize a phoneme (a distinct unit of sound) at input step "i" we need information from steps "i-1" and "i+1", i.e. past as well as future context.
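A minimal numpy sketch of this idea, assuming a vanilla tanh RNN cell and parameters passed as (W, U, b) tuples; the function names and shapes here are illustrative, not a particular library's API:

```python
import numpy as np

def rnn_step(x, h, W, U, b):
    """One vanilla RNN step: new hidden state from input x and previous state h."""
    return np.tanh(W @ x + U @ h + b)

def bidirectional_rnn(xs, params_fwd, params_bwd, hidden_size):
    """Run two independent RNNs over the sequence, one per direction,
    and concatenate their hidden states at each time step."""
    T = len(xs)
    h_fwd = np.zeros(hidden_size)
    h_bwd = np.zeros(hidden_size)
    fwd_states, bwd_states = [], [None] * T

    for t in range(T):                      # left-to-right pass (past context)
        h_fwd = rnn_step(xs[t], h_fwd, *params_fwd)
        fwd_states.append(h_fwd)

    for t in reversed(range(T)):            # right-to-left pass (future context)
        h_bwd = rnn_step(xs[t], h_bwd, *params_bwd)
        bwd_states[t] = h_bwd

    # the state at step t now sees both past (forward) and future (backward) context
    return [np.concatenate([f, b]) for f, b in zip(fwd_states, bwd_states)]
```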

Vector2Sequence Architecture (for Image Captioning):

A single fixed-length vector emitting a series of outputs can be used for image captioning. From an image --> use a CNN or MLP to extract an image feature vector --> use an RNN to generate the caption word by word, i.e. next_word = f(current_word, image feature vector).
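A greedy-decoding sketch of this vector-to-sequence idea, assuming the image feature vector seeds the RNN's initial state; `embed` (a word-to-vector lookup), `vocab`, the weight names, and the special `<start>`/`<end>` tokens are all hypothetical choices for illustration:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def generate_caption(image_features, params, embed, vocab, max_len=20):
    """Vector-to-sequence decoding: next_word = f(current_word, hidden state),
    with the hidden state initialized from the image feature vector."""
    W_xh, W_hh, W_init, W_hy, b_h, b_y = params
    h = np.tanh(W_init @ image_features)        # image features seed the state
    word = "<start>"
    caption = []
    for _ in range(max_len):
        x = embed[word]                          # embedding of the current word
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)   # recurrent update
        probs = softmax(W_hy @ h + b_y)          # distribution over the vocabulary
        word = vocab[int(np.argmax(probs))]      # greedy choice of the next word
        if word == "<end>":
            break
        caption.append(word)
    return caption
```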

Seq2Seq Architecture (for Neural Machine Translation):

We need to map a sequence to another sequence of possibly different length. The encoder-decoder (Seq2Seq) model uses two RNNs. The encoder processes the input sequence one word at a time and does not emit an output at each step; it tries to capture task-relevant information from the sequence in its internal state. The final hidden state of the encoder is a task-relevant summary of the input sequence, called the context or thought vector. The context acts as the only input to the decoder: the decoder's initial state can be a function of the context, or the context can be connected to all the hidden states of the decoder. The hyperparameters of the encoder and the decoder can differ.
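A bare-bones encoder-decoder sketch, assuming the context initializes the decoder state (the alternative of feeding it into every decoder step is noted in a comment); `step_output` stands in for whatever output layer maps a hidden state to the next target token:

```python
import numpy as np

def rnn_step(x, h, W, U, b):
    return np.tanh(W @ x + U @ h + b)

def encode(source, enc_params):
    """Encoder: consume the input sequence without emitting outputs;
    the final hidden state is the context ("thought") vector."""
    W, U, b, hidden_size = enc_params
    h = np.zeros(hidden_size)
    for x in source:
        h = rnn_step(x, h, W, U, b)
    return h                                    # context vector

def decode(context, dec_params, start_token, step_output, max_len=50):
    """Decoder: here the context initializes the decoder state; it could
    instead be concatenated to the input at every decoder step."""
    W, U, b = dec_params
    h = context
    y = start_token
    outputs = []
    for _ in range(max_len):
        h = rnn_step(y, h, W, U, b)
        y = step_output(h)                      # e.g. softmax over target vocabulary
        outputs.append(y)
    return outputs
```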

RNN Unrolled Version:

When unrolled, the depth of an RNN in time equals the number of time steps. With more hidden layers we can stack RNNs to get deep RNNs for an input sequence. On the hidden-to-hidden connections (with their weight matrix) we can additionally apply non-linear transformations, e.g. an MLP, to learn higher-level representations. Deeper RNNs take longer to train.
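A sketch of the unrolled computation, showing that the same weights W, U, V are reused at every time step, so depth in time is just the sequence length:

```python
import numpy as np

def unrolled_rnn(xs, W, U, V, b_h, b_y, h0):
    """Unrolling: one copy of the same cell per time step, shared weights."""
    h = h0
    hs, ys = [], []
    for x in xs:
        h = np.tanh(W @ x + U @ h + b_h)   # hidden-to-hidden recurrence
        y = V @ h + b_y                    # per-step output
        hs.append(h)
        ys.append(y)
    return hs, ys
```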

Stacked RNNs:

Increasing the number of layers at each time step (not the number of time steps) gives a stacked RNN. This allows the network to capture higher-level information in the sequence and maintain it in its state. At each time step we now have one hidden state per layer.
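A single-time-step sketch of stacking, assuming each layer is a vanilla RNN cell and layer l takes layer l-1's new hidden state as its input:

```python
import numpy as np

def stacked_rnn_step(x, prev_states, layer_params):
    """One time step of a stacked RNN: produces one hidden state per layer."""
    new_states = []
    inp = x
    for (W, U, b), h_prev in zip(layer_params, prev_states):
        h = np.tanh(W @ inp + U @ h_prev + b)
        new_states.append(h)
        inp = h                      # feed this layer's state to the next layer up
    return new_states
```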

Variants of RNNs:

Variants differ in the type of "CELL". Each cell has its own gating mechanism, i.e. a way of controlling the flow of information from the input to the current state, from the previous state to the current state, and from the current state to the output. The weight matrices W, U and V are correspondingly replicated, roughly one set per gate, with shapes chosen so the matrix multiplications remain consistent.
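As an example of such a gated cell, here is a minimal GRU step in numpy; the grouping of parameters into per-gate (W, U, b) triples is an illustrative layout, not a specific framework's convention:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell(x, h, params):
    """One GRU step: each gate has its own (W, U, b), which is why the cell
    stacks several copies of the basic weight matrices."""
    (W_z, U_z, b_z), (W_r, U_r, b_r), (W_h, U_h, b_h) = params
    z = sigmoid(W_z @ x + U_z @ h + b_z)              # update gate: how much to overwrite
    r = sigmoid(W_r @ x + U_r @ h + b_r)              # reset gate: how much past to use
    h_cand = np.tanh(W_h @ x + U_h @ (r * h) + b_h)   # candidate state
    return (1.0 - z) * h + z * h_cand                 # blend old state and candidate
```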

Attention Mechanism in RNNs:

In the Seq2Seq framework, all information is encoded into a fixed-length vector, the context. As the input sequence gets longer, we lose information. The attention mechanism allows the decoder of the seq2seq architecture to look back at the input sequence while decoding, so the encoder does not have to squeeze every useful piece of information into a single vector. At each decoder time step, a distinct context vector "Ci" is generated for output word "Yi". "Ci" is a weighted sum of the encoder hidden states, where the contribution of each encoder hidden state is determined by an alignment model (trained jointly with the rest of the model). Since each output word is aligned to different parts of the input sequence, the alignment model measures how well the output at position "i" matches the inputs around position "j". Based on these alignment scores we take a weighted sum of the encoder hidden states to generate each word of the output sequence.
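A sketch of additive (Bahdanau-style) attention for one decoder step, assuming W_a, U_a, v_a are the alignment model's learned parameters; names and shapes are illustrative:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_context(decoder_state, encoder_states, W_a, U_a, v_a):
    """Score every encoder hidden state against the current decoder state,
    normalize with softmax, and return the weighted sum as the context C_i."""
    scores = np.array([
        v_a @ np.tanh(W_a @ decoder_state + U_a @ h_j)   # alignment model a(i, j)
        for h_j in encoder_states
    ])
    alphas = softmax(scores)                # attention weights over the input positions
    context = sum(a * h_j for a, h_j in zip(alphas, encoder_states))
    return context, alphas
```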

RNN Rolled Version

LSTM and GRU Cell

RNN Blogs

  1. Unreasonable Effectiveness of RNN - Karpathy

  2. Getting Started with RNN: Great Tutorial from WildML

  3. Machine Translation

  4. Unfolding RNNs : 1

  5. Unfolding RNNs : 2

  6. LSTM Implementation