Basic knowledge of a natural language processing model: the Transformer.
- Attention mechanism: why attention is needed.
- Kinds of attention: soft & hard attention.
- Difference between attention and self-attention.
- How self-attention works (see the sketch after this list).
- About multi-head self-attention.
- Some activation functions: Sigmoid, Softmax, Tanh, and ReLU.
- A little about CNNs and RNNs.
- Vanishing gradient & exploding gradient problems.
- How the Transformer solves the long-distance dependency problem.
- Description of the Transformer architecture: Transformer Encoder & Transformer Decoder.
- Difference between Transformer Encoder & Transformer Decoder.
👉 Note
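A minimal NumPy sketch of single-head scaled dot-product self-attention (the sizes, random inputs, and projection matrices here are illustrative assumptions, not taken from the note):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project the same sequence X (n, d_model) into queries, keys, and values.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (n, n): every token scores every token
    weights = softmax(scores, axis=-1)       # each row is a distribution over positions
    return weights @ V                       # weighted sum of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                              # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)               # (4, 8)
```

Multi-head self-attention runs several such heads with independent Wq/Wk/Wv triples in parallel and concatenates their outputs before a final linear projection.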
- About logistic regression & linear regression.
- Some supervised learning methods: KNN, SVM, Kernel-SVM, Decision Tree and Naive Bayes.
- Ensemble Learning: Bagging & Boosting.
- Comparison of Bagging & Boosting.
- Difference between logistic regression & linear regression (see the sketch after this list).
👉 Note
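To make the last contrast concrete, a gradient-descent sketch of both models (the learning rate, step count, and toy data are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_linear(X, y, lr=0.1, steps=1000):
    # Linear regression: unbounded output X @ w, trained on mean squared error.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

def fit_logistic(X, y, lr=0.1, steps=1000):
    # Logistic regression: sigmoid squashes X @ w into a probability in (0, 1),
    # trained on cross-entropy.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # bias column + one feature
print(fit_linear(X, np.array([0.0, 1.0, 2.0, 3.0])))    # regression targets
print(fit_logistic(X, np.array([0.0, 0.0, 1.0, 1.0])))  # binary labels
```

Both gradients take the form X.T @ (prediction - y) / n; what differs is the prediction function (identity vs. sigmoid) and the loss behind it (squared error vs. cross-entropy).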
- General framework of optimization algorithms (three of the update rules below are sketched in code after this list).
- SGD.
- SGD-Momentum.
- SGD with Nesterov Acceleration (NAG).
- AdaGrad.
- AdaDelta.
- Adam.
- Nadam.
- Two shortcomings of Adam.
- Adam + SGD.
👉 Note
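Three of the update rules above, written as pure functions (the hyperparameter defaults follow common conventions; the function names are mine, not from the note):

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    # Vanilla SGD: step straight against the gradient.
    return w - lr * grad

def momentum_step(w, grad, v, lr=0.01, beta=0.9):
    # Momentum: keep a decaying velocity so consistent directions accelerate.
    v = beta * v + grad
    return w - lr * v, v

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: first moment m (direction) and second moment v (per-parameter scale),
    # both bias-corrected because they start at zero; t counts steps from 1.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

AdaGrad instead accumulates squared gradients without decay (so its effective step size only shrinks over training), and AdaDelta replaces the fixed learning rate with a decaying statistic of past updates.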
- Why RNNs.
- How an RNN works.
- About LSTM.
- How LSTM addresses the long-distance dependency problem.
- Gate control of LSTM (see the cell-step sketch after this list):
- forget gate
- input gate
- output gate
- sigmoid & tanh
- A simpler alternative to LSTM: GRU
- reset gate
- update gate
👉 Note
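A single LSTM time step in NumPy, showing the three gates and where sigmoid vs. tanh is used; the weight layout (four gate blocks stacked in one matrix) and the sizes are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    # One time step; W stacks the four gate weight matrices, shape (4n, d + n).
    n = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0 * n:1 * n])  # forget gate: how much old cell state to keep
    i = sigmoid(z[1 * n:2 * n])  # input gate: how much new content to write
    g = np.tanh(z[2 * n:3 * n])  # candidate cell content, squashed to (-1, 1)
    o = sigmoid(z[3 * n:4 * n])  # output gate: how much cell state to expose
    c = f * c_prev + i * g       # additive cell update is what lets gradients flow far
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
d, n = 3, 5
W, b = 0.1 * rng.normal(size=(4 * n, d + n)), np.zeros(4 * n)
h, c = lstm_step(rng.normal(size=d), np.zeros(n), np.zeros(n), W, b)
```

A GRU simplifies this: it merges the forget and input gates into a single update gate, adds a reset gate, and drops the separate cell state c.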
- Aim of backpropagation.
- How backpropagation works (see the sketch after this list).
- Difference between backward and forward propagation.
- Advantages of BP.
👉 Note
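A tiny two-layer network trained with hand-written backpropagation (random toy data and an illustrative learning rate): the forward pass caches each activation, and the backward pass applies the chain rule to them in reverse order.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                        # 8 samples, 3 features
y = rng.integers(0, 2, size=(8, 1)).astype(float)  # binary labels
W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(4, 1))

for step in range(200):
    # Forward pass: compute and cache intermediate activations.
    h = sigmoid(X @ W1)
    p = sigmoid(h @ W2)
    # Backward pass: chain rule from the loss toward the inputs.
    # With sigmoid + cross-entropy, the output-layer error is simply p - y.
    dp = (p - y) / len(y)
    dW2 = h.T @ dp
    dh = dp @ W2.T * h * (1 - h)  # propagate the error through the hidden sigmoid
    dW1 = X.T @ dh
    W1 -= 1.0 * dW1
    W2 -= 1.0 * dW2

loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print(loss)  # cross-entropy after training; it should have dropped from its initial value
```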
- How to convert a CFG (context-free grammar) to CNF (Chomsky Normal Form); see the sketch below.
👉 Note
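A sketch of the binarization step of the CFG-to-CNF conversion (the rule representation and fresh-symbol naming are assumptions made for illustration; removing epsilon and unit productions, and lifting terminals out of long rules, are separate steps):

```python
def binarize(rules):
    # Split any right-hand side longer than 2 into a chain of binary rules
    # by introducing fresh nonterminals, as CNF requires A -> B C or A -> a.
    out, counter = [], 0
    for lhs, rhs in rules:
        while len(rhs) > 2:
            counter += 1
            fresh = f"X{counter}"  # hypothetical fresh-symbol naming scheme
            out.append((lhs, [rhs[0], fresh]))
            lhs, rhs = fresh, rhs[1:]
        out.append((lhs, list(rhs)))
    return out

# S -> A B C D becomes S -> A X1, X1 -> B X2, X2 -> C D
print(binarize([("S", ["A", "B", "C", "D"])]))
```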