Small attention layer exercise

erodner · erodner · commit fc1c4976e9de · 2025-01-05T22:27:24.000+01:00
diff --git a/notebooks/09/exercise_attention_layer.ipynb b/notebooks/09/exercise_attention_layer.ipynb
@@ -0,0 +1,211 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img style=\"float: right;\" src=\"../../assets/htwlogo.svg\">\n",
+    "\n",
+    "# Exercise: Studying Attention Layers\n",
+    "\n",
+    "**Author**: _Erik Rodner_ <br>\n",
+    "\n",
+    "In this exercise, we will analyze the scaled dot-product attention.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import matplotlib.pyplot as plt\n",
+    "import torch\n",
+    "import torch.nn.functional as F\n",
+    "from transformers import BertTokenizer"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Tokenization\n",
+    "\n",
+    "Let's first tokenize some text. The tokens only set the sequence length and label the heatmap axes later on :)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')\n",
+    "\n",
+    "# Tokenization and input preparation\n",
+    "sentence = \"Transformers are powerful models for natural language processing.\"\n",
+    "tokens = tokenizer.tokenize(sentence)\n",
+    "input_ids = tokenizer.convert_tokens_to_ids(tokens)\n",
+    "input_tensor = torch.tensor([input_ids])\n",
+    "\n",
+    "print(f\"Sentence: '{sentence}'\")\n",
+    "print(f\"Tokens: {tokens}\")\n",
+    "print(f\"Input IDs: {input_ids}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Generate synthetic embedding data \n",
+    "\n",
+    "For simplicity, we'll use random values with a rather low dimension here.\n",
+    "In a real setting, the embeddings may also be initialized randomly, but they are then tuned during training."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "embedding_dim = 8\n",
+    "# note: this construction ignores that identical tokens should initially share the same embedding\n",
+    "data = torch.rand((len(input_ids), embedding_dim))\n",
+    "print(f\"\\nGenerated Embedding Shape: {data.shape}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Transformer Layer in Action: Scaled Dot Product Attention\n",
+    "\n",
+    "Let's first generate random weight matrices for the queries, keys, and values.\n",
+    "Our $Q$, $K$, $V$ matrices are then computed by multiplying the embedding matrix with these weight matrices."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dk = 4 # dimension of the query and key vectors\n",
+    "dv = 4 # dimension of the value vectors\n",
+    "query_weights = torch.rand((embedding_dim, dk))\n",
+    "key_weights = torch.rand((embedding_dim, dk))\n",
+    "value_weights = torch.rand((embedding_dim, dv))\n",
+    "\n",
+    "Q = torch.matmul(data, query_weights)\n",
+    "K = torch.matmul(data, key_weights)\n",
+    "V = torch.matmul(data, value_weights)\n",
+    "\n",
+    "print(f\"Query (Q) Shape: {Q.shape}\\n\", Q)\n",
+    "print(f\"Key (K) Shape: {K.shape}\\n\", K)\n",
+    "print(f\"Value (V) Shape: {V.shape}\\n\", V)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Scaled dot-product attention\n",
+    "\n",
+    "Let's apply scaled dot-product attention step-by-step.\n",
+    "\n",
+    "**Exercise 1**: complete the following function to compute the attention scores, i.e. the row-wise softmax of the scaled dot products between queries and keys."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def compute_attention_scores(Q, K):\n",
+    "    dk = Q.size(-1)\n",
+    "    scores = 0  # YOUR CODE HERE: compute the scaled dot product between Q and K (hint: use dk) :)\n",
+    "    attn_probs = F.softmax(scores, dim=-1)\n",
+    "    return attn_probs\n",
+    "\n",
+    "attention_scores = compute_attention_scores(Q, K)\n",
+    "print(f\"Attention Scores Shape: {attention_scores.shape}\\n\", attention_scores)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Exercise 2**: now complete the following function to compute the final embedding, i.e. the attention-weighted combination of the value vectors."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def compute_weighted_values(attention_scores, V):\n",
+    "    return 0 # YOUR CODE HERE: compute the weighted values properly :)\n",
+    "\n",
+    "weighted_values = compute_weighted_values(attention_scores, V)\n",
+    "print(f\"Weighted Values Shape: {weighted_values.shape}\\n\", weighted_values)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Visualization of the attention scores\n",
+    "\n",
+    "Let's visualize the attention scores as a heatmap. Of course, they are all random, but you get an idea of their structure: each row shows how strongly one token attends to all tokens."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Visualization of Attention Weights\n",
+    "fig, ax = plt.subplots(figsize=(10, 6))\n",
+    "cax = ax.matshow(attention_scores.detach().numpy(), cmap='viridis')\n",
+    "plt.title(\"Attention Scores Heatmap\")\n",
+    "plt.xticks(range(len(tokens)), tokens, rotation=90)\n",
+    "plt.yticks(range(len(tokens)), tokens)\n",
+    "fig.colorbar(cax)\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "ml-exercise-pip",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.20"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
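
For reference, the computation the two exercises build toward can be sketched on a tiny hand-checkable example. This is a minimal, dependency-free illustration and not the notebook's intended PyTorch solution: the helper names `softmax` and `scaled_dot_product_attention` and the toy inputs are assumptions made for this sketch, and plain Python lists stand in for the tensors `Q`, `K`, `V`.

```python
import math


def softmax(row):
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    # Q and K are lists of n vectors of length dk, V is a list of n vectors of length dv.
    dk = len(Q[0])
    dv = len(V[0])
    weights = []
    for q in Q:
        # Dot product of one query with every key, scaled by sqrt(dk).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dk) for k in K]
        # Row-wise softmax turns the scores into weights that sum to 1.
        weights.append(softmax(scores))
    # Each output vector is the weighted sum of all value vectors.
    output = [
        [sum(w * v[j] for w, v in zip(row, V)) for j in range(dv)]
        for row in weights
    ]
    return weights, output


# Hypothetical toy input: two tokens, dk = dv = 2.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]

weights, output = scaled_dot_product_attention(Q, K, V)
print("attention weights:", weights)
print("weighted values:", output)
```

With these inputs, each query matches its own key best, so the diagonal attention weights come out larger than the off-diagonal ones, and each output row is pulled toward the corresponding value vector.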