1 file changed +4 -4 lines changed
@@ -37,8 +37,8 @@ You may use `softmax` provided by mlx and implement it later in week 2.
**📚 Readings**

* [Annotated Transformer](https://nlp.seas.harvard.edu/annotated-transformer/)
- * [PyTorch API](https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html) (assume `enable_gqa=False`, assume dim_k=dim_v=dim_q and H_k=H_v=H_q)
- * [MLX API](https://ml-explore.github.io/mlx/build/html/python/_autosummary/mlx.core.fast.scaled_dot_product_attention.html) (assume dim_k=dim_v=dim_q and H_k=H_v=H_q)
+ * [PyTorch Scaled Dot Product Attention API](https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html) (assume `enable_gqa=False`, assume dim_k=dim_v=dim_q and H_k=H_v=H_q)
+ * [MLX Scaled Dot Product Attention API](https://ml-explore.github.io/mlx/build/html/python/_autosummary/mlx.core.fast.scaled_dot_product_attention.html) (assume dim_k=dim_v=dim_q and H_k=H_v=H_q)
* [Attention is All You Need](https://arxiv.org/abs/1706.03762)
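The readings above cover scaled dot-product attention, `softmax(Q @ K^T / sqrt(D)) @ V`, under the simplifying assumption that queries, keys, and values share one head dimension and one head count. A minimal sketch of that computation follows. It uses NumPy as a stand-in for mlx, and the function names and the additive `mask` convention (add `-inf` where attention is disallowed) are assumptions for illustration, not the exact signatures of the PyTorch or MLX functions linked above.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along the axis before exponentiating, for numerical stability.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def scaled_dot_product_attention(query, key, value, scale=None, mask=None):
    """Sketch of softmax(Q K^T * scale + mask) V.

    query: (..., L_q, D); key, value: (..., L_k, D). Assumes dim_k = dim_v = dim_q
    and the same number of heads for Q, K, V (no grouped-query attention).
    mask (assumption): additive float array broadcastable to (..., L_q, L_k).
    """
    d = query.shape[-1]
    factor = 1.0 / np.sqrt(d) if scale is None else scale
    # Scores: (..., L_q, D) @ (..., D, L_k) -> (..., L_q, L_k)
    scores = np.matmul(query, np.swapaxes(key, -1, -2)) * factor
    if mask is not None:
        scores = scores + mask
    # Normalize over the key axis, then take a weighted sum of the value rows.
    return np.matmul(softmax(scores, axis=-1), value)


# Usage: 2 heads, sequence length 4, head dimension 8.
rng = np.random.default_rng(0)
q = rng.standard_normal((2, 4, 8))
k = rng.standard_normal((2, 4, 8))
v = rng.standard_normal((2, 4, 8))
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (2, 4, 8)
```

Because each row of softmax weights sums to 1, every output row is a convex combination of the value rows. With a causal mask such as `np.triu(np.full((L, L), -np.inf), k=1)`, the first query position can only attend to the first key.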

## Task 2: Implement `MultiHeadAttention`
@@ -77,8 +77,8 @@ transpose it to get the right shape.
**📚 Readings**

* [Annotated Transformer](https://nlp.seas.harvard.edu/annotated-transformer/)
- * [PyTorch API](https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html) (assume dim_k=dim_v=dim_q and H_k=H_v=H_q)
- * [MLX API](https://ml-explore.github.io/mlx/build/html/python/nn/_autosummary/mlx.nn.MultiHeadAttention.html) (assume dim_k=dim_v=dim_q and H_k=H_v=H_q)
+ * [PyTorch MultiHeadAttention API](https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html) (assume dim_k=dim_v=dim_q and H_k=H_v=H_q)
+ * [MLX MultiHeadAttention API](https://ml-explore.github.io/mlx/build/html/python/nn/_autosummary/mlx.nn.MultiHeadAttention.html) (assume dim_k=dim_v=dim_q and H_k=H_v=H_q)

At the end of the day, you should be able to pass the following tests: