@PineapplePulp
Collaborator

Updated softmax to accept a user-specified axis, and added a matmul that broadcasts its operands using .expand(). Together these are used to implement self-attention. Output projection layers, bias, and multi-head capability will be added in a future sprint. The class is already named MultiheadAttention in anticipation of that work, even though it currently covers only the single-head case.
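
For readers who want a concrete picture of the computation, here is a minimal sketch in NumPy rather than this project's own ndarray type. The function names `softmax` and `self_attention`, their signatures, and the `(batch, seq_len, d_k)` layout are assumptions for illustration, not the code in this PR. `np.matmul` broadcasts leading batch dimensions on its own, so it stands in for the `.expand()`-based broadcasting described above.

```python
import numpy as np


def softmax(x, axis=-1):
    """Softmax along a caller-chosen axis (hypothetical signature)."""
    # Subtract the per-slice maximum so np.exp cannot overflow.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def self_attention(q, k, v):
    """Single-head scaled dot-product attention, no projections or bias.

    q, k, v are assumed to have shape (batch, seq_len, d_k).
    """
    d_k = q.shape[-1]
    # (batch, seq_len, d_k) @ (batch, d_k, seq_len) -> (batch, seq_len, seq_len)
    scores = np.matmul(q, np.swapaxes(k, -1, -2)) / np.sqrt(d_k)
    # Normalize over the key axis so each query's weights sum to 1.
    weights = softmax(scores, axis=-1)
    # (batch, seq_len, seq_len) @ (batch, seq_len, d_k) -> (batch, seq_len, d_k)
    return np.matmul(weights, v)


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 8))
out = self_attention(x, x, x)  # same shape as x: (2, 4, 8)
```

Passing the same tensor as `q`, `k`, and `v` is what makes this self-attention. A multi-head version would typically add learned projections and split the last dimension across heads.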

Also added ndarray.variance and layer normalization.
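
As a rough illustration of how variance feeds into layer normalization, the sketch below again uses NumPy in place of this project's ndarray. The names `variance` and `layer_norm`, the optional `gamma`/`beta` parameters, and the `eps` default are assumptions and may not match the PR. It uses the population variance (dividing by N), which is the usual choice for layer normalization.

```python
import numpy as np


def variance(x, axis=-1, keepdims=False):
    """Population variance along an axis (hypothetical stand-in for ndarray.variance)."""
    mean = np.mean(x, axis=axis, keepdims=True)
    return np.mean((x - mean) ** 2, axis=axis, keepdims=keepdims)


def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize over the last axis, then optionally scale and shift."""
    mean = np.mean(x, axis=-1, keepdims=True)
    var = variance(x, axis=-1, keepdims=True)
    # eps keeps the division stable when a row has near-zero variance.
    normed = (x - mean) / np.sqrt(var + eps)
    if gamma is not None:
        normed = normed * gamma
    if beta is not None:
        normed = normed + beta
    return normed


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 8))
y = layer_norm(x)  # each length-8 row now has mean ~0 and variance ~1
```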
