[kernel] added half2 specialization for layernorm kernel #139
Open
dongxianzhe wants to merge 5 commits into vectorch-ai:main from
guocuimi
reviewed
Apr 22, 2024
}

template <>
void invoke_layernorm_kernel<half2>(half2* out,
Collaborator
It sounds like these template specializations are optional, since they are already covered by the general template, no?
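For reference, a minimal host-side C++ sketch of when an explicit specialization is actually needed. This is not the PR's code: `Pair`, `lanes`, and `element_count` are hypothetical names, and `Pair` only stands in for a packed two-lane type such as `half2`. The primary template covers every type that behaves the same way; a specialization earns its place only where the behavior must differ.

```cpp
#include <cstddef>

// Hypothetical stand-in for a packed two-lane type such as half2.
struct Pair {
  float x;
  float y;
};

// Primary template: one scalar per element. This already covers float,
// double, and any other type without a dedicated specialization.
template <typename T>
constexpr std::size_t lanes() {
  return 1;
}

// Explicit specialization: needed only because Pair packs two scalars.
template <>
constexpr std::size_t lanes<Pair>() {
  return 2;
}

// Generic helper: how many T elements are needed to hold n scalars.
// Note the integer division: an odd n loses its last scalar for Pair.
template <typename T>
constexpr std::size_t element_count(std::size_t n) {
  return n / lanes<T>();
}

static_assert(element_count<float>(8) == 8, "primary template is used");
static_assert(element_count<Pair>(8) == 4, "specialization is used");
```

If a specialization's body would be identical to what the primary template generates, it can be dropped; if it changes the arithmetic (as the `n / 2` above does), it is doing real work.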
guocuimi
reviewed
Apr 22, 2024
const float epsilon,
int m,
int n) {
int half_n = n / 2;
Collaborator
It sounds like you didn't cover this in the unit test.
guocuimi
reviewed
Apr 22, 2024
float* dinput;
float* dweight;
float* dbias;
cudaMalloc((void**)&dout, sizeof(float) * m * n);
Collaborator
Use torch::tensor to allocate the memory.
guocuimi
reviewed
Apr 22, 2024
torch::nn::functional::LayerNormFuncOptions({n}).weight(weight).bias(
bias));

half* hout = (half*)malloc(m * n * sizeof(half));
guocuimi
reviewed
Apr 22, 2024
cudaMemcpy(dweight, hweight, sizeof(half) * n, cudaMemcpyHostToDevice);
cudaMemcpy(dbias, hbias, sizeof(half) * n, cudaMemcpyHostToDevice);

llm::kernel::invoke_layernorm_kernel<half>(
Collaborator
Just test llm::kernel::layer_norm instead, but pass in inputs of different lengths to trigger the different kernels.
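A rough sketch of that testing pattern in plain host-side C++, so it runs without CUDA or libtorch. Everything here is illustrative: the `layer_norm` below is a hypothetical stand-in for the real entry point, and the rule "even length takes the paired path, odd length takes the scalar path" is an assumption made for this sketch, not something confirmed by the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;
enum class Path { kScalar, kPaired };

// Scalar reference: normalize one row, then apply weight and bias.
Vec layer_norm_scalar(const Vec& x, const Vec& w, const Vec& b, float eps) {
  const std::size_t n = x.size();
  float mean = 0.f, var = 0.f;
  for (float v : x) mean += v;
  mean /= static_cast<float>(n);
  for (float v : x) var += (v - mean) * (v - mean);
  var /= static_cast<float>(n);
  const float inv_std = 1.f / std::sqrt(var + eps);
  Vec out(n);
  for (std::size_t i = 0; i < n; ++i) out[i] = (x[i] - mean) * inv_std * w[i] + b[i];
  return out;
}

// Paired path: consumes two values per step, loosely mimicking half2 loads.
Vec layer_norm_paired(const Vec& x, const Vec& w, const Vec& b, float eps) {
  const std::size_t n = x.size();
  assert(n % 2 == 0);  // n / 2 would silently drop the last value otherwise
  const std::size_t half_n = n / 2;
  float mean = 0.f, var = 0.f;
  for (std::size_t i = 0; i < half_n; ++i) mean += x[2 * i] + x[2 * i + 1];
  mean /= static_cast<float>(n);
  for (std::size_t i = 0; i < half_n; ++i) {
    const float d0 = x[2 * i] - mean;
    const float d1 = x[2 * i + 1] - mean;
    var += d0 * d0 + d1 * d1;
  }
  var /= static_cast<float>(n);
  const float inv_std = 1.f / std::sqrt(var + eps);
  Vec out(n);
  for (std::size_t i = 0; i < n; ++i) out[i] = (x[i] - mean) * inv_std * w[i] + b[i];
  return out;
}

// Hypothetical entry point: picks a path from the row length (assumed rule).
Vec layer_norm(const Vec& x, const Vec& w, const Vec& b, float eps, Path* used) {
  const bool paired = x.size() % 2 == 0;
  if (used != nullptr) *used = paired ? Path::kPaired : Path::kScalar;
  return paired ? layer_norm_paired(x, w, b, eps) : layer_norm_scalar(x, w, b, eps);
}

// Test helper: largest |entry point - reference| for a row of length n.
float max_abs_diff(std::size_t n, Path* used) {
  Vec x(n), w(n), b(n);
  for (std::size_t i = 0; i < n; ++i) {
    x[i] = 0.5f * static_cast<float>(i) - 1.f;
    w[i] = 1.f + 0.1f * static_cast<float>(i);
    b[i] = 0.01f * static_cast<float>(i);
  }
  const Vec ref = layer_norm_scalar(x, w, b, 1e-5f);
  const Vec got = layer_norm(x, w, b, 1e-5f, used);
  float diff = 0.f;
  for (std::size_t i = 0; i < n; ++i) diff = std::max(diff, std::fabs(ref[i] - got[i]));
  return diff;
}
```

A test would then call `max_abs_diff` for at least one even and one odd length (say 8 and 7), check that the difference stays within a small tolerance, and confirm that each length really reached the intended path. Going through the public entry point keeps the test valid even if the internal kernel names change.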
guocuimi
reviewed
Apr 22, 2024
Collaborator
guocuimi
left a comment
Thanks for adding the optimization. Could you also add a benchmark to show the improvements? Thanks.
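One possible shape for such a benchmark, as a minimal host-side C++ timing helper. `average_ms` is a hypothetical name, not part of the repository. Because CUDA kernel launches are asynchronous, a real GPU measurement would need the timed callable to wait for the device (for example with `cudaDeviceSynchronize()`), or use CUDA events instead of a host clock; this sketch shows only the warm-up and averaging structure.

```cpp
#include <cassert>
#include <chrono>

// Returns the average wall-clock milliseconds per call of fn, measured over
// `iters` timed calls that follow `warmup` untimed calls.
template <typename Fn>
double average_ms(Fn&& fn, int warmup, int iters) {
  assert(iters > 0);
  for (int i = 0; i < warmup; ++i) fn();  // untimed: caches, lazy init, etc.
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i) fn();
  const auto stop = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::milli> elapsed = stop - start;
  return elapsed.count() / iters;
}
```

Calling it once with the general kernel and once with the half2 path, on the same input shapes, would give two comparable numbers for the PR description.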
e18e337 to bc9f7e2
…nitest and just test llm::kernel::layer_norm
optimize layernorm kernel using half2 type
test layernorm kernel