Adding Mamba SSM Arm SVE Kernel #664
Open
hrushitfujitsu wants to merge 1 commit into huggingface:main from
Thanks a lot for making a kernel and contributing to the kernels ecosystem 🤗! This kernel looks like awesome work. The scope of the
We encourage kernel developers to make kernels available through their own GitHub repositories and upload them to their own organization or user on the Hugging Face Hub. In that way, you also get all the credits for your work as a kernel developer. If this kernel should be integrated into the
Summary
Following up on huggingface/transformers#38185: since, to our knowledge, kernels submitted to this repository are automatically built and uploaded to the Hub, we would like to contribute this kernel to kernels-community.
This kernel was successfully built and tested on G3E. The change does not affect accuracy.
Implementation
The new kernel vectorizes the selective scan computation using ARM SVE intrinsics. The implementation is intended to:
Performance Check
We also integrated this kernel locally into the transformers repository (https://github.com/huggingface/transformers); there, it is 3-4x faster than the current implementation.
Task: 32 input tokens, 1 generated token
The table above reports the overall generation time (in seconds); this benchmark was also run on G3E (64 cores).
Co-authored by @hrushitfujitsu and @abhijain1204fujitsu