Hello,
https://github.com/AlpinDale/sparsegpt-for-LLaMA
https://arxiv.org/abs/2301.00774
"We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models."
Looks like someone has implemented SparseGPT for the LLaMA models. If I understand correctly, that means we could roughly halve the size of the LLaMA models without a significant loss of accuracy.
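For illustration only, here's a minimal sketch of what 50% unstructured sparsity means for a single weight matrix. This is not SparseGPT itself (which uses a Hessian-based weight reconstruction); it just uses plain magnitude pruning as a stand-in, and the matrix size and function name are made up for the example:

```python
import torch

def prune_to_half_sparsity(weight: torch.Tensor) -> torch.Tensor:
    """Zero out the 50% of entries with the smallest magnitude (per tensor)."""
    flat = weight.abs().flatten()
    k = flat.numel() // 2
    threshold = flat.kthvalue(k).values        # k-th smallest magnitude
    return weight * (weight.abs() > threshold)

# Example: one 4096x4096 projection matrix (roughly LLaMA-7B layer sized)
w = torch.randn(4096, 4096)
w_pruned = prune_to_half_sparsity(w)
print(f"sparsity: {(w_pruned == 0).float().mean():.2f}")  # ~0.50

# Caveat: a dense tensor with half its entries zeroed still occupies the same
# VRAM; the memory saving only shows up with a sparse storage format or
# kernels that actually skip the zeros.
```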
I'd like to know what you think about it, and whether you're planning to test it to see if you can get the same results with half the VRAM. I would finally be able to run the 13B model on my shitty 8 GB 2080 Super 🤣
PS: In less than a month, the 65B LLaMA will run on the Super Nintendo 👀