Need Guidance for NMT Model Quantization and Pruning #12798
Unanswered
syedhamza671 asked this question in Q&A
Replies: 1 comment
Hi, thanks for the question. We provide general quantization and pruning support for LLMs (some resources: https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/quantization/quantization.html, https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/pruning/pruning.html, https://github.com/NVIDIA/NeMo/tree/main/tutorials/llm/llama/pruning-distillation). However, the focus of this work has been on encoder-only or decoder-only models.
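For readers new to the two techniques discussed in this thread, here is a minimal, framework-agnostic sketch in plain Python of magnitude pruning and symmetric int8 quantization applied to a flat list of weights. It does not use NeMo or any NeMo API. The function names (`magnitude_prune`, `quantize_int8`, `dequantize`) and the sample weights are hypothetical and only illustrate the general ideas, not how the linked NeMo tooling works internally.

```python
# Illustrative only: plain-Python versions of the two techniques.
# None of these names come from NeMo; they are assumptions for this sketch.


def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(len(weights) * sparsity)  # number of weights to drop
    if k == 0:
        return list(weights)
    # Indices of the k weights with the smallest absolute value
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: returns (int values, scale)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate float weights."""
    return [v * scale for v in q]


# Hypothetical sample weights standing in for one layer of a model
weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2]
pruned = magnitude_prune(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
```

In a real model the same ideas are applied per layer or per channel, usually followed by calibration or fine-tuning to recover accuracy; this sketch skips those steps.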
I couldn't find any blog, tutorial, or documentation on quantization and pruning for NeMo NMT models. If anyone can help, it would mean a lot.