Need Guidance for NMT Model Quantization and Pruning #12798

Unanswered

syedhamza671 asked this question in Q&A

syedhamza671
Mar 27, 2025

I couldn't find any blog, tutorial, or documentation on quantization and pruning for NeMo NMT models. If anyone can help, it would mean a lot.

Replies: 1 comment

ashors1
May 5, 2025
Collaborator

Hi, thanks for the question. We provide general quantization and pruning support for LLMs (some resources: https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/quantization/quantization.html, https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/pruning/pruning.html, https://github.com/NVIDIA/NeMo/tree/main/tutorials/llm/llama/pruning-distillation). However, the focus of this work has been on encoder-only or decoder-only models.

