Labels
- compressed-tensors: Relates to compressed-tensors
- enhancement: New feature or request
- good first issue: A good first issue for users wanting to contribute
- good follow-up issue: A good issue for users with some familiarity of the codebase
- wNa16: Anything related to weight-only int-quantized support
Description
Summary:
- Currently, this example applies W4A16 to a Tiny Llama model: https://github.com/vllm-project/compressed-tensors/blob/main/examples/quantize_and_pack_int4.ipynb
- Update the notebook to use the latest tools in compressed-tensors. Specifically, it should use `compress_model` when compressing the model (a rough sketch is given after this list).
- Remove the use of `compress_quantized_weights`, which is outdated.
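
A minimal sketch of what the updated flow might look like. The `ModelCompressor.from_pretrained_model` entry point and the in-place behavior of `compress_model` are assumptions here; the exact signatures should be confirmed against the current compressed-tensors API before updating the notebook.

```python
# Hypothetical sketch -- verify names and signatures against compressed-tensors.
from transformers import AutoModelForCausalLM

from compressed_tensors.compressors import ModelCompressor

# Load the Tiny Llama model used in the example notebook.
model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# ... apply the W4A16 quantization recipe as in the notebook ...

# Build a compressor from the model's quantization config and compress the
# quantized weights, replacing the older compress_quantized_weights() call.
compressor = ModelCompressor.from_pretrained_model(model)  # assumed constructor
compressor.compress_model(model)  # assumed in-place compression
```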