Labels
- compressed-tensors: Relates to compressed-tensors
- enhancement: New feature or request
- good first issue: A good first issue for users wanting to contribute
- good follow-up issue: A good issue for users with some familiarity of the codebase
- wNa16: Anything related to weight-only int-quantized support
Description
Summary:
- Currently, this example applies W4A16 to a Tiny Llama model: https://github.com/vllm-project/compressed-tensors/blob/main/examples/quantize_and_pack_int4.ipynb
- Update the notebook to use the latest tools in compressed-tensors. Specifically, it should use `compress_model` when compressing the model (a rough sketch is given after this list).
- Remove the use of `compress_quantized_weights`, which is outdated.
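
A minimal sketch of what the updated flow might look like. The `ModelCompressor.from_pretrained_model` entry point and the in-place behavior of `compress_model` are assumptions here; the exact signatures should be confirmed against the current compressed-tensors API before updating the notebook.

```python
# Hypothetical sketch -- verify names and signatures against compressed-tensors.
from transformers import AutoModelForCausalLM

from compressed_tensors.compressors import ModelCompressor

# Load the Tiny Llama model used in the example notebook.
model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# ... apply the W4A16 quantization recipe as in the notebook ...

# Build a compressor from the model's quantization config and compress the
# quantized weights, replacing the older compress_quantized_weights() call.
compressor = ModelCompressor.from_pretrained_model(model)  # assumed constructor
compressor.compress_model(model)  # assumed in-place compression
```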