Optimize functionality using torch.compile #1485

Parts of llm-compressor and compressed-tensors could be further optimized with Triton JIT kernels and `torch.compile` to speed up execution. Candidates for replacement include:

1. The `compress_weight` and `decompress_weight` functions of the quantization compressors:
   - [packed-compressor](https://github.com/neuralmagic/compressed-tensors/blob/83679851bc5edce766da1ea6946990748461d3ac/src/compressed_tensors/compressors/quantized_compressors/pack_quantized.py#L89)
   - [nvfp-compressor](https://github.com/neuralmagic/compressed-tensors/blob/83679851bc5edce766da1ea6946990748461d3ac/src/compressed_tensors/compressors/quantized_compressors/nvfp4_quantized.py#L64)
2. The `calculate_qparams` functionality of the [MinMax observer](https://github.com/vllm-project/llm-compressor/blob/94a3e5356f1c5bf3d836d6522c2a68391761cc63/src/llmcompressor/observers/min_max.py#L33) and the [MSE observer](https://github.com/vllm-project/llm-compressor/blob/94a3e5356f1c5bf3d836d6522c2a68391761cc63/src/llmcompressor/observers/mse.py#L111)
3. Update GPTQ 

A proposed solution should replace the existing code with the optimized versions, include updated tests, and provide quick benchmarks showing the performance differences.
