Optimize functionality using torch.compile #1485

@dsikka

Description

Functionality in llm-compressor and compressed-tensors could be further optimized with Triton JIT kernels and torch.compile to speed up execution. Functionality that could potentially be replaced includes:

  1. Quantization Compressors: compress_weight and decompress_weight functionality
  2. Observers: calculate_qparams functionality for the MinMax observer and MSE observer
  3. Update GPTQ

A proposed solution should swap the existing code for the optimized functionality, include updated tests, and provide quick benchmarks showing the difference in performance.
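As a starting point for the benchmark requirement, here is a minimal sketch of how an eager min-max qparams computation could be compared against a torch.compile version. The function `minmax_qparams` and the helpers `compile_or_fallback` and `bench` are hypothetical names written for this illustration; they are not the existing calculate_qparams implementation in compressed-tensors, and the asymmetric per-row scheme shown is only an assumption about what an observer might compute.

```python
import time

import torch


def minmax_qparams(x: torch.Tensor, num_bits: int = 8):
    """Illustrative asymmetric min-max qparams, one (scale, zero_point) per row.

    Hypothetical stand-in for an observer's qparam calculation; not the
    library's implementation.
    """
    qmin, qmax = 0, 2**num_bits - 1
    # Extend the observed range to include zero so 0.0 maps to an integer.
    min_vals = torch.clamp(x.amin(dim=-1), max=0.0)
    max_vals = torch.clamp(x.amax(dim=-1), min=0.0)
    scale = (max_vals - min_vals) / float(qmax - qmin)
    # Guard against division by zero for constant rows.
    scale = torch.clamp(scale, min=torch.finfo(x.dtype).eps)
    zero_point = torch.clamp(qmin - torch.round(min_vals / scale), qmin, qmax)
    return scale, zero_point.to(torch.int32)


def compile_or_fallback(fn, example: torch.Tensor):
    """Return a torch.compile'd version of fn, or fn itself if compiling fails.

    Compilation is lazy, so the first call is made here to surface backend
    errors (for example, a missing C++ compiler or an unsupported platform).
    """
    try:
        compiled = torch.compile(fn)
        compiled(example)
        return compiled
    except Exception:
        return fn


def bench(fn, x: torch.Tensor, iters: int = 20) -> float:
    """Average seconds per call, after one warm-up call."""
    fn(x)
    if x.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    if x.is_cuda:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters
```

Calling `bench(minmax_qparams, x)` and `bench(compile_or_fallback(minmax_qparams, x), x)` on the same weight tensor gives the two timings to report. The compiled and eager outputs should also be compared (for example with `torch.allclose` on the scales) before any speed-up is claimed, since fused kernels can round slightly differently.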

Labels: enhancement, good first issue, keep-open