deduplicate is_attention_module between compressed-tensors and llm-compressor #2079

@HDCharles

Description

https://github.com/vllm-project/llm-compressor/blob/db0b68d9faf09066e9b7d679b39a977e484d9b91/src/llmcompressor/modifiers/utils/helpers.py#L32C4-L37

vs

https://github.com/vllm-project/compressed-tensors/blob/73c2cf935b53e0078be7766c5ee064755d980d78/src/compressed_tensors/quantization/lifecycle/initialize.py#L146

They do exactly the same thing, but if we ever want to extend this function (e.g. to support MLA attention or something like that), it's a footgun to have to remember to update two repositories. We should probably remove the llm-compressor copy and just use the compressed-tensors one, since that's already how it works in a few places.
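
For illustration, a minimal sketch of what the dedup could look like on the llm-compressor side: drop the local helper and import the shared one from compressed-tensors at each call site. The import path is an assumption based on the `initialize.py` file linked above, and `count_attention_modules` is just a hypothetical example caller, not code from either repo.

```python
# Hypothetical sketch: llm-compressor stops defining its own copy of
# is_attention_module and reuses the one compressed-tensors already ships.
# The import path below is an assumption based on the linked initialize.py.
from compressed_tensors.quantization.lifecycle.initialize import is_attention_module

import torch.nn as nn


def count_attention_modules(model: nn.Module) -> int:
    # Example call site: walk a model and count the submodules that the
    # single shared helper classifies as attention blocks.
    return sum(
        1 for _, submodule in model.named_modules() if is_attention_module(submodule)
    )
```

With this, any future change (e.g. recognizing MLA attention) only needs to land in compressed-tensors.
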

Labels

compressed-tensors: Relates to compressed-tensors
good first issue: A good first issue for users wanting to contribute
