deduplicate is_attention_module between compressed-tensors and llm-compressor #2079

@HDCharles

Description

https://github.com/vllm-project/llm-compressor/blob/db0b68d9faf09066e9b7d679b39a977e484d9b91/src/llmcompressor/modifiers/utils/helpers.py#L32C4-L37

vs

https://github.com/vllm-project/compressed-tensors/blob/73c2cf935b53e0078be7766c5ee064755d980d78/src/compressed_tensors/quantization/lifecycle/initialize.py#L146

They do exactly the same thing, but if we ever want to extend this function (e.g. to support MLA attention or something like that), it's a footgun to have to remember to update two repositories. We should probably remove the llm-compressor copy and just use the compressed-tensors one, since that's already how it works in a few places.
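
For illustration, a minimal sketch of what the dedup could look like on the llm-compressor side: drop the local helper and import the shared one from compressed-tensors at each call site. The import path is an assumption based on the `initialize.py` file linked above, and `count_attention_modules` is just a hypothetical example caller, not code from either repo.

```python
# Hypothetical sketch: llm-compressor stops defining its own copy of
# is_attention_module and reuses the one compressed-tensors already ships.
# The import path below is an assumption based on the linked initialize.py.
from compressed_tensors.quantization.lifecycle.initialize import is_attention_module

import torch.nn as nn


def count_attention_modules(model: nn.Module) -> int:
    # Example call site: walk a model and count the submodules that the
    # single shared helper classifies as attention blocks.
    return sum(
        1 for _, submodule in model.named_modules() if is_attention_module(submodule)
    )
```

With this, any future change (e.g. recognizing MLA attention) only needs to land in compressed-tensors.
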

Labels

compressed-tensors: Relates to compressed-tensors
good first issue: A good first issue for users wanting to contribute
