[Feature]: Add AWQ quantization support for vllm-ascend #4378

### 🚀 The feature, motivation and pitch

# motivation
AWQ is a commonly used quantization method, and many AWQ-quantized models are available for immediate use, such as the `Qwen` series. Currently, `vllm-ascend` supports quantized models produced by `modelslim`, but quantizing a model this way takes a lot of time, and we cannot cover every model a user may want to run in quantized form.

# implement

# validation
| Type | Architecture | Models | Model Name | Aclgraph Mode | Accuracy | Performance | Compare to W8A8 |
|------|--------------|--------|--------------|----------------|-----------|--------------|-------------------|
|Text-only|DeepseekV3ForCausalLM|DeepSeek-V3||||||
|Text-only|DeepseekV3ForCausalLM|DeepSeek-R1||||||
|Text-only|Qwen2ForCausalLM|QwQ, Qwen2|Qwen/Qwen2.5-32B-Instruct-AWQ, Qwen/QwQ-32B-AWQ|✅||||
|Text-only|Qwen3ForCausalLM| Qwen3 |Qwen/Qwen3-32B-AWQ|✅|ceval:0.85|||
|Text-only| Qwen3MoeForCausalLM | Qwen3MoE |billy800/Qwen3-30B-A3B-Instruct-2507-AWQ|✅|ceval:0.8403|||
| Multimodal | Qwen2AudioForConditionalGeneration | Qwen2-Audio ||||||
| Multimodal | Qwen2VLForConditionalGeneration | QVQ, Qwen2-VL |Qwen/Qwen2-VL-7B-Instruct-AWQ|✅||||
| Multimodal | Qwen2_5_VLForConditionalGeneration | Qwen2.5-VL |Qwen/Qwen2.5-VL-32B-Instruct-AWQ|❌(accuracy issue)||||
| Multimodal | Qwen3VLForConditionalGeneration | Qwen3-VL |tclf90/Qwen3-VL-32B-Instruct-AWQ|✅||||
| Multimodal | Qwen3VLMoeForConditionalGeneration | Qwen3-VL-MOE |tclf90/Qwen3-VL-30B-A3B-Instruct-AWQ|✅||||
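
For anyone who wants to reproduce a row of the table, a minimal smoke test might look like the sketch below. It assumes AWQ is selected the same way as in upstream vLLM (`quantization="awq"`) and that `enforce_eager=True` is how graph mode is turned off on Ascend. Both are assumptions about how this feature will be exposed, not confirmed behaviour of `vllm-ascend`. The helper names `awq_engine_kwargs` and `smoke_test` are made up for illustration.

```python
from typing import Any, Dict, List


def awq_engine_kwargs(model_name: str, enforce_eager: bool = False) -> Dict[str, Any]:
    """Build keyword arguments for vllm.LLM for a pre-quantized AWQ checkpoint."""
    return {
        "model": model_name,
        # Upstream vLLM picks the AWQ loader with quantization="awq";
        # assumed (not confirmed) to carry over to vllm-ascend.
        "quantization": "awq",
        # enforce_eager=True skips graph capture, which helps separate a
        # graph-mode failure from a quantization failure.
        "enforce_eager": enforce_eager,
    }


def smoke_test(model_name: str, prompts: List[str]) -> List[str]:
    """Load the model and return one greedy completion per prompt."""
    # Imported lazily so awq_engine_kwargs can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**awq_engine_kwargs(model_name))
    params = SamplingParams(temperature=0.0, max_tokens=32)
    outputs = llm.generate(prompts, params)
    return [out.outputs[0].text for out in outputs]
```

Calling `smoke_test("Qwen/Qwen3-32B-AWQ", ["What is AWQ quantization?"])` on a machine with the model available would exercise the graph-mode path; passing `enforce_eager=True` to `awq_engine_kwargs` gives the eager path for comparison.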


### Alternatives

_No response_

### Additional context

_No response_
