Is there an existing issue for this?
Current Behavior
self_attention.dense.weight in the int4 checkpoint has shape [4096, 2048], while the fp16 checkpoint has shape [4096, 4096]. This shape mismatch causes the vLLM server setup to fail.
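For context, the int4 checkpoint appears to pack two 4-bit values into each int8 element, so the stored weight's last dimension is half of the fp16 layout ([4096, 2048] vs. [4096, 4096]), whereas vLLM's weight loader seems to expect the unpacked fp16 shape. Below is a minimal sketch for inspecting the checkpoint shapes; the local checkpoint path is a placeholder, not part of the original report:

```python
# Hypothetical shape check: compare tensor shapes in a locally downloaded
# chatglm2-6b-int4 checkpoint against the fp16 layout vLLM expects.
# The checkpoint directory below is a placeholder.
import glob
import os

import torch

CKPT_DIR = "/path/to/chatglm2-6b-int4"  # placeholder path

for shard in sorted(glob.glob(os.path.join(CKPT_DIR, "*.bin"))):
    state_dict = torch.load(shard, map_location="cpu")
    for name, tensor in state_dict.items():
        if "self_attention.dense.weight" in name:
            # int4 checkpoints pack two 4-bit values per int8 element,
            # so the last dimension is half of the fp16 shape.
            print(name, tuple(tensor.shape), tensor.dtype)
```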
Expected Behavior
chatglm2-6b-int4 can be deployed with vLLM.
Steps To Reproduce
No exact command was captured; a rough sketch of the setup is below.
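The following is only an assumed reproduction, not the original invocation: the Hugging Face model id THUDM/chatglm2-6b-int4 and the use of vLLM's offline LLM API are assumptions.

```python
# Hypothetical reproduction: the model id and settings are assumptions,
# not the exact command from the original report.
from vllm import LLM, SamplingParams

# Loading the int4 checkpoint is where the shape mismatch on
# self_attention.dense.weight is expected to surface.
llm = LLM(model="THUDM/chatglm2-6b-int4", trust_remote_code=True)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```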
Environment
- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :
Anything else?