Hi team,
I’m trying to quantize the Qwen2.5-VL (3B) model using lmdeploy lite auto_awq, but I’m running into the following issue:
Problem Description:
My model’s config.json has "architectures": ["Qwen2_5_VLForConditionalGeneration"] and "model_type": "qwen2_5_vl".
However, when I run the quantization command, I get this error:
RuntimeError: Currently, quantification and calibration of Qwen2_5_VLTextModel are not supported. The supported model types are ... Qwen2_5_VLForConditionalGeneration ...
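From the wording of the error, my guess is that the calibration step dispatches on the class name of the model object it actually receives, not on the "architectures" field in config.json. With transformers >= 4.52 the text backbone of Qwen2.5-VL is exposed as Qwen2_5_VLTextModel, so a check like the hedged sketch below would reject it even though the top-level architecture is Qwen2_5_VLForConditionalGeneration (the SUPPORTED set here is illustrative, not lmdeploy's actual list):

```python
# Hedged sketch, NOT lmdeploy's actual code: illustrates how a class-name
# check could reject the text backbone while the top-level architecture
# is nominally supported. SUPPORTED is an assumed, illustrative set.
SUPPORTED = {
    "Qwen2ForCausalLM",  # assumption: a typical supported entry
    "Qwen2_5_VLForConditionalGeneration",
}

def check_supported(model_cls_name: str) -> None:
    # Raise the same style of error as in the traceback above.
    if model_cls_name not in SUPPORTED:
        raise RuntimeError(
            f"Currently, quantification and calibration of {model_cls_name} "
            f"are not supported. The supported model types are {sorted(SUPPORTED)}."
        )

check_supported("Qwen2_5_VLForConditionalGeneration")  # passes silently
try:
    # What the calibrator apparently sees under transformers >= 4.52:
    check_supported("Qwen2_5_VLTextModel")
except RuntimeError as e:
    print(type(e).__name__)  # prints "RuntimeError"
```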
Here is a snippet from my config.json:
"architectures": [
"Qwen2_5_VLForConditionalGeneration"
],
"model_type": "qwen2_5_vl"
My quantization command:
lmdeploy lite auto_awq \
./model_weight/Recognition \
--calib-dataset 'ptb' \
--calib-samples 64 \
--calib-seqlen 1024 \
--w-bits 4 \
--w-group-size 128 \
--batch-size 1 \
--work-dir ./monkeyocr_quantization
Environment:
- transformers version: 4.52.4
- lmdeploy version: 0.9.0 (also tried latest from source)
- Note: Qwen3 does not yet provide a pip package, and this transformers version cannot directly import qwen2_5_vl models
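One thing I considered: if the failure stems from the transformers 4.52 refactor that split Qwen2.5-VL's text backbone into Qwen2_5_VLTextModel, pinning transformers below 4.52 before quantizing might avoid it. Treating "4.52.0" as the breaking version is my assumption, not something confirmed by the lmdeploy docs; this sketch only checks the installed version against that assumed threshold:

```python
# Hedged sketch: decide whether the installed transformers version is past
# the (assumed) breaking refactor at 4.52.0.
def _parse(v: str) -> tuple:
    # Compare versions numerically, e.g. "4.52.4" -> (4, 52, 4)
    return tuple(int(p) for p in v.split("."))

def needs_downgrade(installed: str, breaking: str = "4.52.0") -> bool:
    # "breaking" = 4.52.0 is an assumption about where the refactor landed
    return _parse(installed) >= _parse(breaking)

print(needs_downgrade("4.52.4"))  # True: my environment is past the refactor
print(needs_downgrade("4.51.3"))  # False: would predate the refactor
```

If this guess is right, something like `pip install "transformers<4.52"` before rerunning the quantization command might be worth trying.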
My Questions:
- The lmdeploy README states that Qwen2.5-VL (3B, 7B, 72B) is supported for INT4 quantization and inference, yet quantization fails with the error above: the model is treated as Qwen2_5_VLTextModel rather than being recognized as ForConditionalGeneration. Why?
- How can I quantize official Qwen2.5-VL weights with lmdeploy AWQ? Is there any temporary model code or script available for this?
- Or is it currently only possible to use already-quantized weights for inference, and self-quantization is not yet supported?
Thank you for your help!