Description
I followed the script below to convert the Kimi-K2-Thinking vanilla model to MXFP4:
```shell
LOCAL_QUARK_DIR=/data/users/kadvani/gitrepos/Quark
MODEL_INPUT_DIR=/data/users/kadvani/models/Kimi-K2-Thinking-BF16/
MODEL_OUTPUT_DIR=/data/users/kadvani/models/Kimi-K2-Thinking-MXFP4/
cd $LOCAL_QUARK_DIR/examples/torch/language_modeling/llm_ptq/

exclude_layers="*self_attn* *mlp.gate.* *lm_head"

# Speed up safetensors loading with parallel decompression
# export SAFETENSORS_FAST_GPU=1 # DISABLED: Causes HIP segfault during export
# export OMP_NUM_THREADS=16 # Adjust based on your CPU cores
export LD_PRELOAD=/usr/lib64/libzstd.so.1

# Use the detected ROCm version path to match amdsmi
export ROCM_HOME=/opt/rocm-7.0.2
export ROCM_PATH=/opt/rocm-7.0.2
export HIP_PATH=/opt/rocm-7.0.2
export LD_LIBRARY_PATH=$ROCM_PATH/lib:$LD_LIBRARY_PATH
export PYTORCH_ROCM_ARCH="gfx90a;gfx942"
export HSA_FORCE_FINE_GRAIN_PCIE=1

python3 quantize_quark.py --model_dir $MODEL_INPUT_DIR \
    --quant_scheme w_mxfp4_a_mxfp4 \
    --device cpu \
    --group_size 32 \
    --num_calib_data 128 \
    --exclude_layers $exclude_layers \
    --skip_evaluation \
    --model_export hf_format \
    --output_dir $MODEL_OUTPUT_DIR 2>&1 | tee /tmp/kadvani/quantize_quark.log
```
Quantization completed successfully and produced an MXFP4 model.

However, upon starting a vLLM server with the model, I ran into an error triggered by this check: https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/quark/quark.py#L155.
On inspection, I realized that the exported `config.json` has a non-empty `"kv_cache_group"`:

```json
"kv_cache_group": [
    "*kv_b_proj"
],
```

but the corresponding quantization configs are empty:

```json
"kv_cache_quant_config": {},
"layer_quant_config": {},
"layer_type_quant_config": {},
```
I then removed the value from `"kv_cache_group"` to get rid of the ValueError, but the model ran into another issue: expert weight names carry an extra suffix, e.g. `model.layers.4.mlp.experts.w2_weight_packed` instead of `model.layers.4.mlp.experts.w2_weight`. I added a quick fix and was able to start the server, but the eval scored 0. I am not sure whether the changes I made to start the server were intended or not.
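My quick fix amounted to renaming checkpoint keys along these lines (a sketch of my workaround, not a supported fix; the suffix and key pattern come from the error above):

```python
# Hypothetical workaround: strip the unexpected "_packed" suffix from
# weight names in a checkpoint's state dict so they match what the model
# loader expects (e.g. ...w2_weight_packed -> ...w2_weight).
def strip_packed_suffix(state_dict: dict) -> dict:
    fixed = {}
    for name, tensor in state_dict.items():
        if name.endswith("_weight_packed"):
            # Drop only the "_packed" part, keeping "_weight".
            name = name[: -len("_packed")]
        fixed[name] = tensor
    return fixed
```

Since the eval still scores 0 after this rename, I suspect the packed tensors are not in the layout the loader expects, and a plain rename is not sufficient.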