
[ValueError] Getting partially populated quantization config #18

@smitkadvani

Description

I followed the script below to convert the vanilla Kimi-K2-Thinking model to MXFP4:

LOCAL_QUARK_DIR=/data/users/kadvani/gitrepos/Quark
MODEL_INPUT_DIR=/data/users/kadvani/models/Kimi-K2-Thinking-BF16/
MODEL_OUTPUT_DIR=/data/users/kadvani/models/Kimi-K2-Thinking-MXFP4/

cd $LOCAL_QUARK_DIR/examples/torch/language_modeling/llm_ptq/
exclude_layers="*self_attn* *mlp.gate.* *lm_head"

# Speed up safetensors loading with parallel decompression
# export SAFETENSORS_FAST_GPU=1  # DISABLED: Causes HIP segfault during export
# export OMP_NUM_THREADS=16  # Adjust based on your CPU cores

export LD_PRELOAD=/usr/lib64/libzstd.so.1
# Use the detected ROCm version path to match amdsmi
export ROCM_HOME=/opt/rocm-7.0.2
export ROCM_PATH=/opt/rocm-7.0.2
export HIP_PATH=/opt/rocm-7.0.2
export LD_LIBRARY_PATH=$ROCM_PATH/lib:$LD_LIBRARY_PATH
export PYTORCH_ROCM_ARCH="gfx90a;gfx942"
export HSA_FORCE_FINE_GRAIN_PCIE=1

python3 quantize_quark.py --model_dir $MODEL_INPUT_DIR \
                          --quant_scheme w_mxfp4_a_mxfp4 \
                          --device cpu \
                          --group_size 32 \
                          --num_calib_data 128 \
                          --exclude_layers $exclude_layers \
                          --skip_evaluation \
                          --model_export hf_format \
                          --output_dir $MODEL_OUTPUT_DIR 2>&1 | tee /tmp/kadvani/quantize_quark.log

I was able to successfully produce an MXFP4-quantized model.

Upon starting a vLLM server with the model, I ran into an error triggered by this check: https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/quark/quark.py#L155.

Upon checking, I realized that the exported config.json has a non-empty

"kv_cache_group": [
        "*kv_b_proj"
      ],

but

"kv_cache_quant_config": {},
"layer_quant_config": {},
"layer_type_quant_config": {},

Later I removed the value from "kv_cache_group" to get rid of the ValueError, but the model ran into another issue: the layer names carry a "_packed" suffix, e.g. model.layers.4.mlp.experts.w2_weight_packed instead of model.layers.4.mlp.experts.w2_weight. I added a quick fix and was somehow able to start the server, but upon running an eval I got a score of 0. I am not sure whether the changes I made in order to start the server were intended or not.
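
In case it helps with triage, the naming mismatch can be checked directly against the exported shards; a quick sketch to list the expert tensor names (shard glob is illustrative, run inside MODEL_OUTPUT_DIR):

from glob import glob
from safetensors import safe_open

# List expert tensor names in the exported shards to see which naming
# ("w2_weight" vs "w2_weight_packed") the checkpoint actually uses.
for shard in sorted(glob("model-*.safetensors")):
    with safe_open(shard, framework="pt", device="cpu") as f:
        for name in f.keys():
            if ".mlp.experts." in name and "w2_weight" in name:
                print(name)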
