diff --git a/scripts/inc_woq_g2_bkc.md b/scripts/inc_woq_g2_bkc.md
index df11f4ab9a7..f13f38cfca0 100644
--- a/scripts/inc_woq_g2_bkc.md
+++ b/scripts/inc_woq_g2_bkc.md
@@ -33,13 +33,42 @@ This script 1) converts official model weights from `torch.float8_e4m3fn` format
 > [!NOTE]
 > For INC WoQ requantization, make sure to:
 > 1) Specify the path to the measurement files in the quantization configuration JSON file.
-> 
+>
 > 2) Set the `QUANT_CONFIG` environment variable to point to this configuration file.
-> 
+>
 >For more details, refer to the `INC WOQ ReQuant` section in the `single_16k_len_inc.sh` script.
 
+### Configure the Measurement Statistics Results
+
+The environment variable `INC_MEASUREMENT_DUMP_PATH_PREFIX` specifies the root directory where measurement statistics were saved.
+The final path is constructed by joining this root directory with the `dump_stats_path` defined in the quantization JSON file specified by the `QUANT_CONFIG` environment variable.
+
+#### Example
+
+If we download the measurements to `/path/to/vllm-fork/scripts/nc_workspace_measure_kvache`, we get the following files:
+
+```bash
+user:vllm-fork$ pwd
+/path/to/vllm-fork
+user:vllm-fork$ ls -l ./scripts/nc_workspace_measure_kvache
+-rw-r--r-- 1 user Software-SG 1949230 May 15 08:05 inc_measure_output_hooks_maxabs_0_8.json
+-rw-r--r-- 1 user Software-SG 254451 May 15 08:05 inc_measure_output_hooks_maxabs_0_8_mod_list.json
+-rw-r--r-- 1 user Software-SG 1044888 May 15 08:05 inc_measure_output_hooks_maxabs_0_8.npz
+...
+```
+
+Then we export `INC_MEASUREMENT_DUMP_PATH_PREFIX=/path/to/vllm-fork`, and INC resolves the full path as follows:
+
+```
+dump_stats_path (from config): "scripts/nc_workspace_measure_kvache/inc_measure_output"
+Resulting full path: "/path/to/vllm-fork/scripts/nc_workspace_measure_kvache/inc_measure_output_hooks_maxabs_0_8.npz"
+```
+
 > [!CAUTION]
-> Before running the benchmark, make sure to update the `model_path` in the `single_16k_len_inc.sh` script. 
+> Before running the benchmark, update the following variables in the `single_16k_len_inc.sh` script:
+> - `model_path`
+> - `QUANT_CONFIG`
+> - `INC_MEASUREMENT_DUMP_PATH_PREFIX`
 
 ### 3.1 BF16 KV + Per-Channel Quantization
 
@@ -57,7 +86,6 @@ cd vllm-fork
 bash ./scripts/single_16k_len_inc.sh
 ```
 
-
 ### 3.2 FP8 KV + Per-Channel Quantization
 
 - Get calibration files
@@ -72,4 +100,4 @@ huggingface-cli download Yi30/inc-woq-default-pile-one-cache-412-g2 --local-dir
 ```bash
 cd vllm-fork
 bash scripts/single_16k_len.sh --fp8_kv
-```
\ No newline at end of file
+```
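
The path construction described in the added "Configure the Measurement Statistics Results" section can be sketched in shell as follows. This is only an illustration of the join, assuming INC behaves like a plain path join of the prefix and `dump_stats_path`; it is not INC's actual implementation. The variable names `prefix`, `dump_stats_path`, and `stats_base` are hypothetical, and the `_hooks_maxabs_0_8.npz` suffix is copied from the example file listing in the diff.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: approximates how the root prefix and dump_stats_path combine.
# Assumption: a simple path join; INC's real logic may differ.

# Root directory, as exported via INC_MEASUREMENT_DUMP_PATH_PREFIX in the example.
prefix="/path/to/vllm-fork"

# Value of dump_stats_path as it appears in the quantization JSON config.
dump_stats_path="scripts/nc_workspace_measure_kvache/inc_measure_output"

# Join the two parts, dropping one trailing slash from the prefix if present.
stats_base="${prefix%/}/${dump_stats_path}"

# The measurement files share this base name plus a suffix taken from the listing.
echo "${stats_base}_hooks_maxabs_0_8.npz"
```

Listing `"${stats_base}"*` on a real checkout is a quick way to check that the prefix and the config value point at the downloaded measurement files before launching the benchmark.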