Commit 2687152

update docs
Signed-off-by: Yi Liu <[email protected]>
1 parent 9916e6f commit 2687152

1 file changed (+23 / -12 lines)

scripts/inc_woq_g2_bkc.md

Lines changed: 23 additions & 12 deletions
````diff
@@ -33,30 +33,42 @@ This script 1) converts official model weights from `torch.float8_e4m3fn` format
 > [!NOTE]
 > For INC WoQ requantization, make sure to:
 > 1) Specify the path to the measurement files in the quantization configuration JSON file.
->
+>
 > 2) Set the `QUANT_CONFIG` environment variable to point to this configuration file.
->
+>
 >For more details, refer to the `INC WOQ ReQuant` section in the `single_16k_len_inc.sh` script.
 
-
-
 ### Configure the Measurement Statistics Results
 
 The environment variable `INC_MEASUREMENT_DUMP_PATH_PREFIX` specifies the root directory where measurement statistics were saved.
 The final path is constructed by joining this root directory with the `dump_stats_path` defined in the quantization JSON file specified by the `QUANT_CONFIG` environment variable.
 
-Example:
+#### Example
+
+If we download the measurements to `/path/to/vllm-fork/scripts/nc_workspace_measure_kvache`, we get the following files:
+
 ```bash
-INC_MEASUREMENT_DUMP_PATH_PREFIX=/mnt/disk3/vllm-fork
+user:vllm-fork$ pwd
+/path/to/vllm-fork
+user:vllm-fork$ ls -l ./scripts/nc_workspace_measure_kvache
+-rw-r--r-- 1 user Software-SG 1949230 May 15 08:05 inc_measure_output_hooks_maxabs_0_8.json
+-rw-r--r-- 1 user Software-SG 254451 May 15 08:05 inc_measure_output_hooks_maxabs_0_8_mod_list.json
+-rw-r--r-- 1 user Software-SG 1044888 May 15 08:05 inc_measure_output_hooks_maxabs_0_8.npz
+...
+```
+
+Then we export `INC_MEASUREMENT_DUMP_PATH_PREFIX=/path/to/vllm-fork`, and INC constructs the full path as follows:
+
+```
 dump_stats_path (from config): "scripts/nc_workspace_measure_kvache/inc_measure_output"
-Resulting full path: "/mnt/disk3/vllm-fork/scripts/nc_workspace_measure_kvache/inc_measure_output_xx"
+Resulting full path: "/path/to/vllm-fork/scripts/nc_workspace_measure_kvache/inc_measure_output_hooks_maxabs_0_8.npz"
 ```
 
 > [!CAUTION]
 > Before running the benchmark, update the following variables in the single_16k_len_inc.sh script:
-> - `model_path`
-> - `QUANT_CONFIG`
-> - `INC_MEASUREMENT_DUMP_PATH_PREFIX`
+> - `model_path`
+> - `QUANT_CONFIG`
+> - `INC_MEASUREMENT_DUMP_PATH_PREFIX`
 
 ### 3.1 BF16 KV + Per-Channel Quantization
 
````
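The path rule in the diff above (`INC_MEASUREMENT_DUMP_PATH_PREFIX` joined with `dump_stats_path`) can be sketched as two small shell helpers. This is a minimal illustration under stated assumptions, not INC's implementation: the names `inc_measure_base` and `inc_list_measure_files` are hypothetical, the join is assumed to be a plain `<prefix>/<dump_stats_path>` concatenation as the text describes, and because the files on disk carry an extra suffix (for example `_hooks_maxabs_0_8.npz` in the listing) whose format the document does not spell out, the second helper globs for it instead of constructing it.

```shell
# Hypothetical helpers illustrating the documented rule:
#   full path = $INC_MEASUREMENT_DUMP_PATH_PREFIX / dump_stats_path
# Assumption: a plain "/" join; this is not INC's own code.

# Print the base path that the measurement files are expected to share.
inc_measure_base() {
  local prefix="$1"           # value of INC_MEASUREMENT_DUMP_PATH_PREFIX
  local dump_stats_path="$2"  # "dump_stats_path" taken from the QUANT_CONFIG JSON
  # Drop one trailing "/" from the prefix so the result has a single separator.
  printf '%s/%s\n' "${prefix%/}" "$dump_stats_path"
}

# List files whose names start with "<base>_", e.g. "<base>_hooks_maxabs_0_8.npz".
# The suffix is globbed rather than built, since its format is not documented here.
# Returns non-zero (and prints nothing) when no file matches.
inc_list_measure_files() {
  local base
  base="$(inc_measure_base "$1" "$2")"
  ls -1 "$base"_* 2>/dev/null
}
```

For the example in the diff, `inc_list_measure_files /path/to/vllm-fork scripts/nc_workspace_measure_kvache/inc_measure_output` would print the `.json`, `_mod_list.json`, and `.npz` files, assuming they were downloaded to that directory. An empty result is a quick sign that the prefix or `dump_stats_path` does not match where the measurements actually live.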
````diff
@@ -74,7 +86,6 @@ cd vllm-fork
 bash ./scripts/single_16k_len_inc.sh
 ```
 
-
 ### 3.2 FP8 KV + Per-Channel Quantization
 
 - Get calibration files
````
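The BF16 KV run launches `./scripts/single_16k_len_inc.sh`, and the CAUTION note lists three variables that must be updated first. A preflight check along these lines could catch a leftover placeholder before a long benchmark starts. It is a hypothetical sketch: `inc_preflight` is not part of the repository, and it assumes all three variables hold local filesystem paths (if `model_path` is a Hugging Face model ID instead, drop it from the existence check).

```shell
# Hypothetical preflight check for the variables named in the CAUTION note.
# Assumption: each one holds a local path; only existence is checked,
# not whether the files have the right contents.
inc_preflight() {
  local status=0 name value
  for name in model_path QUANT_CONFIG INC_MEASUREMENT_DUMP_PATH_PREFIX; do
    # Portable indirect lookup; safe here because the names are fixed literals.
    eval "value=\"\${$name:-}\""
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      status=1
    elif [ ! -e "$value" ]; then
      echo "error: $name=$value does not exist" >&2
      status=1
    fi
  done
  return "$status"
}
```

Placing `inc_preflight || exit 1` right after the three variables are assigned in a local copy of the script would make it stop before any model loading, and it reports every problem in one pass rather than only the first.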
````diff
@@ -89,4 +100,4 @@ huggingface-cli download Yi30/inc-woq-default-pile-one-cache-412-g2 --local-dir
 ```bash
 cd vllm-fork
 bash scripts/single_16k_len.sh --fp8_kv
-```
+```
````
