Commit c00a82d

[DeepSeek] Update Docs for Setting Measurement Results Path (#1251)
Depends on intel/neural-compressor#2210
2 parents 6767058 + 2687152 commit c00a82d

1 file changed

+33
-5
lines changed

scripts/inc_woq_g2_bkc.md

Lines changed: 33 additions & 5 deletions
@@ -33,13 +33,42 @@ This script 1) converts official model weights from `torch.float8_e4m3fn` format
3333
> [!NOTE]
3434
> For INC WoQ requantization, make sure to:
3535
> 1) Specify the path to the measurement files in the quantization configuration JSON file.
36-
>
36+
>
3737
> 2) Set the `QUANT_CONFIG` environment variable to point to this configuration file.
38-
>
38+
>
3939
> For more details, refer to the `INC WOQ ReQuant` section in the `single_16k_len_inc.sh` script.
4040
41+
### Configure the Measurement Statistics Results
42+
43+
The environment variable `INC_MEASUREMENT_DUMP_PATH_PREFIX` specifies the root directory where measurement statistics were saved.
44+
The final path is constructed by joining this root directory with the `dump_stats_path` defined in the quantization JSON file specified by the `QUANT_CONFIG` environment variable.
45+
46+
#### Example
47+
48+
If we download the measurements to `/path/to/vllm-fork/scripts/nc_workspace_measure_kvache`, we get the following files:
49+
50+
```bash
51+
user:vllm-fork$ pwd
52+
/path/to/vllm-fork
53+
user:vllm-fork$ ls -l ./scripts/nc_workspace_measure_kvache
54+
-rw-r--r-- 1 user Software-SG 1949230 May 15 08:05 inc_measure_output_hooks_maxabs_0_8.json
55+
-rw-r--r-- 1 user Software-SG 254451 May 15 08:05 inc_measure_output_hooks_maxabs_0_8_mod_list.json
56+
-rw-r--r-- 1 user Software-SG 1044888 May 15 08:05 inc_measure_output_hooks_maxabs_0_8.npz
57+
...
58+
```
59+
60+
Then we export `INC_MEASUREMENT_DUMP_PATH_PREFIX=/path/to/vllm-fork`, and INC resolves the full path as follows:
61+
62+
```
63+
dump_stats_path (from config): "scripts/nc_workspace_measure_kvache/inc_measure_output"
64+
Resulting full path: "/path/to/vllm-fork/scripts/nc_workspace_measure_kvache/inc_measure_output_hooks_maxabs_0_8.npz"
65+
```
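
The join described above can be reproduced in a few lines of shell. This is only a sketch for checking a setup by hand, not how INC itself builds the path: the `suffix` value is copied from the file names in the example listing and is assumed, not guaranteed, to be the same in other setups, and `dump_stats_path` is written inline here rather than read from the JSON file.

```bash
#!/usr/bin/env bash
# Sketch: rebuild the measurement file path from the root prefix and the
# dump_stats_path value, using the example values from this document.

INC_MEASUREMENT_DUMP_PATH_PREFIX="/path/to/vllm-fork"

# Value of "dump_stats_path" in the quantization JSON (copied by hand here).
dump_stats_path="scripts/nc_workspace_measure_kvache/inc_measure_output"

# Assumption: suffix taken from the example listing, not from INC documentation.
suffix="_hooks_maxabs_0_8.npz"

# Drop a trailing slash from the prefix so the join never produces "//".
full_path="${INC_MEASUREMENT_DUMP_PATH_PREFIX%/}/${dump_stats_path}${suffix}"

echo "${full_path}"
```

If the printed path does not match a file on disk, either the prefix or the `dump_stats_path` entry most likely needs adjusting.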
66+
4167
> [!CAUTION]
42-
> Before running the benchmark, make sure to update the `model_path` in the `single_16k_len_inc.sh` script.
68+
> Before running the benchmark, update the following variables in the `single_16k_len_inc.sh` script:
69+
> - `model_path`
70+
> - `QUANT_CONFIG`
71+
> - `INC_MEASUREMENT_DUMP_PATH_PREFIX`
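
A quick pre-flight check can catch a wrong path before a long benchmark run. The helper below is hypothetical (`check_inc_env` is not part of the repository) and assumes that `QUANT_CONFIG` and `INC_MEASUREMENT_DUMP_PATH_PREFIX` are exported in the calling shell; if the script sets them internally, the same tests would need to go inside the script instead. `model_path` is not checked because it is a variable inside the script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify the two INC-related environment variables
# before launching the benchmark. Returns non-zero if either check fails.
check_inc_env() {
    local status=0

    # QUANT_CONFIG must be set and point to an existing JSON file.
    if [ -z "${QUANT_CONFIG:-}" ] || [ ! -f "${QUANT_CONFIG}" ]; then
        echo "QUANT_CONFIG is unset or not a file: '${QUANT_CONFIG:-}'" >&2
        status=1
    fi

    # The prefix must be set and point to an existing directory.
    if [ -z "${INC_MEASUREMENT_DUMP_PATH_PREFIX:-}" ] \
        || [ ! -d "${INC_MEASUREMENT_DUMP_PATH_PREFIX}" ]; then
        echo "INC_MEASUREMENT_DUMP_PATH_PREFIX is unset or not a directory: '${INC_MEASUREMENT_DUMP_PATH_PREFIX:-}'" >&2
        status=1
    fi

    return "${status}"
}
```

One way to use it is `check_inc_env && bash ./scripts/single_16k_len_inc.sh`, so the benchmark starts only when both checks pass.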
4372
4473
### 3.1 BF16 KV + Per-Channel Quantization
4574

@@ -57,7 +86,6 @@ cd vllm-fork
5786
bash ./scripts/single_16k_len_inc.sh
5887
```
5988

60-
6189
### 3.2 FP8 KV + Per-Channel Quantization
6290

6391
- Get calibration files
@@ -72,4 +100,4 @@ huggingface-cli download Yi30/inc-woq-default-pile-one-cache-412-g2 --local-dir
72100
```bash
73101
cd vllm-fork
74102
bash scripts/single_16k_len.sh --fp8_kv
75-
```
103+
```
