[DeepSeek] Update Docs for Setting Measurement Results Path #1251

Merged 2 commits on May 16, 2025.
`scripts/inc_woq_g2_bkc.md`: 38 changes (33 additions, 5 deletions)

@@ -33,13 +33,42 @@ This script 1) converts official model weights from `torch.float8_e4m3fn` format
> [!NOTE]
> For INC WoQ requantization, make sure to:
> 1) Specify the path to the measurement files in the quantization configuration JSON file.
>
> 2) Set the `QUANT_CONFIG` environment variable to point to this configuration file.
>
> For more details, refer to the `INC WOQ ReQuant` section in the `single_16k_len_inc.sh` script.
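
The two steps above can be scripted. The sketch below is a minimal example that assumes the quantization JSON stores the measurement path in a top-level `dump_stats_path` key (the key described in the next section). The function name `set_dump_stats_path` and any file names you pass to it are hypothetical, and `python3` is used only to edit the JSON.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of vllm-fork or INC):
# copy a quantization JSON, set its dump_stats_path, and point QUANT_CONFIG at the copy.
# Assumes dump_stats_path is a top-level key in the JSON file.
set_dump_stats_path() {
  local src="$1" dst="$2" dump_stats_path="$3"
  python3 - "$src" "$dst" "$dump_stats_path" <<'EOF' || return 1
import json
import sys

src, dst, dump_stats_path = sys.argv[1:4]
with open(src) as f:
    cfg = json.load(f)
cfg["dump_stats_path"] = dump_stats_path  # override only this key, keep the rest
with open(dst, "w") as f:
    json.dump(cfg, f, indent=4)
EOF
  # Step 2: point QUANT_CONFIG at the updated copy (exported into the calling shell).
  export QUANT_CONFIG="$dst"
}
```

For example, `set_dump_stats_path base.json requant.json scripts/nc_workspace_measure_kvache/inc_measure_output` (hypothetical file names) writes `requant.json` and exports `QUANT_CONFIG=requant.json`. Working on a copy leaves the original configuration untouched.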

### Configure the Measurement Statistics Path

The environment variable `INC_MEASUREMENT_DUMP_PATH_PREFIX` specifies the root directory under which the measurement statistics are stored.
INC builds the final path by joining this root directory with the `dump_stats_path` defined in the quantization JSON file that the `QUANT_CONFIG` environment variable points to. `dump_stats_path` acts as a file-name prefix: INC appends a suffix such as `_hooks_maxabs_0_8.npz` to it when locating the measurement files.
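
A minimal sketch of this resolution rule, useful for checking which prefix INC will look for before launching a run. It assumes `dump_stats_path` is a top-level key holding a relative path, and it simply joins the two parts with a single `/`; INC's own joining logic may differ in edge cases (for example, an absolute `dump_stats_path`). The function name `resolve_measurement_prefix` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of vllm-fork or INC):
# prints <INC_MEASUREMENT_DUMP_PATH_PREFIX>/<dump_stats_path>.
# Arguments default to the QUANT_CONFIG and INC_MEASUREMENT_DUMP_PATH_PREFIX env vars.
# Assumes dump_stats_path is a top-level, relative path in the QUANT_CONFIG JSON.
resolve_measurement_prefix() {
  local config="${1:-$QUANT_CONFIG}" root="${2:-$INC_MEASUREMENT_DUMP_PATH_PREFIX}"
  local dump_stats_path
  # python3 is used only to read the JSON key.
  dump_stats_path=$(python3 -c 'import json, sys; print(json.load(open(sys.argv[1]))["dump_stats_path"])' "$config") || return 1
  # Drop a trailing "/" from the root so the join yields a single separator.
  printf '%s/%s\n' "${root%/}" "$dump_stats_path"
}
```

With `INC_MEASUREMENT_DUMP_PATH_PREFIX=/path/to/vllm-fork` and `dump_stats_path` set to `scripts/nc_workspace_measure_kvache/inc_measure_output`, this prints `/path/to/vllm-fork/scripts/nc_workspace_measure_kvache/inc_measure_output`; the suffix is added on top of that prefix.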

#### Example

Suppose the measurement files are downloaded to `/path/to/vllm-fork/scripts/nc_workspace_measure_kvache`. The directory then contains the following files:

```bash
user:vllm-fork$ pwd
/path/to/vllm-fork
user:vllm-fork$ ls -l ./scripts/nc_workspace_measure_kvache
-rw-r--r-- 1 user Software-SG 1949230 May 15 08:05 inc_measure_output_hooks_maxabs_0_8.json
-rw-r--r-- 1 user Software-SG 254451 May 15 08:05 inc_measure_output_hooks_maxabs_0_8_mod_list.json
-rw-r--r-- 1 user Software-SG 1044888 May 15 08:05 inc_measure_output_hooks_maxabs_0_8.npz
...
```

Then export `INC_MEASUREMENT_DUMP_PATH_PREFIX=/path/to/vllm-fork`, and INC resolves the full path as follows:

```
dump_stats_path (from config): "scripts/nc_workspace_measure_kvache/inc_measure_output"
Resulting full path: "/path/to/vllm-fork/scripts/nc_workspace_measure_kvache/inc_measure_output_hooks_maxabs_0_8.npz"
```
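
Since `dump_stats_path` is used as a file-name prefix, one quick way to confirm that the downloaded files line up with the configuration is to glob for `.npz` files that start with the resolved prefix. This is a hypothetical check (`list_measurement_files` is not part of vllm-fork or INC); it only confirms that matching files exist, not that their contents are valid.

```bash
#!/usr/bin/env bash
# Hypothetical check (not part of vllm-fork or INC):
# list measurement .npz files whose names start with the resolved prefix, e.g.
# /path/to/vllm-fork/scripts/nc_workspace_measure_kvache/inc_measure_output
list_measurement_files() {
  local prefix="$1"
  # compgen -G prints every glob match and returns non-zero when nothing matches.
  if ! compgen -G "${prefix}_*.npz"; then
    echo "error: no measurement files match ${prefix}_*.npz" >&2
    return 1
  fi
}
```

For the example above, `list_measurement_files /path/to/vllm-fork/scripts/nc_workspace_measure_kvache/inc_measure_output` should list `inc_measure_output_hooks_maxabs_0_8.npz`.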

> [!CAUTION]
> Before running the benchmark, update the following variables in the `single_16k_len_inc.sh` script:
> - `model_path`
> - `QUANT_CONFIG`
> - `INC_MEASUREMENT_DUMP_PATH_PREFIX`
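
A pre-flight check can catch a missed update before the benchmark starts. The sketch below is hypothetical (`require_path` is not in vllm-fork); it assumes `model_path` is a local directory and that the three variables are already set when the check runs, for example if placed near the top of `single_16k_len_inc.sh` after they are assigned.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of vllm-fork).
# Usage: require_path LABEL PATH KIND   (KIND is "file" or "dir")
# Returns non-zero and prints an error if PATH is empty or not an existing KIND.
require_path() {
  local label="$1" path="$2" kind="$3"
  if [[ -z "$path" ]]; then
    echo "error: ${label} is not set" >&2
    return 1
  fi
  if [[ "$kind" == file && ! -f "$path" ]] || [[ "$kind" == dir && ! -d "$path" ]]; then
    echo "error: ${label}=${path} is not an existing ${kind}" >&2
    return 1
  fi
}
```

For example, `require_path QUANT_CONFIG "$QUANT_CONFIG" file || exit 1` stops the run early if the configuration file is missing; the same pattern applies to `model_path` and `INC_MEASUREMENT_DUMP_PATH_PREFIX` with `dir`.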

### 3.1 BF16 KV + Per-Channel Quantization

@@ -57,7 +86,6 @@ cd vllm-fork
bash ./scripts/single_16k_len_inc.sh
```

### 3.2 FP8 KV + Per-Channel Quantization

- Get calibration files
@@ -72,4 +100,4 @@ huggingface-cli download Yi30/inc-woq-default-pile-one-cache-412-g2 --local-dir
```bash
cd vllm-fork
bash scripts/single_16k_len.sh --fp8_kv
```