scripts/inc_woq_g2_bkc.md
Lines changed: 23 additions & 12 deletions
Original file line number
Diff line number
Diff line change
@@ -33,30 +33,42 @@ This script 1) converts official model weights from `torch.float8_e4m3fn` format
33
33
> [!NOTE]
34
34
> For INC WoQ requantization, make sure to:
35
35
> 1) Specify the path to the measurement files in the quantization configuration JSON file.
36
-
>
36
+
>
37
37
> 2) Set the `QUANT_CONFIG` environment variable to point to this configuration file.
38
-
>
38
+
>
39
39
> For more details, refer to the `INC WOQ ReQuant` section in the `single_16k_len_inc.sh` script.
40
40
41
-
42
-
43
41
### Configure the Measurement Statistics Results
44
42
45
43
The environment variable `INC_MEASUREMENT_DUMP_PATH_PREFIX` specifies the root directory where measurement statistics were saved.
46
44
The final path is constructed by joining this root directory with the `dump_stats_path` defined in the quantization JSON file specified by the `QUANT_CONFIG` environment variable.
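
The path construction described above can be sketched in Python. This is a minimal illustration, not the actual INC or vLLM implementation: the helper name `resolve_measurement_path` is hypothetical, and it assumes `dump_stats_path` is a top-level key in the JSON file and that the join behaves like `os.path.join`.

```python
import json
import os


def resolve_measurement_path(env=None):
    """Join the measurement root directory with the configured dump_stats_path.

    Hypothetical helper for illustration only. Assumes `dump_stats_path`
    is a top-level key in the JSON file that QUANT_CONFIG points to.
    """
    env = os.environ if env is None else env

    # Root directory where the measurement statistics were saved.
    prefix = env["INC_MEASUREMENT_DUMP_PATH_PREFIX"]

    # Quantization configuration file selected by QUANT_CONFIG.
    with open(env["QUANT_CONFIG"]) as f:
        config = json.load(f)

    # Assumption: a plain path join of root directory and dump_stats_path.
    return os.path.join(prefix, config["dump_stats_path"])
```

Note that with `os.path.join`, an absolute `dump_stats_path` would override the prefix entirely, so this sketch only matches the described behavior when `dump_stats_path` is a relative path.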
47
45
48
-
Example:
46
+
#### Example
47
+
48
+
If we download the measurements to `/path/to/vllm-fork/scripts/nc_workspace_measure_kvache`, we get the following files: