
GPTQ updates #2235


Merged

merged 7 commits into main from 098_gptq on Jun 2, 2025

Conversation

@HDCharles (Contributor) commented May 21, 2025

Summary:

  1. reorganized GPTQ
    a) got rid of old GPTQ and renamed GPTQ_MT to GPTQ
    b) moved new GPTQ to prototype
    c) moved the quantized linear modules from GPTQ.py to linear_quant_modules.py
  2. removed dependence on lm_eval for input_recorder
    a) created new input recorder that doesn't depend on lm_eval
    b) made lm_eval input recorder depend on new generic input_recorder
    c) made TransformerEvalWrapper the base class and made LMEvalInputRecorder inherit from it (rather than vice versa like before)
    d) updated APIs generally to work with the new input recorder (inputs have to be passed in as though they were being passed into the model, so input_recorder(*args) rather than input_recorder(args)); a sketch of the updated flow follows this list
  3. reorganized GPTQ tests
    a) moved tests from test_quant_api.py to test_gptq.py
    b) added a new test that can run in CI and doesn't depend on
    lm_eval/llama weights
    c) removed all the 8da4w tests that we never got working (is this fine?)
    d) got rid of test_gptq_mt.py (consolidated into test_gptq.py where relevant)
  4. added new documentation for lm_eval
    a) new readme and eval benchmarks for GPTQ
    b) comments in GPTQ.py
  5. GPTQ improvements
    a) tested compilation of the hessian calculation and parts of faster quant;
    generally they were slower or buggy. A speedup is possible but it was inconsistent, so it was removed.
    b) reimplemented faster quant while trying to compile parts of it (improved speed by 2-5%, and the code is clearer)
    c) moved helper functions out of the class. They're largely generic and
    this is less cluttered. May need to revisit how generic they are if new GPTQQuantizers are made.
    d) made some improvements to duplication checking and copying so they are
    faster when possible (previously MultiTensor.unpad had checks, but by the point it's called we've already done equality checks, so they aren't needed in unpad)
    e) fixed some bugs caused by this code not being in CI while things changed for the
    int4wo tensor subclass.
  6. BC
    a) got rid of Int8DynActInt4WeightGPTQQuantizer since it was unused. Can re-add if desired.
    b) for other imports, maintained BC; previous imports from quantization/GPTQ.py now go through quantization/GPTQ/__init__.py
    c) renamed InputRecorder -> LMEvalInputRecorder but left the BC import in as an option.
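
To make the updated flow concrete, here is a minimal sketch of the new calibration + GPTQ API described in items 2 and 5. The import path, the `get_next_input` helper, and the exact `quantize` signature are assumptions based on this summary rather than verbatim from the diff:

```python
from torchao.quantization.GPTQ import (  # BC: old quantization/GPTQ.py imports resolve via GPTQ/__init__.py
    MultiTensorInputRecorder,
    Int4WeightOnlyGPTQQuantizer,
)

calibration_limit = 10  # number of calibration samples to record

# 1) gather calibration inputs; the recorder is called exactly like the model
input_recorder = MultiTensorInputRecorder()
for _ in range(calibration_limit):
    args = get_next_input()  # hypothetical user-provided function returning one tuple of model inputs
    input_recorder(*args)    # new convention: input_recorder(*args), not input_recorder(args)

# 2) run GPTQ over the recorded inputs (`quantize` signature assumed here);
#    `model` is the float model being quantized, defined elsewhere
quantizer = Int4WeightOnlyGPTQQuantizer()  # quantization parameters like group_size can be set here
model = quantizer.quantize(model, *input_recorder.get_recorded_inputs())
```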

Test Plan:

  1. `python test_gptq.py`

note 1: the normally skipped test test_gptq_quantizer_int4_weight_only was also run.
note 2: we now have a CI-ready test in test_gptq.py that uses the generic input recorder.

  2. I verified that all activations match between the old GPTQ (non-MT) and the current
    GPTQ. This can be seen from the passing test_gptq_quantizer_int4_weight_only mentioned above, and was also verified by comparing debug outputs and printing activation values for the first 3 MultiTensors (a rough sketch of this kind of check follows the list).

  3. eval benchmarks:

```shell
export CHECKPOINT_PATH=../../../checkpoints # path to checkpoints folder

export MODEL_REPO=meta-llama/Llama-2-7b-chat-hf
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int4wo-64
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int4wo-gptq-64 --calibration_limit 10

export MODEL_REPO=meta-llama/Meta-Llama-3-8B
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int4wo-64
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int4wo-gptq-64 --calibration_limit 10
```

See README.md for results; they show GPTQ is working.
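
Returning to the activation check in item 2, it was of this general shape (a minimal sketch; `old_activations`/`new_activations` are hypothetical captures, the PR itself compared debug outputs and printed values):

```python
import torch

# compare activations captured from the old (non-MT) and current GPTQ runs
for i, (old_act, new_act) in enumerate(zip(old_activations[:3], new_activations[:3])):
    assert torch.equal(old_act, new_act), f"activation {i} mismatch"
```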

Reviewers:

Subscribers:

Tasks:

Tags:


pytorch-bot bot commented May 21, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2235

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 5bab3ab with merge base c4250a4:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label May 21, 2025
@HDCharles added the topic: improvement label May 21, 2025
@HDCharles force-pushed the 098_gptq branch 11 times, most recently from 956e16c to 4b44d67 on May 23, 2025
HDCharles added 2 commits May 30, 2025 14:13
Comment on lines +169 to +173
```python
def get_recorded_inputs(self):
    return self.base_input_recorder.get_recorded_inputs()

def get_recorded_args_and_kwargs(self):
    return self.base_input_recorder.get_recorded_args_and_kwargs()
```
Contributor
nit: get_recorded_args(self) might be more consistent with get_recorded_args_and_kwargs?

does recorded inputs support recording multiple args?

Contributor Author

I wanted to maintain the old API, which is used in a few places just to get data.

```python
# first gather inputs
input_recorder = MultiTensorInputRecorder()
for i in range(calibration_limit):
    args = get_next_input()  # user provided function
    input_recorder(*args)    # record the inputs, passed as if calling the model
```
Contributor
optional nit: might be useful to have some real data example (e.g. `dataset = load_dataset("mit-han-lab/pile-val-backup", split="validation")`), but I'm also planning to take a look at this soon

@jerryzh168 (Contributor) left a comment

Thanks, I think the updated API looks good

```python
# note: can do input_recorder(*args, **kwargs) if needed

# then perform GPTQ
quantizer = Int4WeightOnlyGPTQQuantizer()  # quantization parameters like group_size can be set here
```
Contributor

also, maybe adding an example of how people can adapt this to their own quantization would be useful as well, I think

"Int4WeightOnlyQuantizer",
"Int8DynActInt4WeightGPTQQuantizer",
Contributor
this is still used in ET (code): https://github.com/pytorch/executorch/blob/66dfc46686ebcb317efdf13580e869e7e4e2f0cc/examples/models/llama/source_transformation/quantize.py#L184, so it's probably better to remove this in ET before landing the PR

HDCharles added 4 commits May 30, 2025 17:44
HDCharles added a commit to HDCharles/executorch that referenced this pull request May 31, 2025
This is being deprecated in torchao pytorch/ao#2235
@HDCharles merged commit f0f1f6c into main Jun 2, 2025
19 checks passed
Labels
CLA Signed · topic: improvement