Initial ParetoQ commit #1876

andrewor14 · 2025-03-12T20:14:05Z

This project contains the training code for ParetoQ, introduced in "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization" (https://arxiv.org/abs/2502.02631). All code was written by @liuzechun and @zxdmike and migrated from https://github.com/facebookresearch/ParetoQ.

ParetoQ is the first unified framework that facilitates rigorous comparisons across 1-bit, 1.58-bit, 2-bit, 3-bit, and 4-bit quantization settings. By optimizing training schemes and refining quantization functions, ParetoQ surpasses all previous methods tailored to specific bit widths. Specifically, the 1.58-bit ParetoQ LLaMA-3 8B model narrows the performance gap to full precision by a relative 37.8% compared to the 1-bit Era’s 1.58-bit LLaMA-3 8B model, while using only 30% of the training tokens.
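
For readers unfamiliar with the "1.58-bit" label: it refers to ternary weights, since a value drawn from three levels carries log2(3) ≈ 1.585 bits. The sketch below illustrates that arithmetic together with a generic absmean ternary quantizer. It is not ParetoQ's quantization function; the names `bits_per_weight` and `ternary_quantize` are hypothetical, and the level counts for the 2-, 3-, and 4-bit rows assume a full 2^b grid.

```python
import math


def bits_per_weight(num_levels: int) -> float:
    """Bits carried by a weight that takes one of `num_levels` values."""
    return math.log2(num_levels)


def ternary_quantize(weights, eps=1e-8):
    """Generic absmean ternary quantizer (illustrative only, not ParetoQ's).

    Scales by the mean absolute value, rounds to the nearest integer,
    and clamps to {-1, 0, +1}. Returns the integer codes and the scale.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


# Assumed level counts: 3 for ternary, 2**b for the other settings.
levels = {"1-bit": 2, "1.58-bit": 3, "2-bit": 4, "3-bit": 8, "4-bit": 16}
for name, n in levels.items():
    print(f"{name}: {n} levels, {bits_per_weight(n):.3f} bits per weight")

codes, scale = ternary_quantize([0.9, -0.05, -1.2, 0.3])
print(codes)  # dequantize with: [c * scale for c in codes]
```

Multiplying each code by `scale` gives the dequantized approximation of the original weights.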

pytorch-bot · 2025-03-12T20:14:09Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1876

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 77b1bcc with merge base 8c81863:

NEW FAILURE - The following job has failed:

Run TorchAO Experimental Tests / test-mps-ops (macos-m1-stable) (gh)
ModuleNotFoundError: No module named 'importlib_metadata'

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label Mar 12, 2025

andrewor14 marked this pull request as draft March 12, 2025 20:14

andrewor14 force-pushed the paretoq branch from ade3706 to 39d0119 March 12, 2025 21:01

andrewor14 added the topic: new feature label Mar 12, 2025

andrewor14 force-pushed the paretoq branch from 39d0119 to 35597f5 March 13, 2025 20:30

andrewor14 marked this pull request as ready for review March 13, 2025 20:31

andrewor14 force-pushed the paretoq branch from 35597f5 to 29400c6 March 13, 2025 20:32

andrewor14 force-pushed the paretoq branch from 29400c6 to 77b1bcc March 14, 2025 16:16
