Subclass API (#966) #995

Open
wants to merge 1 commit into
base: main


metascroy
Contributor

Summary:

Adds new int8_dynamic_activation_intx_weight quantization with subclass API

Differential Revision: D62464487

@facebook-github-bot added the CLA Signed label on Oct 2, 2024

pytorch-bot bot commented Oct 2, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/995

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 41a40cb with merge base 09b8b3c:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D62464487

@@ -300,7 +300,7 @@ def _quantize_affine_no_dtype_cast(
     elif zero_point_domain is None:
         # This case handles quantization for float8 we expect no zero point and no zero point domain
         assert zero_point is None, "zero_point should be None when zero_point_domain is None"
-        quant = torch.clamp(input * scale.reciprocal(), quant_min, quant_max)
+        quant = torch.clamp(torch.round(input * (1.0 / scale)), quant_min, quant_max)
Contributor Author


@jerryzh168, please confirm whether this is OK. The change was needed to match the behavior of the other quantizer.
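For readers following the thread, the effect of adding the rounding step can be sketched in plain Python. This is only an illustration under stated assumptions, not the torchao implementation: `quantize_scale_only` and its `round_first` flag are hypothetical names, lists of floats stand in for tensors, and Python's built-in `round` (round half to even, like `torch.round`) stands in for the tensor op.

```python
def quantize_scale_only(values, scale, quant_min, quant_max, round_first):
    """Scale-only affine quantization (no zero point), with or without rounding.

    Hypothetical helper for illustration; it mirrors the two lines in the diff:
      old: clamp(input * scale.reciprocal(), quant_min, quant_max)
      new: clamp(round(input * (1.0 / scale)), quant_min, quant_max)
    """
    out = []
    for v in values:
        scaled = v * (1.0 / scale)
        if round_first:
            # New behavior: snap to the nearest integer before clamping.
            scaled = round(scaled)
        # Clamp into the representable range [quant_min, quant_max].
        out.append(max(quant_min, min(quant_max, scaled)))
    return out


values = [0.3, -1.2, 10.0]
print(quantize_scale_only(values, 0.5, -8, 7, round_first=True))   # [1, -2, 7]
print(quantize_scale_only(values, 0.5, -8, 7, round_first=False))  # [0.6, -2.4, 7]
```

Without the rounding step the in-range results stay fractional, which is acceptable if a later cast does the rounding but does not produce integer codes on its own.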

if preserve_zero:
    zero_point = quant_min - torch.round(min_val_neg / scale)
    zero_point = torch.clamp(zero_point, quant_min, quant_max)
if zero_point_domain is None:
Contributor Author


@jerryzh168, please confirm whether this is OK. It was needed to get scale-only quantization in affine_quantized_tensor.
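As a rough illustration of what "scale-only" means here, the sketch below picks a scale, returns no zero point, and round-trips a few values. It is a pure-Python sketch under assumptions: `choose_scale_only`, `quantize`, and `dequantize` are hypothetical helpers, and the scale formula (largest magnitude mapped onto the narrower side of the integer range) is one common symmetric convention, not necessarily the formula torchao uses.

```python
def choose_scale_only(values, quant_min, quant_max):
    """Pick a symmetric scale and no zero point (assumed convention)."""
    max_abs = max(abs(v) for v in values)
    # Use the narrower side of the range so both signs stay representable.
    bound = min(-quant_min, quant_max)
    scale = max_abs / bound if max_abs > 0 else 1.0
    zero_point = None  # scale-only: there is no zero point at all
    return scale, zero_point


def quantize(values, scale, zero_point, quant_min, quant_max):
    """Round, shift by the zero point if one exists, then clamp."""
    offset = 0 if zero_point is None else zero_point
    return [max(quant_min, min(quant_max, round(v / scale) + offset)) for v in values]


def dequantize(qvalues, scale, zero_point):
    """Invert quantize, up to rounding and clamping error."""
    offset = 0 if zero_point is None else zero_point
    return [(q - offset) * scale for q in qvalues]


values = [-3.5, 0.5, 1.0]
scale, zero_point = choose_scale_only(values, quant_min=-8, quant_max=7)
q = quantize(values, scale, zero_point, -8, 7)
print(scale, zero_point)                 # 0.5 None
print(q)                                 # [-7, 1, 2]
print(dequantize(q, scale, zero_point))  # [-3.5, 0.5, 1.0]
```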

exported = torch.export.export(model, (activations,))

print("Compiling quantized model")
compiled = torch.compile(unwrapped_model)
Contributor Author


@jerryzh168 do you see unification for compile and export coming soon? The fact that one requires an unwrapped tensor subclass and the other requires a wrapped one makes using this API inconvenient in torchchat.

@metascroy
Contributor Author

@kimishpatel @jerryzh168 moving review over to GH. I hope I've addressed most of your concerns.

@jerryzh168, the fact that compile and export cannot handle the same model (export requires an unwrapped tensor subclass, compile requires a wrapped one, and eager can handle both) makes using this API inconvenient in torchchat. Do you know if there is planned unification there?

Labels
CLA Signed, fb-exported

2 participants