Feat (brevitas_examples/sdxl): better GPTQ #1250
    (self.groups, self.columns, self.columns),
    device='cpu',
    dtype=torch.float32,
)
Maybe I should re-introduce this, but I was getting weird CUDA errors. Also, since we are storing on the CPU, I'm not sure why we need to use pin_memory.
@i-colbert maybe you can comment before we merge
From what I can remember, it was used to improve data transfer speeds from GPU to CPU, which is why it is only enabled if a GPU is available.
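For reference, here is a minimal sketch of the pattern being discussed. It assumes the buffer is allocated with `torch.zeros` (the start of the call is not visible in the diff above), and `allocate_cpu_buffer` is a hypothetical helper name, not something from the PR. Pinned (page-locked) host memory lets CUDA copy between GPU and CPU asynchronously, and requesting it without a CUDA device raises an error, which is why the flag is tied to `torch.cuda.is_available()`.

```python
import torch


def allocate_cpu_buffer(groups: int, columns: int) -> torch.Tensor:
    """Allocate a zeroed CPU buffer, pinned only when a GPU is present.

    Hypothetical helper for illustration; not taken from the PR.
    """
    # Pinning requires a CUDA runtime, so only ask for it when one exists.
    use_pinned = torch.cuda.is_available()
    return torch.zeros(
        (groups, columns, columns),
        device='cpu',
        dtype=torch.float32,
        pin_memory=use_pinned,
    )


buf = allocate_cpu_buffer(groups=1, columns=4)

if torch.cuda.is_available():
    # With a pinned destination, the GPU -> CPU copy can be asynchronous.
    gpu_block = torch.ones((1, 4, 4), device='cuda', dtype=torch.float32)
    buf.copy_(gpu_block, non_blocking=True)
    # Wait for the copy to finish before reading buf on the host.
    torch.cuda.synchronize()
```

If pinning is the source of the CUDA errors mentioned above, dropping `pin_memory` keeps the same behavior but makes the transfers synchronous and potentially slower.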
A few comments and questions, but approved in principle.
LGTM!
Reason for this PR
Few shot calibration + GPTQ
Some details need refining
Changes Made in this PR
Testing Summary
Risk Highlight
Checklist