
Feat (brevitas_examples/sdxl): better GPTQ #1250


Status: Open. Wants to merge 6 commits into base: dev.

Conversation

Giuseppe5 (Collaborator)

Reason for this PR

Few-shot calibration + GPTQ.

Some details still need refining.
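
For context, a rough sketch of the flow this PR is aiming at, using brevitas' gptq_mode context manager in the pattern of the existing PTQ examples. The calibration helper, batch count, and loop structure below are illustrative assumptions, not the PR's actual code; the keyword arguments mirror the ones visible in the diff further down.

import torch
from brevitas.graph.gptq import gptq_mode  # assumed import path

# Hypothetical few-shot calibration set: a handful of batches is enough
# to build the statistics GPTQ needs.
few_shot_batches = load_calibration_batches(num_samples=8)  # placeholder helper

denoising_network.eval()
with torch.no_grad():
    with gptq_mode(denoising_network,
                   create_weight_orig=False,
                   use_quant_activations=False,
                   return_forward_output=True) as gptq:
        gptq_model = gptq.model
        # One outer pass per layer group: feed the few-shot data,
        # then let GPTQ update the quantized weights.
        for _ in range(gptq.num_layers):
            for batch in few_shot_batches:
                gptq_model(batch)
            gptq.update()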

Changes Made in this PR

Testing Summary

Risk Highlight

  • This PR includes code from another work (please detail).
  • This PR contains API-breaking changes.
  • This PR depends on work in another PR (please provide links/details).
  • This PR introduces new dependencies (please detail).
  • There are coverage gaps not covered by tests.
  • Documentation updates required in subsequent PR.

Checklist

  • Code comments added to any hard-to-understand areas, if applicable.
  • Changes generate no new warnings.
  • Updated any relevant tests, if applicable.
  • No conflicts with destination dev branch.
  • I reviewed my own code changes.
  • Initial CI/CD passing.
  • 1+ reviews given, and any review issues addressed and approved.
  • Post-review full CI/CD passing.

@nickfraser added the labels enhancement (New feature or request) and next release (PRs which should be merged for the next release) on Apr 11, 2025.
@Giuseppe5 marked this pull request as ready for review on April 14, 2025 at 08:30.
(self.groups, self.columns, self.columns),
device='cpu',
dtype=torch.float32,
)
Giuseppe5 (Collaborator, Author):

Maybe I should re-introduce this, but I was getting weird CUDA errors. Also, since we are storing on CPU, I'm not sure why we need to use pin_memory.
@i-colbert, maybe you can comment before we merge.

Collaborator:

From what I can remember, it was used to improve data transfer speeds from GPU to CPU, which is why it is only enabled if a GPU is available.
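
For reference, a minimal sketch of the pattern being discussed: allocating the CPU-side buffer as pinned (page-locked) memory so copies off the GPU can be asynchronous. The names and shapes are placeholders, not the actual GPTQ buffer:

import torch

groups, columns = 4, 128  # placeholder dimensions

# CPU-side float32 accumulator, mirroring the snippet above.
# pin_memory only matters when a CUDA device is the copy source/target,
# hence gating it on GPU availability.
hessian = torch.zeros(
    (groups, columns, columns),
    device='cpu',
    dtype=torch.float32,
    pin_memory=torch.cuda.is_available(),
)

# A page-locked destination allows non_blocking device-to-host copies.
if torch.cuda.is_available():
    update = torch.randn(groups, columns, columns, device='cuda')
    hessian.copy_(update, non_blocking=True)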

@nickfraser (Collaborator) left a comment:

A few comments / questions, but in principle, approved.

@@ -242,13 +242,26 @@ def single_layer_update(self):
pass

def get_quant_weights(self, i, i1, permutation_list, with_quant_history=False):
from brevitas.quant_tensor import _unpack_quant_tensor
Collaborator:

Not required - already imported.

@@ -252,16 +251,21 @@ def main(args):
# Extend seeds based on batch_size
test_seeds = [TEST_SEED] + [TEST_SEED + i for i in range(1, args.batch_size)]

is_flux_model = 'flux' in args.model.lower()
is_flux = 'flux' in args.model.lower()
Collaborator:

Is there a more generic name we could use for this flag? I can see this applying to more models than just Flux...
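
Purely to illustrate the question (the flag name and the tuple contents are hypothetical, not something decided in this PR):

# Hypothetical, more generic form of the same check.
TRANSFORMER_BACKBONES = ('flux',)  # could grow to cover other DiT-style models

is_transformer_backbone = any(
    name in args.model.lower() for name in TRANSFORMER_BACKBONES)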

create_weight_orig=False,
use_quant_activations=False,
return_forward_output=True,
with torch.no_grad(), quant_inference_mode(denoising_network, compile=args.compile_eval):
Collaborator:

Do we always want quant_inference_mode to be applied?
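
One way to read the question is whether the context should be opt-in. A purely illustrative sketch, reusing quant_inference_mode as it appears in the snippet above; the flag name is hypothetical and contextlib.nullcontext is only there to show the conditional structure:

import contextlib

import torch

# Hypothetical flag; not part of the PR's argument parser.
inference_ctx = (
    quant_inference_mode(denoising_network, compile=args.compile_eval)
    if args.use_quant_inference_mode else contextlib.nullcontext())

with torch.no_grad(), inference_ctx:
    run_evaluation(denoising_network)  # placeholder for the eval call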

test_latents=latents,
guidance_scale=args.guidance_scale,
is_unet=is_unet)
with torch.no_grad(), quant_inference_mode(denoising_network, compile=args.compile_eval):
Collaborator:

Do we always want quant_inference_mode to be applied?

Labels: enhancement (New feature or request), next release (PRs which should be merged for the next release)
Projects: None yet
3 participants