Enable AWQ on Intel GPU. #2248
Status: Closed · +51 −44
Changes from all commits (10):
24fc0bd  Enable AWQ on Intel GPU.  (xiaowangintel)
8ac3b07  Enable AWQ on Intel GPU.  (xiaowangintel)
51c2ebb  Enable AWQ on Intel GPU.  (xiaowangintel)
cbd74d8  Enable AWQ on Intel GPU.  (xiaowangintel)
bf59893  Enable AWQ on Intel GPU.  (xiaowangintel)
cc0f29f  Enable AWQ on Intel GPU.  (xiaowangintel)
86a23d7  Enable AWQ on Intel GPU.  (xiaowangintel)
45e545e  Enable AWQ on Intel GPU.  (xiaowangintel)
4f11061  Enable AWQ on Intel GPU.  (xiaowangintel)
f60041c  Enable AWQ on Intel GPU.  (xiaowangintel)
```diff
@@ -127,6 +127,11 @@ def cuda(self):
             val.cuda() if isinstance(val, torch.Tensor) else val for val in self.values
         ]

+    def xpu(self):
+        self.values = [
+            val.xpu() if isinstance(val, torch.Tensor) else val for val in self.values
+        ]
+

 def guard_dtype_size(tensor_arg, arg_name, dtype=None, size=None):
     if dtype is not None and tensor_arg.dtype != dtype:
```
```diff
@@ -415,25 +420,6 @@ def unpack_tinygemm_scales_and_zeros(scales_and_zeros):
     return torch.split(scales_and_zeros.transpose(-3, -2), 1, -1)


-def convert_weight_to_int4pack_xpu(weight, zero_point_domain_is_int=False):
-    assert weight.device.type == "xpu"
-
-    if zero_point_domain_is_int:
-        # int_data = weight.to(dtype=torch.uint8)
-        int_data = (weight[::, 1::2] << 4 | weight[::, ::2]).to(torch.uint8)
-        packed_weight = torch.ops.aten._convert_weight_to_int4pack(
-            int_data,
-            8,  # TODO: remove
-        )
-    else:
-        out = weight.to(dtype=torch.uint8)
-        out = (out[::, 1::2] << 4 | out[::, ::2]).to(torch.uint8)
-        packed_weight = out.view(torch.int32)
-
-    # Second, N * K/2 uint8 -> N * K/8 int32
-    return packed_weight
-
-
 def groupwise_affine_quantize_tensor_from_qparams(
     w, scales, zeros, n_bit=4, groupsize=128, zero_point_domain=ZeroPointDomain.FLOAT
 ):
```
```diff
@@ -473,6 +459,8 @@ def groupwise_affine_quantize_tensor_from_qparams(
         not (check_xpu_version(int_data.device))
     ):
         int_data = (int_data[::, ::2] << 4 | int_data[::, 1::2]).to(torch.uint8)
+    if check_xpu_version(int_data.device):
+        int_data = (int_data[::, 1::2] << 4 | int_data[::, ::2]).to(torch.uint8)
     return int_data
```

Review comment on the added lines: should probably encapsulate these better when we have a better design for layout conversions: #2249
```diff
@@ -491,7 +479,6 @@ def groupwise_affine_dequantize_tensor_from_qparams(
         TORCH_VERSION_AT_LEAST_2_5
         and (w_int4x8.dtype == torch.uint8 or w_int4x8.shape[-1] > 1)
         and not (check_cpu_version(w_int4x8.device))
-        and not (check_xpu_version(w_int4x8.device))
     ):
         data = w_int4x8.to(torch.int32)
         high_bits = data >> 4
```
```diff
@@ -501,8 +488,12 @@ def groupwise_affine_dequantize_tensor_from_qparams(
             dtype=torch.int32,
             device=w_int4x8.device,
         )
-        w_int32[::, ::2] = high_bits
-        w_int32[::, 1::2] = low_bits
+        if not (check_xpu_version(w_int4x8.device)):
+            w_int32[::, ::2] = high_bits
+            w_int32[::, 1::2] = low_bits
+        else:
+            w_int32[::, ::2] = low_bits
+            w_int32[::, 1::2] = high_bits
     else:
         w_int32 = w_int4x8
```
Review comment: this is actually specific to fbgemm I think, we'd need to rename in a future PR. cc @jainapurva