Export fp32 to qint4 #3526
Conversation
Fix group size inconsistency

Signed-off-by: Grzegorz Kisala/Neural Computing (AIS) /SRPOL/Senior Professional/Samsung Electronics <[email protected]>
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Maciej Nalewaj <[email protected]>
```diff
  std::vector<uint8_t> quantized_weights_int4;
  std::vector<uint16_t> quantized_scales_int4;
- Int4Utils::quantizeAndRepack(dequantized_weights_q4.data(), N, K,
+ Int4Utils::quantizeAndRepack((float *)(file_view + start_from), N, K,
```
The FP32 data should be transposed before it is quantized; otherwise, it will produce incorrect output.
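A minimal sketch of the transpose the reviewer asks for, assuming the weights at `file_view + start_from` are stored row-major as an N x K matrix and `Int4Utils::quantizeAndRepack` expects the transposed K x N layout; the helper name `transposeRowMajor` is an illustration, not part of the PR:

```cpp
#include <cstddef>
#include <vector>

// Transpose a row-major N x K float matrix into a K x N matrix.
// (Assumed layout; adjust N/K if quantizeAndRepack expects the opposite.)
std::vector<float> transposeRowMajor(const float *src, size_t N, size_t K) {
  std::vector<float> dst(N * K);
  for (size_t n = 0; n < N; ++n) {
    for (size_t k = 0; k < K; ++k) {
      dst[k * N + n] = src[n * K + k];
    }
  }
  return dst;
}
```

The call site could then build the transposed buffer first and pass `transposed.data()` to `Int4Utils::quantizeAndRepack` instead of the raw `(float *)(file_view + start_from)` pointer.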
| std::cout << "New size: " << size << std::endl; | ||
| } else { | ||
| output.write(file_view + start_from, size); |
In the case of tied word embeddings, you need to filter out the LM head case and not save it.
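A minimal sketch of the filtering the reviewer suggests, assuming the exporter iterates over named tensors and has a tied-embeddings flag available; the names `tensor_name`, `tie_word_embeddings`, and `"lm_head.weight"` are assumptions for illustration:

```cpp
#include <string>

// When embeddings are tied, the LM head shares its weights with the input
// embedding, so writing it to the export would duplicate the same data.
bool shouldSkipTensor(const std::string &tensor_name,
                      bool tie_word_embeddings) {
  return tie_word_embeddings && tensor_name == "lm_head.weight";
}
```

The write path above could then check `shouldSkipTensor(...)` before calling `output.write(...)` and skip the LM head tensor entirely.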
Dependency of the PR
Commits to be reviewed in this PR
{commit-1}
{commit message}
Self evaluation:
Signed-off-by: {your_name} <{your_email}>
{commit-2}
{commit message}
Self evaluation:
Signed-off-by: {your_name} <{your_email}>
Summary
Signed-off-by: {your_name} <{your_email}>