Export fp32 to qint4 #3526
Conversation
Fix group size inconsistency

Signed-off-by: Grzegorz Kisala/Neural Computing (AIS) /SRPOL/Senior Professional/Samsung Electronics <[email protected]>
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Maciej Nalewaj <[email protected]>
```diff
  std::vector<uint8_t> quantized_weights_int4;
  std::vector<uint16_t> quantized_scales_int4;
- Int4Utils::quantizeAndRepack(dequantized_weights_q4.data(), N, K,
+ Int4Utils::quantizeAndRepack((float *)(file_view + start_from), N, K,
```
The FP32 data should be transposed before it is quantized; otherwise, it will produce incorrect output.
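A minimal sketch of the transpose the reviewer asks for, assuming the weights at `file_view + start_from` are stored row-major as an N x K matrix and `Int4Utils::quantizeAndRepack` expects the transposed K x N layout; the helper name `transposeRowMajor` is an illustration, not part of the PR:

```cpp
#include <cstddef>
#include <vector>

// Transpose a row-major N x K float matrix into a K x N matrix.
// (Assumed layout; adjust N/K if quantizeAndRepack expects the opposite.)
std::vector<float> transposeRowMajor(const float *src, size_t N, size_t K) {
  std::vector<float> dst(N * K);
  for (size_t n = 0; n < N; ++n) {
    for (size_t k = 0; k < K; ++k) {
      dst[k * N + n] = src[n * K + k];
    }
  }
  return dst;
}
```

The call site could then build the transposed buffer first and pass `transposed.data()` to `Int4Utils::quantizeAndRepack` instead of the raw `(float *)(file_view + start_from)` pointer.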
| std::cout << "New size: " << size << std::endl; | ||
| } else { | ||
| output.write(file_view + start_from, size); |
In the case of tied word embeddings, you need to filter out the LM head case and not save it.
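A minimal sketch of the filtering the reviewer suggests, assuming the exporter iterates over named tensors and has a tied-embeddings flag available; the names `tensor_name`, `tie_word_embeddings`, and `"lm_head.weight"` are assumptions for illustration:

```cpp
#include <string>

// When embeddings are tied, the LM head shares its weights with the input
// embedding, so writing it to the export would duplicate the same data.
bool shouldSkipTensor(const std::string &tensor_name,
                      bool tie_word_embeddings) {
  return tie_word_embeddings && tensor_name == "lm_head.weight";
}
```

The write path above could then check `shouldSkipTensor(...)` before calling `output.write(...)` and skip the LM head tensor entirely.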
Dependency of the PR
Commits to be reviewed in this PR
{commit-1}
{commit message}
Self evaluation:
Signed-off-by: {your_name} <{your_email}>
{commit-2}
{commit message}
Self evaluation:
Signed-off-by: {your_name} <{your_email}>
Summary
Signed-off-by: {your_name} <{your_email}>