INT4 quantization fixes #3521
Draft: gkisalapl wants to merge 6 commits into nnstreamer:main from gkisalapl:int4_fixes
Fix problem of group size inconsistency

Signed-off-by: Grzegorz Kisala/Neural Computing (AIS) /SRPOL/Senior Professional/Samsung Electronics <[email protected]>
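The diff itself is not shown on this page. As background for the commit title, here is a minimal sketch of why the quantizer and the kernel must agree on the group size. It assumes a simple symmetric per-group INT4 scheme with one float scale per `group_size` weights. This is not nntrainer's actual q4_0 block layout, and `quantize_int4` / `dequantize_int4` are hypothetical names. Because the dequantizer picks a scale with `i // group_size`, a mismatched group size applies the wrong scale unless it is caught by a consistency check like the one below.

```python
# Hypothetical illustration: symmetric per-group INT4 quantization.
# Not nntrainer's q4_0 layout; the group sizes mirror the Group32/Group128 test names.

def quantize_int4(values, group_size):
    """Quantize a flat list to signed 4-bit codes in [-8, 7], one scale per group."""
    if group_size <= 0 or len(values) % group_size != 0:
        raise ValueError("len(values) must be a multiple of a positive group_size")
    codes, scales = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        amax = max(abs(v) for v in group)
        scale = amax / 7 if amax > 0 else 1.0  # map the largest magnitude to code 7
        scales.append(scale)
        codes.extend(max(-8, min(7, round(v / scale))) for v in group)
    return codes, scales


def dequantize_int4(codes, scales, group_size):
    """Rebuild floats; group_size must match the value used at quantization time."""
    if group_size <= 0 or len(scales) * group_size != len(codes):
        raise ValueError("group_size is inconsistent with the number of scales")
    return [c * scales[i // group_size] for i, c in enumerate(codes)]

# Example: codes, scales = quantize_int4(weights, 32)
#          dequantize_int4(codes, scales, 32)   -> approximate weights
#          dequantize_int4(codes, scales, 128)  -> ValueError (group size mismatch)
```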
Test run results on my machine:
mwlasiuk@AMDN5757:~/code/nntrainer-mw/build$ ./test/unittest/unittest_blas_kernels_cl
[==========] Running 48 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 31 tests from nntrainer_blas_kernel
[ RUN ] nntrainer_blas_kernel.int4_gemv_test_3072_256_Group32
INT4 GEMV : 3072 x 256
- time : GPU = 1.02 ms
- sample : [4116.000 -112.938 -142.500 -8.398 34.156 ][135.250 -77.625 -98.250 -148.375 60.781 ]
- q4_0 : [4072.853 -111.854 -145.012 -7.019 27.722 ][146.417 -71.207 -98.288 -147.315 62.149 ]
[ OK ] nntrainer_blas_kernel.int4_gemv_test_3072_256_Group32 (3710 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemv_test_3072_8192_Group32
INT4 GEMV : 3072 x 8192
- time : GPU = 0.270 ms
- sample : [4120.000 -116.938 -140.125 -5.289 35.938 ][84.312 -46.094 -25.766 6.012 37.250 ]
- q4_0 : [4072.853 -111.854 -145.012 -7.019 27.722 ][74.400 -41.083 -28.113 8.804 32.963 ]
[ OK ] nntrainer_blas_kernel.int4_gemv_test_3072_8192_Group32 (369 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemv_test_8192_3072_Group32
INT4 GEMV : 8192 x 3072
- time : GPU = 0.320 ms
- sample : [10912.000 37.812 38.188 -136.125 -85.500 ][-166.250 163.750 73.188 15.016 -118.625 ]
- q4_0 : [10804.084 39.441 36.997 -136.982 -79.085 ][-167.248 173.781 70.513 -12.012 -107.630 ]
[ OK ] nntrainer_blas_kernel.int4_gemv_test_8192_3072_Group32 (355 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemv_test_3072_3072_Group32
INT4 GEMV : 3072 x 3072
- time : GPU = 0.140 ms
- sample : [4120.000 -109.250 -140.250 -5.672 30.812 ][28.766 -47.844 -73.000 -7.762 93.938 ]
- q4_0 : [4072.853 -111.854 -145.012 -7.019 27.722 ][23.053 -52.539 -71.790 3.962 95.171 ]
[ OK ] nntrainer_blas_kernel.int4_gemv_test_3072_3072_Group32 (127 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemv_test_3072_256_Group128
INT4 GEMV : 3072 x 256
- time : GPU = 1.125 ms
- sample : [4128.000 -108.125 -140.125 -8.594 31.641 ][137.000 -80.375 -94.188 -149.750 55.531 ]
- q4_0 : [4072.853 -111.854 -145.012 -7.019 27.722 ][146.417 -71.207 -98.288 -147.315 62.149 ]
[ OK ] nntrainer_blas_kernel.int4_gemv_test_3072_256_Group128 (231 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemv_test_3072_8192_Group128
INT4 GEMV : 3072 x 8192
- time : GPU = 0.360 ms
- sample : [4132.000 -109.000 -141.750 -7.996 30.312 ][80.500 -54.000 -23.703 6.586 36.000 ]
- q4_0 : [4072.853 -111.854 -145.012 -7.019 27.722 ][74.400 -41.083 -28.113 8.804 32.963 ]
[ OK ] nntrainer_blas_kernel.int4_gemv_test_3072_8192_Group128 (324 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemv_test_8192_3072_Group128
INT4 GEMV : 8192 x 3072
- time : GPU = 0.300 ms
- sample : [10936.000 36.812 39.344 -124.688 -84.312 ][-160.375 153.250 72.312 -5.715 -121.125 ]
- q4_0 : [10804.084 39.441 36.997 -136.982 -79.085 ][-167.248 173.781 70.513 -12.012 -107.630 ]
[ OK ] nntrainer_blas_kernel.int4_gemv_test_8192_3072_Group128 (308 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemv_test_3072_3072_Group128
INT4 GEMV : 3072 x 3072
- time : GPU = 0.185 ms
- sample : [4136.000 -108.250 -142.000 -7.715 31.359 ][16.391 -64.250 -88.688 -9.812 96.125 ]
- q4_0 : [4072.853 -111.854 -145.012 -7.019 27.722 ][23.053 -52.539 -71.790 3.962 95.171 ]
[ OK ] nntrainer_blas_kernel.int4_gemv_test_3072_3072_Group128 (131 ms)
[ RUN ] nntrainer_blas_kernel.tensor_dot_qint4
[-0.479 -0.397 12.817 -12.101 -0.601 ][-22.812 9.726 10.446 -5.906 -26.202 ]
[0.252 -2.254 13.766 -14.500 -0.816 ][-26.438 9.953 9.227 -5.723 -25.094 ]
[ OK ] nntrainer_blas_kernel.tensor_dot_qint4 (419 ms)
[ RUN ] nntrainer_blas_kernel.int4_sgemv_test_3072_256_32
INT4 GEMV : 3072 x 256
- time : GPU = 0.105 ms
- sample : [4116.000 -112.938 -142.500 -8.398 34.156 ][135.250 -77.625 -98.250 -148.375 60.781 ]
[ OK ] nntrainer_blas_kernel.int4_sgemv_test_3072_256_32 (26 ms)
[ RUN ] nntrainer_blas_kernel.int4_sgemv_test_3072_3072_128
INT4 GEMV : 3072 x 3072
- time : GPU = 0.190 ms
- sample : [4136.000 -108.250 -142.000 -7.715 31.359 ][16.391 -64.250 -88.688 -9.812 96.125 ]
[ OK ] nntrainer_blas_kernel.int4_sgemv_test_3072_3072_128 (94 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemv_async_test
INT4 GEMV : 1 x 3072 x 256
- time : Orig = 0.345 ms
- out0 : [21.516 -27.703 -14.727 -7.719 3.457 ][-31.641 -28.859 1.175 30.266 -21.391 ]
- out1 : [-5.141 -0.058 -2.008 -21.969 -9.188 ][11.664 -5.965 16.375 9.711 6.594 ]
- out2 : [-5.445 3.430 -10.477 -15.242 -31.625 ][-8.188 -10.273 -13.961 -1.292 30.047 ]
- time : Async = 0.175 ms
- out0 : [21.516 -27.703 -14.727 -7.719 3.457 ][-31.641 -28.859 1.175 30.266 -21.391 ]
- out1 : [-5.141 -0.058 -2.008 -21.969 -9.188 ][11.664 -5.965 16.375 9.711 6.594 ]
- out2 : [-5.445 3.430 -10.477 -15.242 -31.625 ][-8.188 -10.273 -13.961 -1.292 30.047 ]
[ OK ] nntrainer_blas_kernel.int4_gemv_async_test (167 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemm_test_68_3072_256_Group32
INT4 GEMM : 68 x 3072 x 256
- time : GPU = 0.550 ms
- sample : GPU = [1030.000 -26.297 -36.094 -1.657 8.336 -11.836 ][-19.594 5.004 17.422 15.836 33.312 -9.031 ]
- sample: REF = [1027.516 -28.516 -35.092 -1.065 6.170 -10.831 ][-18.172 5.180 15.674 16.029 34.337 -7.664 ]
[ OK ] nntrainer_blas_kernel.int4_gemm_test_68_3072_256_Group32 (324 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemm_test_68_3072_8192_Group32
INT4 GEMM : 68 x 3072 x 8192
- time : GPU = 4.550 ms
- sample : GPU = [1030.000 -27.594 -34.031 -2.070 8.227 -12.641 ][-6.148 -10.836 -60.469 18.266 0.034 -22.031 ]
- sample: REF = [1027.516 -28.516 -35.092 -1.065 6.170 -10.831 ][-6.959 -10.102 -61.701 17.299 0.510 -22.822 ]
[ OK ] nntrainer_blas_kernel.int4_gemm_test_68_3072_8192_Group32 (1286 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemm_test_68_8192_3072_Group32
INT4 GEMM : 68 x 8192 x 3072
- time : GPU = 4.660 ms
- sample : GPU = [2728.000 10.195 9.695 -34.562 -20.578 -35.625 ][-34.438 49.375 14.352 -32.250 -31.500 -44.969 ]
- sample: REF = [2724.568 10.725 9.949 -34.227 -23.554 -34.950 ][-33.758 49.724 17.005 -32.704 -34.106 -44.059 ]
[ OK ] nntrainer_blas_kernel.int4_gemm_test_68_8192_3072_Group32 (1721 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemm_test_68_3072_3072_Group32
INT4 GEMM : 68 x 3072 x 3072
- time : GPU = 2.515 ms
- sample : GPU = [1030.000 -27.938 -35.031 -1.961 8.734 -12.672 ][-18.094 -9.719 2.053 2.930 12.992 -22.656 ]
- sample: REF = [1027.516 -28.516 -35.092 -1.065 6.170 -10.831 ][-19.174 -9.252 3.564 3.885 13.320 -22.298 ]
[ OK ] nntrainer_blas_kernel.int4_gemm_test_68_3072_3072_Group32 (792 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemm_test_68_3072_256_Group128
INT4 GEMM : 68 x 3072 x 256
- time : GPU = 0.620 ms
- sample : GPU = [1032.000 -27.141 -35.344 -2.090 7.719 -11.359 ][-17.750 4.203 16.375 15.352 35.500 -8.258 ]
- sample: REF = [1027.516 -28.516 -35.092 -1.065 6.170 -10.831 ][-18.172 5.180 15.674 16.029 34.337 -7.664 ]
[ OK ] nntrainer_blas_kernel.int4_gemm_test_68_3072_256_Group128 (345 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemm_test_68_3072_8192_Group128
INT4 GEMM : 68 x 3072 x 8192
- time : GPU = 2.905 ms
- sample : GPU = [1032.000 -26.797 -35.031 -2.158 7.973 -11.281 ][-5.184 -9.766 -60.781 18.016 0.360 -25.109 ]
- sample: REF = [1027.516 -28.516 -35.092 -1.065 6.170 -10.831 ][-6.959 -10.102 -61.701 17.299 0.510 -22.822 ]
[ OK ] nntrainer_blas_kernel.int4_gemm_test_68_3072_8192_Group128 (969 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemm_test_68_8192_3072_Group128
INT4 GEMM : 68 x 8192 x 3072
- time : GPU = 3.070 ms
- sample : GPU = [2732.000 8.711 9.562 -31.250 -20.844 -35.844 ][-33.500 46.250 14.891 -33.719 -33.656 -46.938 ]
- sample: REF = [2724.568 10.725 9.949 -34.227 -23.554 -34.950 ][-33.758 49.724 17.005 -32.704 -34.106 -44.059 ]
[ OK ] nntrainer_blas_kernel.int4_gemm_test_68_8192_3072_Group128 (1292 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemm_test_68_3072_3072_Group128
INT4 GEMM : 68 x 3072 x 3072
- time : GPU = 1.310 ms
- sample : GPU = [1032.000 -27.250 -35.344 -1.772 7.648 -11.562 ][-20.219 -8.891 1.174 3.814 11.922 -20.625 ]
- sample: REF = [1027.516 -28.516 -35.092 -1.065 6.170 -10.831 ][-19.174 -9.252 3.564 3.885 13.320 -22.298 ]
[ OK ] nntrainer_blas_kernel.int4_gemm_test_68_3072_3072_Group128 (531 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemm_async_test
Segmentation fault (core dumped)
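A segfault in one case (here `int4_gemm_async_test`) kills the whole binary, so the remaining tests out of the 48 never report. One way to keep a run going is to execute each GoogleTest case in its own process with a time limit, which also turns a hang into a reportable result. This is a minimal sketch under that assumption. `run_isolated` and `gtest_command` are hypothetical helpers; only `--gtest_filter` (a standard GoogleTest flag) and Python's `subprocess.run` are real interfaces.

```python
import subprocess


def gtest_command(binary, test_name):
    """Build an argv that runs a single GoogleTest case via --gtest_filter."""
    return [binary, f"--gtest_filter={test_name}"]


def run_isolated(cmd, timeout_s):
    """Run cmd in its own process and classify the outcome.

    Returns ("ok", 0), ("fail", code), ("crash", -signal) or ("hang", None).
    subprocess.run kills the child if the timeout expires.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "hang", None
    if proc.returncode == 0:
        return "ok", 0
    if proc.returncode < 0:
        # On POSIX a negative code means "killed by signal", e.g. -11 for SIGSEGV.
        return "crash", proc.returncode
    return "fail", proc.returncode

# Example (paths and timeout are illustrative):
# run_isolated(gtest_command("./test/unittest/unittest_blas_kernels_cl",
#              "nntrainer_blas_kernel.int4_gemm_async_test"), timeout_s=120)
```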
mwlasiuk@AMDN5757:~/code/nntrainer-mw/build$
Under the debugger, the run just hung for 15 minutes, so I terminated the process with Ctrl-C:
mwlasiuk@AMDN5757:~/code/nntrainer-mw/build$ gdb test/unittest/unittest_blas_kernels_cl
GNU gdb (Ubuntu 15.0.50.20240403-0ubuntu1) 15.0.50.20240403-git
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from test/unittest/unittest_blas_kernels_cl...
This GDB supports auto-downloading debuginfo from the following URLs:
<https://debuginfod.ubuntu.com>
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
Downloading separate debug info for /home/mwlasiuk/code/nntrainer-mw/build/test/unittest/unittest_blas_kernels_cl
(No debugging symbols found in test/unittest/unittest_blas_kernels_cl)
(gdb) r
Starting program: /home/mwlasiuk/code/nntrainer-mw/build/test/unittest/unittest_blas_kernels_cl
Downloading separate debug info for system-supplied DSO at 0x7ffff7fc3000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Downloading separate debug info for /home/mwlasiuk/code/nntrainer-mw/build/test/unittest/../../nntrainer/../subprojects/CLBlast/libclblast.so
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblas.so.0
warning: could not find '.gnu_debugaltlink' file for /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblas.so.0
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblas.so.0
[New Thread 0x7ffff1dff6c0 (LWP 92151)]
[New Thread 0x7fffe95fe6c0 (LWP 92152)]
[New Thread 0x7fffe0dfd6c0 (LWP 92153)]
[New Thread 0x7fffe05fc6c0 (LWP 92154)]
[New Thread 0x7fffd7dfb6c0 (LWP 92155)]
[New Thread 0x7fffc75fa6c0 (LWP 92156)]
[New Thread 0x7fffbedf96c0 (LWP 92157)]
[New Thread 0x7fffb65f86c0 (LWP 92158)]
[New Thread 0x7fffaddf76c0 (LWP 92159)]
[New Thread 0x7fffa55f66c0 (LWP 92160)]
[New Thread 0x7fff9cdf56c0 (LWP 92161)]
[New Thread 0x7fff945f46c0 (LWP 92162)]
[New Thread 0x7fff8bdf36c0 (LWP 92163)]
[New Thread 0x7fff8b5f26c0 (LWP 92164)]
[New Thread 0x7fff7adf16c0 (LWP 92165)]
[New Thread 0x7fff725f06c0 (LWP 92166)]
[New Thread 0x7fff69def6c0 (LWP 92167)]
[New Thread 0x7fff695ee6c0 (LWP 92168)]
[New Thread 0x7fff60ded6c0 (LWP 92169)]
[==========] Running 48 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 31 tests from nntrainer_blas_kernel
[ RUN ] nntrainer_blas_kernel.int4_gemv_test_3072_256_Group32
Downloading separate debug info for /home/mwlasiuk/intel/oneapi/2025.2/lib/libhwloc.so.15
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
Downloading separate debug info for /usr/lib/wsl/drivers/oem131.inf_amd64_0a4a1d29d2918a75/libwsl_compute_helper.so
Downloading separate debug info for /lib/x86_64-linux-gnu/libigdfcl.so.1
Downloading separate debug info for /lib/x86_64-linux-gnu/libLLVM-14.so.1
Downloading separate debug info for /lib/x86_64-linux-gnu/libLLVMSPIRVLib.so.14
Downloading separate debug info for /lib/x86_64-linux-gnu/libclang-cpp.so.14
Downloading separate debug info for /lib/x86_64-linux-gnu/libedit.so.2
Downloading separate debug info for /lib/x86_64-linux-gnu/libtinfo.so.6
warning: could not find '.gnu_debugaltlink' file for /lib/x86_64-linux-gnu/libtinfo.so.6
Downloading separate debug info for /lib/x86_64-linux-gnu/libtinfo.so.6
Downloading separate debug info for /lib/x86_64-linux-gnu/libxml2.so.2
Downloading separate debug info for /lib/x86_64-linux-gnu/libbsd.so.0
Downloading separate debug info for /lib/x86_64-linux-gnu/liblzma.so.5
Downloading separate debug info for /lib/x86_64-linux-gnu/libmd.so.0
Downloading separate debug info for /lib/x86_64-linux-gnu/libigc.so.1
Downloading separate debug info for /lib/x86_64-linux-gnu/libva.so.2
[New Thread 0x5fff309ff6c0 (LWP 92222)]
[New Thread 0x5fff1d0f16c0 (LWP 92228)]
[New Thread 0x5fff1c8f06c0 (LWP 92229)]
[New Thread 0x5fff1c0ef6c0 (LWP 92230)]
[New Thread 0x5fff1b8ee6c0 (LWP 92231)]
[New Thread 0x5fff1b0ed6c0 (LWP 92232)]
[New Thread 0x5fff1a8ec6c0 (LWP 92233)]
[New Thread 0x5fff1a0eb6c0 (LWP 92234)]
[New Thread 0x5fff198ea6c0 (LWP 92235)]
[New Thread 0x5fff190e96c0 (LWP 92236)]
[New Thread 0x5fff188e86c0 (LWP 92237)]
[New Thread 0x5fff180e76c0 (LWP 92238)]
[New Thread 0x5fff178e66c0 (LWP 92239)]
[New Thread 0x5fff170e56c0 (LWP 92240)]
[New Thread 0x5fff168e46c0 (LWP 92241)]
[New Thread 0x5fff160e36c0 (LWP 92242)]
[New Thread 0x5fff158e26c0 (LWP 92243)]
[New Thread 0x5fff150e16c0 (LWP 92244)]
[New Thread 0x5fff148e06c0 (LWP 92245)]
[New Thread 0x5fff140df6c0 (LWP 92246)]
[New Thread 0x5fff138de6c0 (LWP 92247)]
INT4 GEMV : 3072 x 256
- time : GPU = 0.15 ms
- sample : [4116.000 -112.938 -142.500 -8.398 34.156 ][135.250 -77.625 -98.250 -148.375 60.781 ]
- q4_0 : [4072.853 -111.854 -145.012 -7.019 27.722 ][146.417 -71.207 -98.288 -147.315 62.149 ]
[ OK ] nntrainer_blas_kernel.int4_gemv_test_3072_256_Group32 (1612 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemv_test_3072_8192_Group32
INT4 GEMV : 3072 x 8192
- time : GPU = 0.330 ms
- sample : [4120.000 -116.938 -140.125 -5.289 35.938 ][84.312 -46.094 -25.766 6.012 37.250 ]
- q4_0 : [4072.853 -111.854 -145.012 -7.019 27.722 ][74.400 -41.083 -28.113 8.804 32.963 ]
[ OK ] nntrainer_blas_kernel.int4_gemv_test_3072_8192_Group32 (364 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemv_test_8192_3072_Group32
INT4 GEMV : 8192 x 3072
- time : GPU = 0.295 ms
- sample : [10912.000 37.812 38.188 -136.125 -85.500 ][-166.250 163.750 73.188 15.016 -118.625 ]
- q4_0 : [10804.084 39.441 36.997 -136.982 -79.085 ][-167.248 173.781 70.513 -12.012 -107.630 ]
[ OK ] nntrainer_blas_kernel.int4_gemv_test_8192_3072_Group32 (309 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemv_test_3072_3072_Group32
INT4 GEMV : 3072 x 3072
- time : GPU = 0.160 ms
- sample : [4120.000 -109.250 -140.250 -5.672 30.812 ][28.766 -47.844 -73.000 -7.762 93.938 ]
- q4_0 : [4072.853 -111.854 -145.012 -7.019 27.722 ][23.053 -52.539 -71.790 3.962 95.171 ]
[ OK ] nntrainer_blas_kernel.int4_gemv_test_3072_3072_Group32 (125 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemv_test_3072_256_Group128
INT4 GEMV : 3072 x 256
- time : GPU = 0.120 ms
- sample : [4128.000 -108.125 -140.125 -8.594 31.641 ][137.000 -80.375 -94.188 -149.750 55.531 ]
- q4_0 : [4072.853 -111.854 -145.012 -7.019 27.722 ][146.417 -71.207 -98.288 -147.315 62.149 ]
[ OK ] nntrainer_blas_kernel.int4_gemv_test_3072_256_Group128 (31 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemv_test_3072_8192_Group128
INT4 GEMV : 3072 x 8192
- time : GPU = 0.405 ms
- sample : [4132.000 -109.000 -141.750 -7.996 30.312 ][80.500 -54.000 -23.703 6.586 36.000 ]
- q4_0 : [4072.853 -111.854 -145.012 -7.019 27.722 ][74.400 -41.083 -28.113 8.804 32.963 ]
[ OK ] nntrainer_blas_kernel.int4_gemv_test_3072_8192_Group128 (322 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemv_test_8192_3072_Group128
INT4 GEMV : 8192 x 3072
- time : GPU = 0.350 ms
- sample : [10936.000 36.812 39.344 -124.688 -84.312 ][-160.375 153.250 72.312 -5.715 -121.125 ]
- q4_0 : [10804.084 39.441 36.997 -136.982 -79.085 ][-167.248 173.781 70.513 -12.012 -107.630 ]
[ OK ] nntrainer_blas_kernel.int4_gemv_test_8192_3072_Group128 (322 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemv_test_3072_3072_Group128
INT4 GEMV : 3072 x 3072
- time : GPU = 0.210 ms
- sample : [4136.000 -108.250 -142.000 -7.715 31.359 ][16.391 -64.250 -88.688 -9.812 96.125 ]
- q4_0 : [4072.853 -111.854 -145.012 -7.019 27.722 ][23.053 -52.539 -71.790 3.962 95.171 ]
[ OK ] nntrainer_blas_kernel.int4_gemv_test_3072_3072_Group128 (135 ms)
[ RUN ] nntrainer_blas_kernel.tensor_dot_qint4
[-0.479 -0.397 12.817 -12.101 -0.601 ][-22.812 9.726 10.446 -5.906 -26.202 ]
[0.252 -2.254 13.766 -14.500 -0.816 ][-26.438 9.953 9.227 -5.723 -25.094 ]
[ OK ] nntrainer_blas_kernel.tensor_dot_qint4 (202 ms)
[ RUN ] nntrainer_blas_kernel.int4_sgemv_test_3072_256_32
INT4 GEMV : 3072 x 256
- time : GPU = 0.220 ms
- sample : [4116.000 -112.938 -142.500 -8.398 34.156 ][135.250 -77.625 -98.250 -148.375 60.781 ]
[ OK ] nntrainer_blas_kernel.int4_sgemv_test_3072_256_32 (63 ms)
[ RUN ] nntrainer_blas_kernel.int4_sgemv_test_3072_3072_128
INT4 GEMV : 3072 x 3072
- time : GPU = 500.225 ms
- sample : [0.000 0.000 0.000 0.000 0.000 ][0.000 0.000 0.000 0.000 0.000 ]
[ OK ] nntrainer_blas_kernel.int4_sgemv_test_3072_3072_128 (100103 ms)
[ RUN ] nntrainer_blas_kernel.int4_gemv_async_test
^C
Thread 1 "unittest_blas_k" received signal SIGINT, Interrupt.
Download failed: Invalid argument. Continuing without source file ./posix/../sysdeps/unix/syscall-template.S.
0x00007ffff6b0e80b in __GI_sched_yield () at ../sysdeps/unix/syscall-template.S:120
warning: 120 ../sysdeps/unix/syscall-template.S: No such file or directory
(gdb) bt
#0 0x00007ffff6b0e80b in __GI_sched_yield () at ../sysdeps/unix/syscall-template.S:120
#1 0x00007fff409c126d in ?? () from /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
#2 0x00007fff4088b3ee in ?? () from /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
#3 0x00007fff405137e9 in ?? () from /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
#4 0x00007fff40517fd3 in ?? () from /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
#5 0x00007fff406cbabd in ?? () from /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
#6 0x00007fff406fa237 in ?? () from /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
#7 0x00007fff404c5f09 in ?? () from /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
#8 0x00007ffff76e7e8e in nntrainer::opencl::CommandQueueManager::enqueueSVMMap(void*, unsigned long, bool, _cl_event**) () from /home/mwlasiuk/code/nntrainer-mw/build/test/unittest/../../nntrainer/libnntrainer.so
#9 0x00007ffff7627037 in nntrainer::gemv_int4_async_cl(std::vector<void*, std::allocator<void*> >, std::vector<unsigned short*, std::allocator<unsigned short*> >, unsigned short*, std::vector<unsigned short*, std::allocator<unsigned short*> >, unsigned int, std::vector<unsigned int, std::allocator<unsigned int> >, unsigned int) () from /home/mwlasiuk/code/nntrainer-mw/build/test/unittest/../../nntrainer/libnntrainer.so
#10 0x000055555558eb4b in nntrainer_blas_kernel_int4_gemv_async_test_Test::TestBody() ()
#11 0x00005555555f17bf in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
#12 0x00005555555d88d6 in testing::Test::Run() ()
#13 0x00005555555d8a95 in testing::TestInfo::Run() ()
#14 0x00005555555d8c7f in testing::TestSuite::Run() ()
#15 0x00005555555e6aec in testing::internal::UnitTestImpl::RunAllTests() ()
#16 0x00005555555f1e97 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ()
#17 0x00005555555d8e78 in testing::UnitTest::Run() ()
#18 0x000055555557b004 in main ()
(gdb)
Values reported by the tests that do not fail (crash) seem to be legit.
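The "seem to be legit" judgement above is made by eye. A minimal sketch of making it numeric compares a GPU `sample` row against its `q4_0` / `REF` row using relative L2 error and cosine similarity. The printed rows are only short excerpts of each output, so this is a spot check. The function name `compare_rows` and the 5% default tolerance are assumptions, not values from the test suite.

```python
import math


def compare_rows(gpu, ref, rel_tol=0.05):
    """Return (relative_l2_error, cosine_similarity, within_tol) for two rows."""
    if len(gpu) != len(ref) or not gpu:
        raise ValueError("rows must be non-empty and the same length")
    ref_norm = math.sqrt(sum(r * r for r in ref))
    if ref_norm == 0.0:
        raise ValueError("reference row is all zeros")
    gpu_norm = math.sqrt(sum(g * g for g in gpu))
    diff = math.sqrt(sum((g - r) ** 2 for g, r in zip(gpu, ref)))
    rel = diff / ref_norm
    # An all-zero GPU row (like the one printed under gdb) gives rel == 1.0, cos == 0.0.
    cos = 0.0 if gpu_norm == 0.0 else sum(g * r for g, r in zip(gpu, ref)) / (gpu_norm * ref_norm)
    return rel, cos, rel <= rel_tol
```

With the first excerpt of `int4_gemv_test_3072_256_Group32` (`sample` against `q4_0`), the relative error is about 1%. The all-zero `int4_sgemv_test_3072_3072_128` output from the gdb run gives a relative error of 1.0 and fails the check.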
This PR is stale because it has been open 14 days with no activity. Remove the stale label or comment, or this will be closed in 3 days.