Task02 Vadim Kozlov ITMO #1050

Open
six-nine wants to merge 1 commit into GPGPUCourse:task02 from six-nine:task02

six-nine commented Feb 27, 2026

Local output

Found 1 GPUs in 0.0563472 sec (OpenCL: 0.0560523 sec, Vulkan: 0.000277708 sec)
Available devices:
  Device #0: API: OpenCL. GPU. Apple M2 Pro. Total memory: 21845 Mb.
Using device #0: API: OpenCL. GPU. Apple M2 Pro. Total memory: 21845 Mb.
Using OpenCL API...
______________________________________________________
Evaluating algorithm #1/3: CPU
algorithm times (in seconds) - 1 values (min=4.04523 10%=4.04523 median=4.04523 90%=4.04523 max=4.04523)
Mandelbrot effective algorithm GFlops: 2.47205 GFlops
saving image to 'mandelbrot CPU.bmp'...
CPU vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #2/3: CPU with OpenMP
OpenMP threads: x1 threads
algorithm times (in seconds) - 10 values (min=4.03083 10%=4.04194 median=4.07077 90%=4.36813 max=4.36813)
Mandelbrot effective algorithm GFlops: 2.45654 GFlops
saving image to 'mandelbrot CPU with OpenMP.bmp'...
CPU with OpenMP vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #3/3: GPU
Kernels compilation done in 0.00112417 seconds
algorithm times (in seconds) - 10 values (min=0.00350913 10%=0.00351038 median=0.0042125 90%=0.0114245 max=0.0114245)
Mandelbrot effective algorithm GFlops: 2373.89 GFlops
saving image to 'mandelbrot GPU.bmp'...
GPU vs CPU average results difference: 0%



Found 1 GPUs in 0.0590232 sec (OpenCL: 0.0586815 sec, Vulkan: 0.000322458 sec)
Available devices:
  Device #0: API: OpenCL. GPU. Apple M2 Pro. Total memory: 21845 Mb.
Using device #0: API: OpenCL. GPU. Apple M2 Pro. Total memory: 21845 Mb.
Using OpenCL API...
PCI-E upload times - 3 values (min=0.0114513 10%=0.0114513 median=0.0123617 90%=0.0149615 max=0.0149615) s
PCI-E upload median bandwidth: 30.1357 GB/s
______________________________________________________
Evaluating algorithm #1/6: CPU
algorithm times (in seconds) - 10 values (min=0.0860901 10%=0.0861774 median=0.086524 90%=0.0930874 max=0.0930874)
sum median effective algorithm bandwidth: 4.3055 GB/s
______________________________________________________
Evaluating algorithm #2/6: CPU with OpenMP
algorithm times (in seconds) - 10 values (min=0.087182 10%=0.0875196 median=0.0876988 90%=0.0903481 max=0.0903481)
sum median effective algorithm bandwidth: 4.24782 GB/s
______________________________________________________
Evaluating algorithm #3/6: 01 atomicAdd from each workItem
Kernels compilation done in 0.000804542 seconds
algorithm times (in seconds) - 10 values (min=0.00347688 10%=0.00350458 median=0.00394238 90%=0.00899633 max=0.00899633)
sum median effective algorithm bandwidth: 94.4936 GB/s
______________________________________________________
Evaluating algorithm #4/6: 02 atomicAdd but each workItem loads K values
Kernels compilation done in 0.000235917 seconds
algorithm times (in seconds) - 10 values (min=0.00247858 10%=0.00249233 median=0.00353888 90%=0.00498863 max=0.00498863)
sum median effective algorithm bandwidth: 105.268 GB/s
______________________________________________________
Evaluating algorithm #5/6: 03 local memory and atomicAdd from master thread
Kernels compilation done in 0.000235542 seconds
algorithm times (in seconds) - 10 values (min=0.00512037 10%=0.00512192 median=0.00526167 90%=0.0150483 max=0.0150483)
sum median effective algorithm bandwidth: 70.8006 GB/s
______________________________________________________
Evaluating algorithm #6/6: 04 local reduction
Kernels compilation done in 0.000288208 seconds
algorithm times (in seconds) - 10 values (min=0.0218695 10%=0.021886 median=0.021994 90%=0.0279082 max=0.0279082)
sum median effective algorithm bandwidth: 16.9377 GB/s
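
The bandwidth figures in the sum log above appear consistent with dividing a fixed input size by the median time. The sketch below is a minimal check of that reading. The input size (100,000,000 values of 4 bytes each) and the meaning of "GB" (2^30 bytes) are assumptions inferred from the reported numbers; the PR does not state them, and the function name is hypothetical.

```python
# Assumptions inferred from the log, NOT confirmed by the PR:
N_VALUES = 100_000_000      # assumed number of input elements
BYTES_PER_VALUE = 4         # assumed 32-bit elements
BYTES_PER_GB = 1024 ** 3    # assumed "GB" means 2**30 bytes


def effective_bandwidth_gb_s(median_seconds: float) -> float:
    """Bytes read per second, in GB/s, for one pass over the assumed input."""
    return N_VALUES * BYTES_PER_VALUE / BYTES_PER_GB / median_seconds


# Median times copied from the local (Apple M2 Pro) run above.
local_medians = {
    "PCI-E upload": 0.0123617,
    "CPU": 0.086524,
    "01 atomicAdd from each workItem": 0.00394238,
    "04 local reduction": 0.021994,
}

for name, seconds in local_medians.items():
    print(f"{name}: {effective_bandwidth_gb_s(seconds):.4f} GB/s")
```

Under these assumptions the computed values agree with the logged ones to about three decimal places, which suggests the metric counts input bytes only.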

GitHub CI output

TBD

@GPUcourseBOT (Collaborator)

Test results for PR #1050

Test logs
=== STATUS: Programs executed successfully: main_mandelbrot, main_sum ===
=== main_mandelbrot stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.441178 sec (CUDA: 0.125479 sec, OpenCL: 0.038727 sec, Vulkan: 0.276906 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
______________________________________________________
Evaluating algorithm #1/3: CPU
algorithm times (in seconds) - 1 values (min=3.39177 10%=3.39177 median=3.39177 90%=3.39177 max=3.39177)
Mandelbrot effective algorithm GFlops: 2.94831 GFlops
saving image to 'mandelbrot CPU.bmp'...
CPU vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #2/3: CPU with OpenMP
OpenMP threads: x4 threads
algorithm times (in seconds) - 10 values (min=1.04592 10%=1.05219 median=1.05937 90%=1.06138 max=1.06138)
Mandelbrot effective algorithm GFlops: 9.43959 GFlops
saving image to 'mandelbrot CPU with OpenMP.bmp'...
CPU with OpenMP vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #3/3: GPU
Kernels compilation done in 0.105703 seconds
algorithm times (in seconds) - 10 values (min=0.0042755 10%=0.00427583 median=0.00428431 90%=0.110075 max=0.110075)
Mandelbrot effective algorithm GFlops: 2334.1 GFlops
saving image to 'mandelbrot GPU.bmp'...
GPU vs CPU average results difference: 0.942446%
=== main_sum stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.327674 sec (CUDA: 0.12706 sec, OpenCL: 0.0395904 sec, Vulkan: 0.160953 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
PCI-E upload times - 3 values (min=0.0430661 10%=0.0430661 median=0.0431566 90%=0.0435457 max=0.0435457) s
PCI-E upload median bandwidth: 8.63204 GB/s
______________________________________________________
Evaluating algorithm #1/6: CPU
algorithm times (in seconds) - 10 values (min=0.0359217 10%=0.0363713 median=0.036688 90%=0.0371545 max=0.0371545)
sum median effective algorithm bandwidth: 10.154 GB/s
______________________________________________________
Evaluating algorithm #2/6: CPU with OpenMP
algorithm times (in seconds) - 10 values (min=0.0166356 10%=0.0167114 median=0.0169305 90%=0.0171807 max=0.0171807)
sum median effective algorithm bandwidth: 22.0035 GB/s
______________________________________________________
Evaluating algorithm #3/6: 01 atomicAdd from each workItem
Kernels compilation done in 0.226108 seconds
algorithm times (in seconds) - 10 values (min=0.00478161 10%=0.00478214 median=0.00478301 90%=0.231003 max=0.231003)
sum median effective algorithm bandwidth: 77.8858 GB/s
______________________________________________________
Evaluating algorithm #4/6: 02 atomicAdd but each workItem loads K values
Kernels compilation done in 0.184404 seconds
algorithm times (in seconds) - 10 values (min=0.00253565 10%=0.00253643 median=0.00253742 90%=0.187099 max=0.187099)
sum median effective algorithm bandwidth: 146.814 GB/s
______________________________________________________
Evaluating algorithm #5/6: 03 local memory and atomicAdd from master thread
Kernels compilation done in 0.22123 seconds
algorithm times (in seconds) - 10 values (min=0.00903914 10%=0.00904023 median=0.00904172 90%=0.230382 max=0.230382)
sum median effective algorithm bandwidth: 41.2011 GB/s
______________________________________________________
Evaluating algorithm #6/6: 04 local reduction
Kernels compilation done in 0.350192 seconds
algorithm times (in seconds) - 10 values (min=0.0138063 10%=0.0138096 median=0.030991 90%=0.381292 max=0.381292)
sum median effective algorithm bandwidth: 12.0206 GB/s

@GPUcourseBOT (Collaborator)

Test results for PR #1050

Test logs
=== STATUS: Programs executed successfully: main_mandelbrot, main_sum ===
=== main_mandelbrot stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.294488 sec (CUDA: 0.121125 sec, OpenCL: 0.0377544 sec, Vulkan: 0.135549 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
______________________________________________________
Evaluating algorithm #1/3: CPU
algorithm times (in seconds) - 1 values (min=3.27859 10%=3.27859 median=3.27859 90%=3.27859 max=3.27859)
Mandelbrot effective algorithm GFlops: 3.05009 GFlops
saving image to 'mandelbrot CPU.bmp'...
CPU vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #2/3: CPU with OpenMP
OpenMP threads: x4 threads
algorithm times (in seconds) - 10 values (min=0.990938 10%=1.00536 median=1.03266 90%=1.05578 max=1.05578)
Mandelbrot effective algorithm GFlops: 9.68371 GFlops
saving image to 'mandelbrot CPU with OpenMP.bmp'...
CPU with OpenMP vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #3/3: GPU
Kernels compilation done in 0.0609532 seconds
algorithm times (in seconds) - 10 values (min=0.00427339 10%=0.00427459 median=0.00427743 90%=0.0652848 max=0.0652848)
Mandelbrot effective algorithm GFlops: 2337.85 GFlops
saving image to 'mandelbrot GPU.bmp'...
GPU vs CPU average results difference: 0.942446%
=== main_sum stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.320605 sec (CUDA: 0.127663 sec, OpenCL: 0.037677 sec, Vulkan: 0.155209 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
PCI-E upload times - 3 values (min=0.0411775 10%=0.0411775 median=0.0417159 90%=0.0420663 max=0.0420663) s
PCI-E upload median bandwidth: 8.93015 GB/s
______________________________________________________
Evaluating algorithm #1/6: CPU
algorithm times (in seconds) - 10 values (min=0.0352076 10%=0.0353851 median=0.0358452 90%=0.0362777 max=0.0362777)
sum median effective algorithm bandwidth: 10.3927 GB/s
______________________________________________________
Evaluating algorithm #2/6: CPU with OpenMP
algorithm times (in seconds) - 10 values (min=0.0159521 10%=0.0159914 median=0.0161745 90%=0.0165719 max=0.0165719)
sum median effective algorithm bandwidth: 23.0319 GB/s
______________________________________________________
Evaluating algorithm #3/6: 01 atomicAdd from each workItem
Kernels compilation done in 0.0793156 seconds
algorithm times (in seconds) - 10 values (min=0.00275226 10%=0.0027528 median=0.00275443 90%=0.0821867 max=0.0821867)
sum median effective algorithm bandwidth: 135.247 GB/s
______________________________________________________
Evaluating algorithm #4/6: 02 atomicAdd but each workItem loads K values
Kernels compilation done in 0.0627053 seconds
algorithm times (in seconds) - 10 values (min=0.00253486 10%=0.00253515 median=0.00253728 90%=0.0653479 max=0.0653479)
sum median effective algorithm bandwidth: 146.822 GB/s
______________________________________________________
Evaluating algorithm #5/6: 03 local memory and atomicAdd from master thread
Kernels compilation done in 0.0517637 seconds
algorithm times (in seconds) - 10 values (min=0.00694707 10%=0.00694734 median=0.00694844 90%=0.058565 max=0.058565)
sum median effective algorithm bandwidth: 53.6133 GB/s
______________________________________________________
Evaluating algorithm #6/6: 04 local reduction
Kernels compilation done in 0.0447847 seconds
algorithm times (in seconds) - 10 values (min=0.0149859 10%=0.0149881 median=0.0290293 90%=0.0738994 max=0.0738994)
sum median effective algorithm bandwidth: 12.8329 GB/s

@GPUcourseBOT (Collaborator)

Test results for PR #1050

Test logs
=== STATUS: Programs executed successfully: main_mandelbrot, main_sum ===
=== main_mandelbrot stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.284734 sec (CUDA: 0.122128 sec, OpenCL: 0.0372252 sec, Vulkan: 0.125324 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
______________________________________________________
Evaluating algorithm #1/3: CPU
algorithm times (in seconds) - 1 values (min=3.19382 10%=3.19382 median=3.19382 90%=3.19382 max=3.19382)
Mandelbrot effective algorithm GFlops: 3.13105 GFlops
saving image to 'mandelbrot CPU.bmp'...
CPU vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #2/3: CPU with OpenMP
OpenMP threads: x4 threads
algorithm times (in seconds) - 10 values (min=0.987109 10%=0.987283 median=0.997729 90%=1.01509 max=1.01509)
Mandelbrot effective algorithm GFlops: 10.0228 GFlops
saving image to 'mandelbrot CPU with OpenMP.bmp'...
CPU with OpenMP vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #3/3: GPU
Kernels compilation done in 0.0578308 seconds
algorithm times (in seconds) - 10 values (min=0.00427533 10%=0.00427645 median=0.00428027 90%=0.0621622 max=0.0621622)
Mandelbrot effective algorithm GFlops: 2336.3 GFlops
saving image to 'mandelbrot GPU.bmp'...
GPU vs CPU average results difference: 0.942446%
=== main_sum stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.323488 sec (CUDA: 0.127436 sec, OpenCL: 0.0402388 sec, Vulkan: 0.155747 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
PCI-E upload times - 3 values (min=0.041737 10%=0.041737 median=0.0417655 90%=0.0418615 max=0.0418615) s
PCI-E upload median bandwidth: 8.91954 GB/s
______________________________________________________
Evaluating algorithm #1/6: CPU
algorithm times (in seconds) - 10 values (min=0.035438 10%=0.0354611 median=0.0357794 90%=0.0360533 max=0.0360533)
sum median effective algorithm bandwidth: 10.4118 GB/s
______________________________________________________
Evaluating algorithm #2/6: CPU with OpenMP
algorithm times (in seconds) - 10 values (min=0.0162572 10%=0.0162689 median=0.0164169 90%=0.0168448 max=0.0168448)
sum median effective algorithm bandwidth: 22.6918 GB/s
______________________________________________________
Evaluating algorithm #3/6: 01 atomicAdd from each workItem
Kernels compilation done in 0.0548277 seconds
algorithm times (in seconds) - 10 values (min=0.00283585 10%=0.00283637 median=0.00283778 90%=0.0577678 max=0.0577678)
sum median effective algorithm bandwidth: 131.275 GB/s
______________________________________________________
Evaluating algorithm #4/6: 02 atomicAdd but each workItem loads K values
Kernels compilation done in 0.0445153 seconds
algorithm times (in seconds) - 10 values (min=0.00253516 10%=0.00253564 median=0.00253636 90%=0.0471701 max=0.0471701)
sum median effective algorithm bandwidth: 146.876 GB/s
______________________________________________________
Evaluating algorithm #5/6: 03 local memory and atomicAdd from master thread
Kernels compilation done in 0.0537203 seconds
algorithm times (in seconds) - 10 values (min=0.00736819 10%=0.00736856 median=0.0073707 90%=0.0611933 max=0.0611933)
sum median effective algorithm bandwidth: 50.5419 GB/s
______________________________________________________
Evaluating algorithm #6/6: 04 local reduction
Kernels compilation done in 0.0536568 seconds
algorithm times (in seconds) - 10 values (min=0.0131542 10%=0.01318 median=0.0317062 90%=0.0840484 max=0.0840484)
sum median effective algorithm bandwidth: 11.7494 GB/s

@GPUcourseBOT (Collaborator)

Test results for PR #1050

Test logs
=== STATUS: Programs executed successfully: main_mandelbrot, main_sum ===
=== main_mandelbrot stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.300292 sec (CUDA: 0.123019 sec, OpenCL: 0.0382075 sec, Vulkan: 0.139006 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
______________________________________________________
Evaluating algorithm #1/3: CPU
algorithm times (in seconds) - 1 values (min=3.21191 10%=3.21191 median=3.21191 90%=3.21191 max=3.21191)
Mandelbrot effective algorithm GFlops: 3.11342 GFlops
saving image to 'mandelbrot CPU.bmp'...
CPU vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #2/3: CPU with OpenMP
OpenMP threads: x4 threads
algorithm times (in seconds) - 10 values (min=0.998009 10%=1.00105 median=1.02332 90%=1.05365 max=1.05365)
Mandelbrot effective algorithm GFlops: 9.7721 GFlops
saving image to 'mandelbrot CPU with OpenMP.bmp'...
CPU with OpenMP vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #3/3: GPU
Kernels compilation done in 0.0533816 seconds
algorithm times (in seconds) - 10 values (min=0.00427339 10%=0.00427684 median=0.00428078 90%=0.0577146 max=0.0577146)
Mandelbrot effective algorithm GFlops: 2336.02 GFlops
saving image to 'mandelbrot GPU.bmp'...
GPU vs CPU average results difference: 0.942446%
=== main_sum stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.296002 sec (CUDA: 0.1289 sec, OpenCL: 0.0374102 sec, Vulkan: 0.129638 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
PCI-E upload times - 3 values (min=0.0412848 10%=0.0412848 median=0.0414234 90%=0.0417029 max=0.0417029) s
PCI-E upload median bandwidth: 8.9932 GB/s
______________________________________________________
Evaluating algorithm #1/6: CPU
algorithm times (in seconds) - 10 values (min=0.034578 10%=0.0346804 median=0.0350704 90%=0.0356046 max=0.0356046)
sum median effective algorithm bandwidth: 10.6223 GB/s
______________________________________________________
Evaluating algorithm #2/6: CPU with OpenMP
algorithm times (in seconds) - 10 values (min=0.0156988 10%=0.0157769 median=0.0162231 90%=0.0165231 max=0.0165231)
sum median effective algorithm bandwidth: 22.9628 GB/s
______________________________________________________
Evaluating algorithm #3/6: 01 atomicAdd from each workItem
Kernels compilation done in 0.0552275 seconds
algorithm times (in seconds) - 10 values (min=0.00275223 10%=0.00275238 median=0.00275404 90%=0.0580801 max=0.0580801)
sum median effective algorithm bandwidth: 135.266 GB/s
______________________________________________________
Evaluating algorithm #4/6: 02 atomicAdd but each workItem loads K values
Kernels compilation done in 0.0409813 seconds
algorithm times (in seconds) - 10 values (min=0.00146241 10%=0.0014625 median=0.001463 90%=0.0425418 max=0.0425418)
sum median effective algorithm bandwidth: 254.634 GB/s
______________________________________________________
Evaluating algorithm #5/6: 03 local memory and atomicAdd from master thread
Kernels compilation done in 0.0447164 seconds
algorithm times (in seconds) - 10 values (min=0.00682616 10%=0.00703404 median=0.00708198 90%=0.0516351 max=0.0516351)
sum median effective algorithm bandwidth: 52.6024 GB/s
______________________________________________________
Evaluating algorithm #6/6: 04 local reduction
Kernels compilation done in 0.0538434 seconds
algorithm times (in seconds) - 10 values (min=0.0142349 10%=0.0142363 median=0.0296506 90%=0.0835936 max=0.0835936)
sum median effective algorithm bandwidth: 12.564 GB/s
