Collaborator
✅ Test results for PR #1050

Test logs:

=== STATUS: Programs completed successfully: main_mandelbrot, main_sum ===

=== main_mandelbrot stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.441178 sec (CUDA: 0.125479 sec, OpenCL: 0.038727 sec, Vulkan: 0.276906 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
______________________________________________________
Evaluating algorithm #1/3: CPU
algorithm times (in seconds) - 1 values (min=3.39177 10%=3.39177 median=3.39177 90%=3.39177 max=3.39177)
Mandelbrot effective algorithm GFlops: 2.94831 GFlops
saving image to 'mandelbrot CPU.bmp'...
CPU vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #2/3: CPU with OpenMP
OpenMP threads: x4 threads
algorithm times (in seconds) - 10 values (min=1.04592 10%=1.05219 median=1.05937 90%=1.06138 max=1.06138)
Mandelbrot effective algorithm GFlops: 9.43959 GFlops
saving image to 'mandelbrot CPU with OpenMP.bmp'...
CPU with OpenMP vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #3/3: GPU
Kernels compilation done in 0.105703 seconds
algorithm times (in seconds) - 10 values (min=0.0042755 10%=0.00427583 median=0.00428431 90%=0.110075 max=0.110075)
Mandelbrot effective algorithm GFlops: 2334.1 GFlops
saving image to 'mandelbrot GPU.bmp'...
GPU vs CPU average results difference: 0.942446%

=== main_sum stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.327674 sec (CUDA: 0.12706 sec, OpenCL: 0.0395904 sec, Vulkan: 0.160953 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
PCI-E upload times - 3 values (min=0.0430661 10%=0.0430661 median=0.0431566 90%=0.0435457 max=0.0435457) s
PCI-E upload median bandwidth: 8.63204 GB/s
______________________________________________________
Evaluating algorithm #1/6: CPU
algorithm times (in seconds) - 10 values (min=0.0359217 10%=0.0363713 median=0.036688 90%=0.0371545 max=0.0371545)
sum median effective algorithm bandwidth: 10.154 GB/s
______________________________________________________
Evaluating algorithm #2/6: CPU with OpenMP
algorithm times (in seconds) - 10 values (min=0.0166356 10%=0.0167114 median=0.0169305 90%=0.0171807 max=0.0171807)
sum median effective algorithm bandwidth: 22.0035 GB/s
______________________________________________________
Evaluating algorithm #3/6: 01 atomicAdd from each workItem
Kernels compilation done in 0.226108 seconds
algorithm times (in seconds) - 10 values (min=0.00478161 10%=0.00478214 median=0.00478301 90%=0.231003 max=0.231003)
sum median effective algorithm bandwidth: 77.8858 GB/s
______________________________________________________
Evaluating algorithm #4/6: 02 atomicAdd but each workItem loads K values
Kernels compilation done in 0.184404 seconds
algorithm times (in seconds) - 10 values (min=0.00253565 10%=0.00253643 median=0.00253742 90%=0.187099 max=0.187099)
sum median effective algorithm bandwidth: 146.814 GB/s
______________________________________________________
Evaluating algorithm #5/6: 03 local memory and atomicAdd from master thread
Kernels compilation done in 0.22123 seconds
algorithm times (in seconds) - 10 values (min=0.00903914 10%=0.00904023 median=0.00904172 90%=0.230382 max=0.230382)
sum median effective algorithm bandwidth: 41.2011 GB/s
______________________________________________________
Evaluating algorithm #6/6: 04 local reduction
Kernels compilation done in 0.350192 seconds
algorithm times (in seconds) - 10 values (min=0.0138063 10%=0.0138096 median=0.030991 90%=0.381292 max=0.381292)
sum median effective algorithm bandwidth: 12.0206 GB/s
Collaborator
✅ Test results for PR #1050

Test logs:

=== STATUS: Programs completed successfully: main_mandelbrot, main_sum ===

=== main_mandelbrot stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.294488 sec (CUDA: 0.121125 sec, OpenCL: 0.0377544 sec, Vulkan: 0.135549 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
______________________________________________________
Evaluating algorithm #1/3: CPU
algorithm times (in seconds) - 1 values (min=3.27859 10%=3.27859 median=3.27859 90%=3.27859 max=3.27859)
Mandelbrot effective algorithm GFlops: 3.05009 GFlops
saving image to 'mandelbrot CPU.bmp'...
CPU vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #2/3: CPU with OpenMP
OpenMP threads: x4 threads
algorithm times (in seconds) - 10 values (min=0.990938 10%=1.00536 median=1.03266 90%=1.05578 max=1.05578)
Mandelbrot effective algorithm GFlops: 9.68371 GFlops
saving image to 'mandelbrot CPU with OpenMP.bmp'...
CPU with OpenMP vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #3/3: GPU
Kernels compilation done in 0.0609532 seconds
algorithm times (in seconds) - 10 values (min=0.00427339 10%=0.00427459 median=0.00427743 90%=0.0652848 max=0.0652848)
Mandelbrot effective algorithm GFlops: 2337.85 GFlops
saving image to 'mandelbrot GPU.bmp'...
GPU vs CPU average results difference: 0.942446%

=== main_sum stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.320605 sec (CUDA: 0.127663 sec, OpenCL: 0.037677 sec, Vulkan: 0.155209 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
PCI-E upload times - 3 values (min=0.0411775 10%=0.0411775 median=0.0417159 90%=0.0420663 max=0.0420663) s
PCI-E upload median bandwidth: 8.93015 GB/s
______________________________________________________
Evaluating algorithm #1/6: CPU
algorithm times (in seconds) - 10 values (min=0.0352076 10%=0.0353851 median=0.0358452 90%=0.0362777 max=0.0362777)
sum median effective algorithm bandwidth: 10.3927 GB/s
______________________________________________________
Evaluating algorithm #2/6: CPU with OpenMP
algorithm times (in seconds) - 10 values (min=0.0159521 10%=0.0159914 median=0.0161745 90%=0.0165719 max=0.0165719)
sum median effective algorithm bandwidth: 23.0319 GB/s
______________________________________________________
Evaluating algorithm #3/6: 01 atomicAdd from each workItem
Kernels compilation done in 0.0793156 seconds
algorithm times (in seconds) - 10 values (min=0.00275226 10%=0.0027528 median=0.00275443 90%=0.0821867 max=0.0821867)
sum median effective algorithm bandwidth: 135.247 GB/s
______________________________________________________
Evaluating algorithm #4/6: 02 atomicAdd but each workItem loads K values
Kernels compilation done in 0.0627053 seconds
algorithm times (in seconds) - 10 values (min=0.00253486 10%=0.00253515 median=0.00253728 90%=0.0653479 max=0.0653479)
sum median effective algorithm bandwidth: 146.822 GB/s
______________________________________________________
Evaluating algorithm #5/6: 03 local memory and atomicAdd from master thread
Kernels compilation done in 0.0517637 seconds
algorithm times (in seconds) - 10 values (min=0.00694707 10%=0.00694734 median=0.00694844 90%=0.058565 max=0.058565)
sum median effective algorithm bandwidth: 53.6133 GB/s
______________________________________________________
Evaluating algorithm #6/6: 04 local reduction
Kernels compilation done in 0.0447847 seconds
algorithm times (in seconds) - 10 values (min=0.0149859 10%=0.0149881 median=0.0290293 90%=0.0738994 max=0.0738994)
sum median effective algorithm bandwidth: 12.8329 GB/s
Collaborator
✅ Test results for PR #1050

Test logs:

=== STATUS: Programs completed successfully: main_mandelbrot, main_sum ===

=== main_mandelbrot stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.284734 sec (CUDA: 0.122128 sec, OpenCL: 0.0372252 sec, Vulkan: 0.125324 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
______________________________________________________
Evaluating algorithm #1/3: CPU
algorithm times (in seconds) - 1 values (min=3.19382 10%=3.19382 median=3.19382 90%=3.19382 max=3.19382)
Mandelbrot effective algorithm GFlops: 3.13105 GFlops
saving image to 'mandelbrot CPU.bmp'...
CPU vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #2/3: CPU with OpenMP
OpenMP threads: x4 threads
algorithm times (in seconds) - 10 values (min=0.987109 10%=0.987283 median=0.997729 90%=1.01509 max=1.01509)
Mandelbrot effective algorithm GFlops: 10.0228 GFlops
saving image to 'mandelbrot CPU with OpenMP.bmp'...
CPU with OpenMP vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #3/3: GPU
Kernels compilation done in 0.0578308 seconds
algorithm times (in seconds) - 10 values (min=0.00427533 10%=0.00427645 median=0.00428027 90%=0.0621622 max=0.0621622)
Mandelbrot effective algorithm GFlops: 2336.3 GFlops
saving image to 'mandelbrot GPU.bmp'...
GPU vs CPU average results difference: 0.942446%

=== main_sum stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.323488 sec (CUDA: 0.127436 sec, OpenCL: 0.0402388 sec, Vulkan: 0.155747 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
PCI-E upload times - 3 values (min=0.041737 10%=0.041737 median=0.0417655 90%=0.0418615 max=0.0418615) s
PCI-E upload median bandwidth: 8.91954 GB/s
______________________________________________________
Evaluating algorithm #1/6: CPU
algorithm times (in seconds) - 10 values (min=0.035438 10%=0.0354611 median=0.0357794 90%=0.0360533 max=0.0360533)
sum median effective algorithm bandwidth: 10.4118 GB/s
______________________________________________________
Evaluating algorithm #2/6: CPU with OpenMP
algorithm times (in seconds) - 10 values (min=0.0162572 10%=0.0162689 median=0.0164169 90%=0.0168448 max=0.0168448)
sum median effective algorithm bandwidth: 22.6918 GB/s
______________________________________________________
Evaluating algorithm #3/6: 01 atomicAdd from each workItem
Kernels compilation done in 0.0548277 seconds
algorithm times (in seconds) - 10 values (min=0.00283585 10%=0.00283637 median=0.00283778 90%=0.0577678 max=0.0577678)
sum median effective algorithm bandwidth: 131.275 GB/s
______________________________________________________
Evaluating algorithm #4/6: 02 atomicAdd but each workItem loads K values
Kernels compilation done in 0.0445153 seconds
algorithm times (in seconds) - 10 values (min=0.00253516 10%=0.00253564 median=0.00253636 90%=0.0471701 max=0.0471701)
sum median effective algorithm bandwidth: 146.876 GB/s
______________________________________________________
Evaluating algorithm #5/6: 03 local memory and atomicAdd from master thread
Kernels compilation done in 0.0537203 seconds
algorithm times (in seconds) - 10 values (min=0.00736819 10%=0.00736856 median=0.0073707 90%=0.0611933 max=0.0611933)
sum median effective algorithm bandwidth: 50.5419 GB/s
______________________________________________________
Evaluating algorithm #6/6: 04 local reduction
Kernels compilation done in 0.0536568 seconds
algorithm times (in seconds) - 10 values (min=0.0131542 10%=0.01318 median=0.0317062 90%=0.0840484 max=0.0840484)
sum median effective algorithm bandwidth: 11.7494 GB/s
Collaborator
✅ Test results for PR #1050

Test logs:

=== STATUS: Programs completed successfully: main_mandelbrot, main_sum ===

=== main_mandelbrot stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.300292 sec (CUDA: 0.123019 sec, OpenCL: 0.0382075 sec, Vulkan: 0.139006 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
______________________________________________________
Evaluating algorithm #1/3: CPU
algorithm times (in seconds) - 1 values (min=3.21191 10%=3.21191 median=3.21191 90%=3.21191 max=3.21191)
Mandelbrot effective algorithm GFlops: 3.11342 GFlops
saving image to 'mandelbrot CPU.bmp'...
CPU vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #2/3: CPU with OpenMP
OpenMP threads: x4 threads
algorithm times (in seconds) - 10 values (min=0.998009 10%=1.00105 median=1.02332 90%=1.05365 max=1.05365)
Mandelbrot effective algorithm GFlops: 9.7721 GFlops
saving image to 'mandelbrot CPU with OpenMP.bmp'...
CPU with OpenMP vs CPU average results difference: 0%
______________________________________________________
Evaluating algorithm #3/3: GPU
Kernels compilation done in 0.0533816 seconds
algorithm times (in seconds) - 10 values (min=0.00427339 10%=0.00427684 median=0.00428078 90%=0.0577146 max=0.0577146)
Mandelbrot effective algorithm GFlops: 2336.02 GFlops
saving image to 'mandelbrot GPU.bmp'...
GPU vs CPU average results difference: 0.942446%

=== main_sum stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.296002 sec (CUDA: 0.1289 sec, OpenCL: 0.0374102 sec, Vulkan: 0.129638 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
PCI-E upload times - 3 values (min=0.0412848 10%=0.0412848 median=0.0414234 90%=0.0417029 max=0.0417029) s
PCI-E upload median bandwidth: 8.9932 GB/s
______________________________________________________
Evaluating algorithm #1/6: CPU
algorithm times (in seconds) - 10 values (min=0.034578 10%=0.0346804 median=0.0350704 90%=0.0356046 max=0.0356046)
sum median effective algorithm bandwidth: 10.6223 GB/s
______________________________________________________
Evaluating algorithm #2/6: CPU with OpenMP
algorithm times (in seconds) - 10 values (min=0.0156988 10%=0.0157769 median=0.0162231 90%=0.0165231 max=0.0165231)
sum median effective algorithm bandwidth: 22.9628 GB/s
______________________________________________________
Evaluating algorithm #3/6: 01 atomicAdd from each workItem
Kernels compilation done in 0.0552275 seconds
algorithm times (in seconds) - 10 values (min=0.00275223 10%=0.00275238 median=0.00275404 90%=0.0580801 max=0.0580801)
sum median effective algorithm bandwidth: 135.266 GB/s
______________________________________________________
Evaluating algorithm #4/6: 02 atomicAdd but each workItem loads K values
Kernels compilation done in 0.0409813 seconds
algorithm times (in seconds) - 10 values (min=0.00146241 10%=0.0014625 median=0.001463 90%=0.0425418 max=0.0425418)
sum median effective algorithm bandwidth: 254.634 GB/s
______________________________________________________
Evaluating algorithm #5/6: 03 local memory and atomicAdd from master thread
Kernels compilation done in 0.0447164 seconds
algorithm times (in seconds) - 10 values (min=0.00682616 10%=0.00703404 median=0.00708198 90%=0.0516351 max=0.0516351)
sum median effective algorithm bandwidth: 52.6024 GB/s
______________________________________________________
Evaluating algorithm #6/6: 04 local reduction
Kernels compilation done in 0.0538434 seconds
algorithm times (in seconds) - 10 values (min=0.0142349 10%=0.0142363 median=0.0296506 90%=0.0835936 max=0.0835936)
sum median effective algorithm bandwidth: 12.564 GB/s
Local output
GitHub CI output