Conversation
✅ Test results for PR #1042
Test logs:
=== STATUS: programs completed successfully: main_aplusb_matrix ===
=== main_aplusb_matrix stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 8.57138 sec (CUDA: 0.114671 sec, OpenCL: 0.707158 sec, Vulkan: 7.74949 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
matrices size: 16384x8192 = 3 * 512 MB
Running BAD matrix kernel...
Kernels compilation done in 3.48946 seconds
a + b matrix kernel times (in seconds) - 10 values (min=0.006532 10%=0.006533 median=0.006536 90%=3.49609 max=3.49609)
a + b kernel median VRAM bandwidth: 229.498 GB/s
Running GOOD matrix kernel...
Kernels compilation done in 0.072696 seconds
a + b matrix kernel times (in seconds) - 10 values (min=0.010631 10%=0.010633 median=0.010639 90%=0.083395 max=0.083395)
a + b kernel median VRAM bandwidth: 140.991 GB/s
Something is off here, still debugging...
✅ Test results for PR #1042
Test logs:
=== STATUS: programs completed successfully: main_aplusb_matrix ===
=== main_aplusb_matrix stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 8.55892 sec (CUDA: 0.113286 sec, OpenCL: 0.717539 sec, Vulkan: 7.72802 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
matrices size: 16384x8192 = 3 * 512 MB
Running BAD matrix kernel...
Kernels compilation done in 3.49659 seconds
a + b matrix kernel times (in seconds) - 10 values (min=0.031919 10%=0.031956 median=0.032117 90%=3.55054 max=3.55054)
a + b kernel median VRAM bandwidth: 46.7042 GB/s
Running GOOD matrix kernel...
Kernels compilation done in 0.069782 seconds
a + b matrix kernel times (in seconds) - 10 values (min=0.006273 10%=0.006279 median=0.006281 90%=0.076121 max=0.076121)
a + b kernel median VRAM bandwidth: 238.815 GB/s
Local output
GitHub CI output
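
For anyone cross-checking the numbers: the "median VRAM bandwidth" lines in both CI logs appear to be the total traffic (3 * 512 MB, i.e. read a, read b, write the result) divided by the median kernel time. Below is a minimal Python sketch of that arithmetic. It is not code from the PR. The function name `median_bandwidth_gb_s` is hypothetical, and it assumes that MB and GB in the log are binary units (2^20 and 2^30 bytes), which is what makes the printed figures match.

```python
# Sketch: reproduce the "median VRAM bandwidth" figures from the CI logs.
# Assumptions (not confirmed by the PR): traffic is 3 matrices of 512 MB each
# (two reads plus one write), MB = 2**20 bytes and GB = 2**30 bytes.

MATRICES = 3                      # a, b and the output matrix
MATRIX_BYTES = 512 * 2**20        # 16384 * 8192 elements * 4 bytes each


def median_bandwidth_gb_s(median_seconds: float) -> float:
    """Bandwidth in GB/s for one a + b pass, given the median kernel time."""
    total_gb = MATRICES * MATRIX_BYTES / 2**30   # 1.5 GB moved per launch
    return total_gb / median_seconds


# Median times copied from the two logs (first run, then second run).
medians = {
    "run 1, BAD kernel": 0.006536,
    "run 1, GOOD kernel": 0.010639,
    "run 2, BAD kernel": 0.032117,
    "run 2, GOOD kernel": 0.006281,
}

for name, seconds in medians.items():
    print(f"{name}: {median_bandwidth_gb_s(seconds):.3f} GB/s")
```

Using the median rather than the mean matters here: in every series the maximum (for example 3.49609 s) is close to the reported compilation time, so the first launch looks like a warm-up that would otherwise dominate the average.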