
Task06 Александр Исаенков #1048

Open
PlayingPeano wants to merge 1 commit into GPGPUCourse:task06 from PlayingPeano:task06

@PlayingPeano commented Feb 27, 2026

Local output

$ ./main_merge_sort 1
Found 3 GPUs in 0.0845624 sec (OpenCL: 0.0448216 sec, Vulkan: 0.0395478 sec)
Available devices:
  Device #0: API: Vulkan. iGPU. AMD Radeon Vega 8 Graphics (RADV RAVEN). Free memory: 1786/2970 Mb.
  Device #1: API: OpenCL. CPU. cpu-haswell-AMD Ryzen 5 3500U with Radeon Vega Mobile Gfx. AuthenticAMD. Total memory: 5146 Mb.
  Device #2: API: Vulkan. CPU. llvmpipe (LLVM 21.1.6, 256 bits). Free memory: 6862/6862 Mb.
Using device #1: API: OpenCL. CPU. cpu-haswell-AMD Ryzen 5 3500U with Radeon Vega Mobile Gfx. AuthenticAMD. Total memory: 5146 Mb.
Using OpenCL API...
n=100000000 values in range [1; 2147483646]
sorting on CPU...
CPU std::sort finished in 35.2513 sec
CPU std::sort effective RAM bandwidth: 0.0211356 GB/s (2.83677 uint millions/s)
Kernels compilation done in 0.093194 seconds
GPU merge-sort times (in seconds) - 10 values (min=18.2102 10%=18.4008 median=20.9557 90%=21.5827 max=21.5827)
GPU merge-sort median effective VRAM bandwidth: 0.035554 GB/s (4.77198 uint millions/s)
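
The bandwidth figures in this log appear to follow from the element count and the measured time. The sketch below reproduces them under two assumptions inferred from the printed numbers rather than taken from the course code: each 4-byte value is counted as one read plus one write, and "GB" means 2^30 bytes. The function names are hypothetical.

```python
BYTES_PER_UINT = 4   # the log reports "uint" values; 32-bit width is assumed
GIB = 1024 ** 3      # assumption: the logged "GB" is 2^30 bytes


def effective_bandwidth(n, seconds):
    """Bytes moved per second, counting one read and one write per element."""
    return 2 * n * BYTES_PER_UINT / seconds / GIB


def throughput_millions(n, seconds):
    """Sorted elements per second, in millions."""
    return n / seconds / 1e6


n = 100_000_000
cpu_seconds = 35.2513          # CPU std::sort time from the log
gpu_median_seconds = 20.9557   # median merge-sort time from the log

cpu_bw = effective_bandwidth(n, cpu_seconds)
gpu_bw = effective_bandwidth(n, gpu_median_seconds)
cpu_rate = throughput_millions(n, cpu_seconds)
gpu_rate = throughput_millions(n, gpu_median_seconds)

print(f"CPU: {cpu_bw:.6f} GB/s ({cpu_rate:.5f} uint millions/s)")
print(f"GPU: {gpu_bw:.6f} GB/s ({gpu_rate:.5f} uint millions/s)")
```

Under these assumptions the computed values agree with the logged ones to within the precision of the printed times.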

@GPUcourseBOT
Collaborator

Test results for PR #1048

Test logs
=== STATUS: Programs executed successfully: main_merge_sort ===
=== main_merge_sort stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 8.65016 sec (CUDA: 0.117925 sec, OpenCL: 1.56522 sec, Vulkan: 6.96696 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
n=100000000 values in range [1; 2147483646]
sorting on CPU...
CPU std::sort finished in 11.564 sec
CPU std::sort effective RAM bandwidth: 0.0644289 GB/s (8.6475 uint millions/s)
Kernels compilation done in 4.6272 seconds
GPU merge-sort times (in seconds) - 10 values (min=0.270342 10%=0.270975 median=0.271458 90%=4.99486 max=4.99486)
GPU merge-sort median effective VRAM bandwidth: 2.74466 GB/s (368.381 uint millions/s)


@GPUcourseBOT
Collaborator

Test results for PR #1048

Test logs
=== STATUS: Programs executed successfully: main_merge_sort ===
=== main_merge_sort stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.288912 sec (CUDA: 0.122584 sec, OpenCL: 0.0374053 sec, Vulkan: 0.128866 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
n=100000000 values in range [1; 2147483646]
sorting on CPU...
CPU std::sort finished in 11.3193 sec
CPU std::sort effective RAM bandwidth: 0.0658215 GB/s (8.83442 uint millions/s)
Kernels compilation done in 0.0603096 seconds
GPU merge-sort times (in seconds) - 10 values (min=0.270826 10%=0.271254 median=0.271912 90%=0.415427 max=0.415427)
GPU merge-sort median effective VRAM bandwidth: 2.74007 GB/s (367.766 uint millions/s)


@GPUcourseBOT
Collaborator

Test results for PR #1048

Test logs
=== STATUS: Programs executed successfully: main_merge_sort ===
=== main_merge_sort stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.315448 sec (CUDA: 0.120847 sec, OpenCL: 0.0374985 sec, Vulkan: 0.157041 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
n=100000000 values in range [1; 2147483646]
sorting on CPU...
CPU std::sort finished in 11.4936 sec
CPU std::sort effective RAM bandwidth: 0.0648236 GB/s (8.70048 uint millions/s)
Kernels compilation done in 0.0645857 seconds
GPU merge-sort times (in seconds) - 10 values (min=0.270352 10%=0.270777 median=0.272053 90%=0.436599 max=0.436599)
GPU merge-sort median effective VRAM bandwidth: 2.73865 GB/s (367.575 uint millions/s)


@GPUcourseBOT
Collaborator

Test results for PR #1048

Test logs
=== STATUS: Programs executed successfully: main_merge_sort ===
=== main_merge_sort stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 0.296224 sec (CUDA: 0.122169 sec, OpenCL: 0.0395973 sec, Vulkan: 0.134392 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
n=100000000 values in range [1; 2147483646]
sorting on CPU...
CPU std::sort finished in 11.5124 sec
CPU std::sort effective RAM bandwidth: 0.0647178 GB/s (8.68627 uint millions/s)
Kernels compilation done in 0.0577272 seconds
GPU merge-sort times (in seconds) - 10 values (min=0.270201 10%=0.270948 median=0.272898 90%=0.410731 max=0.410731)
GPU merge-sort median effective VRAM bandwidth: 2.73018 GB/s (366.438 uint millions/s)
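
For comparing the runs, the ratio of the CPU std::sort time to the median GPU merge-sort time gives a rough speedup figure. The sketch below computes it from the times printed in the four Tesla T4 logs and in the local log; the variable names are hypothetical, and the ratio ignores kernel compilation and data-transfer time, which the logs report separately or not at all.

```python
# (CPU std::sort seconds, GPU merge-sort median seconds), copied from the logs.
t4_runs = [
    (11.564, 0.271458),
    (11.3193, 0.271912),
    (11.4936, 0.272053),
    (11.5124, 0.272898),
]
local_run = (35.2513, 20.9557)  # OpenCL on the Ryzen 5 3500U CPU device


def speedup(cpu_seconds, gpu_seconds):
    """How many times faster the merge-sort median is than std::sort."""
    return cpu_seconds / gpu_seconds


t4_speedups = [speedup(cpu, gpu) for cpu, gpu in t4_runs]
local_speedup = speedup(*local_run)

for i, s in enumerate(t4_speedups, start=1):
    print(f"T4 run {i}: {s:.1f}x")
print(f"local run: {local_speedup:.2f}x")
```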

