We collected, separately, the sets of events, metrics, and execution times of different GPU applications, using different machines running a Linux operating system. All data were collected with the CUDA profiling tool nvprof. The data covers 20 CUDA kernels: 11 belong to 6 real-world applications from the Rodinia benchmark suite, and the other 9 are classical vector/matrix applications commonly used for benchmarking. The kernels were executed over 9 NVIDIA GPUs in different machines.
nvprof has four command-line modes for collecting information: summary, GPU-trace/API-trace, event/metric summary, and event/metric trace. We used the GPU-trace and event/metric trace modes. The former collected data about execution times. The latter is computationally very expensive: it was executed only once over each GPU, because a single execution took more than one week and the variance of the metric and event values across runs was negligible.
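As an illustration of how execution times can be extracted from the GPU-trace mode, the sketch below parses a small, made-up excerpt of nvprof CSV trace output (as produced by `nvprof --csv --print-gpu-trace ./app`). The kernel names, values, and the reduced column set are hypothetical; real traces carry many more columns (grid/block dimensions, registers, device, stream, etc.).

```python
import csv
import io

# Hypothetical excerpt of GPU-trace output in CSV form; columns and values
# are illustrative only, not taken from a real profiling run.
SAMPLE_TRACE = """\
Start,Duration,Name
1.02,0.250,matMul_kernel
1.40,0.310,matMul_kernel
1.85,0.275,vecAdd_kernel
"""

def kernel_times(trace_text):
    """Group kernel durations (here in ms) by kernel name."""
    times = {}
    for row in csv.DictReader(io.StringIO(trace_text)):
        times.setdefault(row["Name"], []).append(float(row["Duration"]))
    return times

times = kernel_times(SAMPLE_TRACE)
# Average execution time per kernel, the quantity used as the prediction target.
avg = {name: sum(ds) / len(ds) for name, ds in times.items()}
```

In practice each application would be profiled once per parameter configuration and the resulting per-kernel durations aggregated as above.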
In event/metric trace mode, all events and metrics are collected for each kernel execution. Although this causes a large overhead in the execution of the kernels, it gives detailed information about the behavior and performance of the executed CUDA kernel functions. Each application was run over its selected range of input parameters. The number of events and metrics varied according to the compute capability of the GPUs. All collected information amounted to approximately 12.5 GB, and the collection process took up to 15 days per GPU. For each sample, the metrics, events, and trace information were collected in separate phases.
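To make the shape of the collected event/metric data concrete, the following sketch collapses repeated samples of each (kernel, metric) pair into one feature value per kernel. The rows are made up: `achieved_occupancy` and `dram_read_transactions` are real nvprof metric names, but these numbers and kernel names are hypothetical.

```python
import statistics

# Hypothetical parsed rows from the event/metric trace CSVs:
# (kernel name, metric or event name, observed value).
ROWS = [
    ("matMul_kernel", "achieved_occupancy", 0.81),
    ("matMul_kernel", "achieved_occupancy", 0.80),
    ("matMul_kernel", "dram_read_transactions", 1.2e6),
    ("vecAdd_kernel", "achieved_occupancy", 0.95),
]

def features_per_kernel(rows):
    """Average repeated samples of each (kernel, metric) pair into one value."""
    grouped = {}
    for kernel, metric, value in rows:
        grouped.setdefault((kernel, metric), []).append(value)
    feats = {}
    for (kernel, metric), values in grouped.items():
        feats.setdefault(kernel, {})[metric] = statistics.mean(values)
    return feats

feats = features_per_kernel(ROWS)
```

Averaging repeated samples is a reasonable choice here precisely because, as noted above, the variance of the metric and event values across runs was negligible.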
Full Changelog: https://github.com/marcosamaris/gpuperfpredict/commits/JPDC_2022