@asubah asubah commented Oct 6, 2022

This PR introduces an autotuning interface and a python script to autotune the CUDA kernel launch configurations.

Refer to the README.md for instructions on how to use the script.

The interface is simple: it just reads each kernel's block size from a file in src/cudaautotune/autotuning/kernel_configs.
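As a rough illustration of what such a file-based interface looks like (a minimal sketch with assumed names, not the PR's actual code):

```cpp
#include <fstream>
#include <string>

// Hypothetical sketch: look up a kernel's tuned block size from a
// per-kernel file, falling back to a compile-time default when no
// tuned value exists or the file is malformed.
inline int readBlockSize(const std::string& configDir,
                         const std::string& kernelName,
                         int fallback) {
  std::ifstream file(configDir + "/" + kernelName);
  int blockSize = fallback;
  if (!(file >> blockSize)) {
    blockSize = fallback;  // missing or unreadable file: keep the default
  }
  return blockSize;
}
```

A kernel launch site could then call something like `readBlockSize("src/cudaautotune/autotuning/kernel_configs", "fillHisto", 256)` and pass the result as the block dimension.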

Tested on a node with Intel Xeon Silver 4214R and a single T4.
Baseline configuration: src/cudaautotune/autotuning/tunables-baseline.csv.
Average Throughput: 1,195.66 ± 1.96 events/s.

Best config found by autotuning: src/cudaautotune/autotuning/tunables.csv.
Average Throughput: 1,269.46 ± 3.52 events/s.

The search space is too big, so I selected the subset of kernels with the highest runtimes and autotuned them with random search. I then fixed those kernels at the best configurations found and tuned another set of kernels, repeating until all kernels had been visited.

I appreciate your comments and suggestions regarding the interface or the script.

@asubah asubah changed the title [cuda] Implementing an autotuning interface and script [cudaautotune] Implementing an autotuning interface and script Oct 11, 2022
Makefile Outdated
export CUDA_CXXFLAGS := -I$(CUDA_BASE)/include
export CUDA_TEST_CXXFLAGS := -DGPU_DEBUG
export CUDA_LDFLAGS := -L$(CUDA_LIBDIR) -lcudart -lcudadevrt
export CUDA_LDFLAGS := -L$(CUDA_LIBDIR) -lcudart -lcudadevrt -lcudaautotunert
Collaborator

Could this library be added only in src/cudaautotune/Makefile?

namespace cms {
namespace cuda {

class ExecutionConfiguration {
Collaborator

Currently this class appears to be used only as a "namespace", because the objects have no state.

On the other hand, if I understood correctly, the launch parameters are read from the file for each kernel for each event. Would it be feasible to read each file only once in some way?
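One way to read each file only once, sketched here under assumed names (not the PR's code): cache the parsed block sizes in a function-local static map, so per-event launches reduce to a map lookup.

```cpp
#include <fstream>
#include <string>
#include <unordered_map>

// Hypothetical sketch: parse each kernel's config file at most once
// and cache the result, keyed by filename.
inline int cachedBlockSize(const std::string& filename, int fallback) {
  static std::unordered_map<std::string, int> cache;
  auto it = cache.find(filename);
  if (it != cache.end())
    return it->second;  // already read: no file I/O on this path
  int blockSize = fallback;
  std::ifstream file(filename);
  if (!(file >> blockSize))
    blockSize = fallback;  // missing or malformed file: keep the default
  cache.emplace(filename, blockSize);
  return blockSize;
}
```

Note that a real implementation would need a mutex (or a pre-populated, read-only map) if this can be called from several threads concurrently, since the cache is mutated on first use.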

file >> blockSize;
file.close();
} else {
std::cout << "Error in opening file " + filename + "\n";
Collaborator

Does it make sense to continue the program if a file cannot be opened? (Currently the function returns an indeterminate value.)
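One way to avoid returning an indeterminate value (a sketch under assumed names, not the PR's code) is to fail loudly by throwing:

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical sketch: throw instead of continuing with an
// uninitialized block size when the config file is unusable.
inline int readBlockSizeOrThrow(const std::string& filename) {
  std::ifstream file(filename);
  if (!file)
    throw std::runtime_error("Error opening file " + filename);
  int blockSize = 0;
  if (!(file >> blockSize))
    throw std::runtime_error("Error parsing block size from " + filename);
  return blockSize;
}
```

Alternatively, the function could log the error and return a documented default, but either way the caller should never see an uninitialized value.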

#ifdef __CUDACC__
uint32_t *poff = (uint32_t *)((char *)(h) + offsetof(Histo, off));
int32_t *ppsws = (int32_t *)((char *)(h) + offsetof(Histo, psws));
cms::cuda::ExecutionConfiguration exec;
Collaborator

This doesn't seem to be used.

TARGET_NAME := $(notdir $(TARGET_DIR))
TARGET := $(BASE_DIR)/$(TARGET_NAME)
include Makefile.deps
EXTERNAL_DEPENDS := $(cuda_EXTERNAL_DEPENDS)
Collaborator

This should be

Suggested change
EXTERNAL_DEPENDS := $(cuda_EXTERNAL_DEPENDS)
EXTERNAL_DEPENDS := $(cudaautotune_EXTERNAL_DEPENDS)

@@ -0,0 +1,12 @@
cuda_EXTERNAL_DEPENDS := TBB CUDA EIGEN BOOST BACKTRACE
Collaborator

This should be

Suggested change
cuda_EXTERNAL_DEPENDS := TBB CUDA EIGEN BOOST BACKTRACE
cudaautotune_EXTERNAL_DEPENDS := TBB CUDA EIGEN BOOST BACKTRACE

@@ -0,0 +1,149 @@
import argparse
Collaborator

How about adding

Suggested change
import argparse
#!/usr/bin/env python3
import argparse

so that the script could be run "directly" (./src/cudaautotune/autotuning/tuner.py)?

parser.add_argument('-p', '--process', type=pathlib.Path, nargs=1, required=True,
help='path to the program to be autotuned')
parser.add_argument('-c', '--configurations', type=pathlib.Path, nargs=1,
default=[pathlib.Path('src/cudaautotune/autotuning/kernel_configs')], help='path to save the configurations for the tunable process to read them. Default = autotuning/kernel_configs/')
Collaborator

Why is the default value wrapped in a list? (The same applies to the following two arguments.)

status = ""

cpu_threads = config[tunables["cpu_threads"]]
gpu_streams = cpu_threads + config[tunables["gpu_streams"]]
Collaborator

It is not clear to me why the number of concurrent events (--numberOfStreams below) is set to the number of CPU threads plus something named "GPU streams". Could you clarify the intended behavior here with respect to CPU threads and concurrent events?
