Description
The global work size (GWS) parameter in OpenCL tells a device how many work-items to run in a single kernel launch. Tuning this parameter can improve throughput substantially (sometimes by over 50%).
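As a rough illustration of why GWS matters: assuming the host splits a job into successive kernel launches of `gws` work-items each (the value passed as `global_work_size` to `clEnqueueNDRangeKernel`), GWS controls how many launches are needed and how large each one is. The helper names below are hypothetical and do not come from the project.

```c
#include <stddef.h>

/* Hypothetical helper: number of kernel launches needed to cover
 * total_work items when each launch runs gws work-items (gws > 0).
 * Rounds up without risking overflow of total_work + gws - 1. */
static size_t launches_needed(size_t total_work, size_t gws) {
    return total_work / gws + (total_work % gws != 0);
}

/* Hypothetical helper: work-items in launch i. Every launch is full-size
 * except possibly the last, which covers whatever remains. */
static size_t launch_size(size_t total_work, size_t gws, size_t i) {
    size_t remaining = total_work - i * gws;
    return remaining < gws ? remaining : gws;
}
```

For example, 1000 items at a GWS of 256 take four launches, the last covering 232 items. A GWS that is too small leaves the device idle between launches, while one that is too large can hurt responsiveness or occupancy, which is why the best value differs per device.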
Currently, the optimal GWS for each GPU model is determined through manual experimentation and hard-coded in gws.c. This method does not scale well, since many popular hardware models are left out. A much better approach is an auto-tuner that determines the optimal setting at run-time.
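One possible shape for such an auto-tuner is sketched below, as an assumption rather than the project's design: benchmark each power-of-two GWS in a range and keep the fastest. `gws_benchmark_fn`, `autotune_gws`, and `toy_benchmark` are hypothetical; a real benchmark would time a short batch of actual kernel launches, whereas `toy_benchmark` is only a stand-in model whose throughput peaks at a chosen "sweet spot".

```c
#include <stddef.h>

/* Hypothetical benchmark callback: runs a short timed batch at the given
 * GWS and returns the measured throughput (higher is better). */
typedef double (*gws_benchmark_fn)(size_t gws, void *ctx);

/* Try power-of-two multiples of min_gws up to max_gws and return the one
 * with the highest measured throughput. Returns 0 for an invalid range. */
static size_t autotune_gws(size_t min_gws, size_t max_gws,
                           gws_benchmark_fn bench, void *ctx) {
    size_t best_gws = 0;
    double best_rate = -1.0;

    if (min_gws == 0 || min_gws > max_gws)
        return 0;
    for (size_t gws = min_gws; ; gws *= 2) {
        double rate = bench(gws, ctx);
        if (rate > best_rate) {         /* ties keep the smaller GWS */
            best_rate = rate;
            best_gws = gws;
        }
        if (gws > max_gws / 2)          /* next doubling would exceed max_gws */
            break;
    }
    return best_gws;
}

/* Stand-in benchmark for demonstration only: throughput grows with GWS up
 * to *ctx (the sweet spot), then falls off. */
static double toy_benchmark(size_t gws, void *ctx) {
    size_t sweet = *(const size_t *)ctx;
    if (gws <= sweet)
        return (double)gws;
    return (double)sweet * (double)sweet / (double)gws;
}
```

Real timings are noisy, so a practical tuner would probably repeat each measurement or refine the search around the best power of two; the coarse doubling search simply keeps the number of benchmark runs small.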
Proposed solution: each time the generation or lookup code runs, it checks whether an optimal setting is already known from a previous invocation. The lookup uses a hash table keyed on the combination of table parameters, device name, and driver version (the table parameters have been observed to affect the optimal GWS, and driver improvements can change it as well). If an optimal setting is already known, it is used; otherwise, variations of the GWS are tested until an optimal value is found.
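A minimal in-memory sketch of the lookup-or-tune step follows, assuming the table parameters, device name, and driver version are each available as strings. The key format, the fixed-size open-addressing table with FNV-1a hashing, and every function name here are hypothetical; persisting results across invocations would additionally require saving the entries to a file, which is omitted. `fake_tune` stands in for a real run-time GWS search.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define GWS_CACHE_SLOTS 64   /* fixed capacity for this sketch */
#define GWS_KEY_MAX 512

/* One cached result: composite key and tuned GWS (gws == 0 means empty). */
struct gws_entry { char key[GWS_KEY_MAX]; size_t gws; };
struct gws_cache { struct gws_entry slots[GWS_CACHE_SLOTS]; };

/* Build "params|device|driver". Returns 1 on success, 0 if truncated. */
static int make_gws_key(char *out, size_t out_len, const char *table_params,
                        const char *device_name, const char *driver_version) {
    int n = snprintf(out, out_len, "%s|%s|%s", table_params, device_name,
                     driver_version);
    return n >= 0 && (size_t)n < out_len;
}

/* 64-bit FNV-1a hash of a NUL-terminated string. */
static uint64_t fnv1a(const char *s) {
    uint64_t h = 14695981039346656037ULL;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

/* Linear probing: return the slot holding key, else the first empty slot,
 * else NULL when the table is full and key is absent. */
static struct gws_entry *gws_slot(struct gws_cache *c, const char *key) {
    size_t start = (size_t)(fnv1a(key) % GWS_CACHE_SLOTS);
    for (size_t i = 0; i < GWS_CACHE_SLOTS; i++) {
        struct gws_entry *e = &c->slots[(start + i) % GWS_CACHE_SLOTS];
        if (e->gws == 0 || strcmp(e->key, key) == 0)
            return e;
    }
    return NULL;
}

/* Hypothetical tuner callback returning the best GWS found (0 on failure). */
typedef size_t (*gws_tuner_fn)(void *ctx);

/* Use the cached GWS for key if known; otherwise tune and remember it. */
static size_t get_or_tune_gws(struct gws_cache *c, const char *key,
                              gws_tuner_fn tune, void *ctx) {
    struct gws_entry *e = gws_slot(c, key);
    if (e != NULL && e->gws != 0)
        return e->gws;                          /* known from earlier tuning */
    size_t gws = tune(ctx);
    if (e != NULL && gws != 0 && strlen(key) < GWS_KEY_MAX) {
        memcpy(e->key, key, strlen(key) + 1);   /* store key with its NUL */
        e->gws = gws;
    }
    return gws;
}

/* Stand-in tuner for demonstration: counts calls, returns a fixed value. */
struct fake_tuner { int calls; size_t result; };
static size_t fake_tune(void *ctx) {
    struct fake_tuner *t = ctx;
    t->calls++;
    return t->result;
}
```

Because the driver version is part of the key, a driver upgrade produces a new key and therefore triggers a fresh tuning run, while repeated runs with the same tables on the same setup skip tuning entirely.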
The manual GWS command line argument ("-gws") must be preserved in case the user wishes to override this setting.
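The override could take precedence as in the sketch below, which assumes the value follows `-gws` as the next argument. `parse_gws_override`, `choose_gws`, and `gws_source_fn` are hypothetical names, and the real program's argument parsing may differ. Checking for the override first means a user who passes `-gws` also avoids paying for a tuning run.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Scan argv for "-gws <n>" (argument layout assumed for this sketch).
 * Returns n, or 0 if the flag is absent or its value is not a valid
 * non-negative decimal number. */
static size_t parse_gws_override(int argc, char **argv) {
    for (int i = 1; i + 1 < argc; i++) {
        if (strcmp(argv[i], "-gws") == 0) {
            const char *val = argv[i + 1];
            char *end;
            if (!isdigit((unsigned char)val[0]))  /* rejects "-5", "" */
                return 0;
            errno = 0;
            unsigned long v = strtoul(val, &end, 10);
            if (errno != 0 || *end != '\0')
                return 0;
            return (size_t)v;
        }
    }
    return 0;
}

/* Hypothetical source of an automatic GWS (cache lookup or auto-tuner). */
typedef size_t (*gws_source_fn)(void *ctx);

/* A manual -gws value always wins; the automatic source is consulted only
 * when no override was given. */
static size_t choose_gws(int argc, char **argv, gws_source_fn auto_gws,
                         void *ctx) {
    size_t manual = parse_gws_override(argc, argv);
    return manual != 0 ? manual : auto_gws(ctx);
}
```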