Auto tuner should tune for best cache configuration

Currently we set the cache configuration to 48K L1 and 16K shared memory (Fermi).  However, this isn't optimal for all kernels, and the auto tuner can end up switching away from this default configuration whenever it requests more than 16K of shared memory per SM.

The solution is to expand the TuneParam class to include a member variable of type enum cudaFuncCache, which will be tuned per kernel.  This shouldn't be too much work, so I'm adding it to the 0.4.1 milestone.
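
A rough sketch of what this could look like, not the actual implementation. The trimmed-down TuneParam fields, the tuneCacheConfig helper, and the time_kernel callback are all hypothetical names for illustration; only the cudaFuncCache enumerators come from the CUDA runtime (a stand-in enum is defined when building without nvcc).

```cpp
#include <cassert>
#include <functional>
#include <limits>

#ifdef __CUDACC__
#include <cuda_runtime.h>  // provides the real enum cudaFuncCache
#else
// Stand-in mirroring the CUDA runtime enum so the sketch builds without CUDA.
enum cudaFuncCache {
  cudaFuncCachePreferNone = 0,
  cudaFuncCachePreferShared = 1,
  cudaFuncCachePreferL1 = 2
};
#endif

// Hypothetical, trimmed-down TuneParam; the real class has other members.
struct TuneParam {
  int block_size = 64;
  int shared_bytes = 0;
  cudaFuncCache cache_config = cudaFuncCachePreferL1;  // proposed new member
};

// Try each cache preference and keep the fastest one.
// time_kernel is a hypothetical callback that would apply param.cache_config
// (for example with cudaFuncSetCacheConfig), launch the kernel, and return
// the elapsed time in seconds.
inline TuneParam tuneCacheConfig(
    TuneParam param,
    const std::function<double(const TuneParam&)>& time_kernel) {
  assert(time_kernel && "a timing callback is required");
  const cudaFuncCache candidates[] = {cudaFuncCachePreferL1,
                                      cudaFuncCachePreferShared};
  TuneParam best = param;
  double best_time = std::numeric_limits<double>::max();
  for (cudaFuncCache c : candidates) {
    param.cache_config = c;
    const double t = time_kernel(param);
    if (t < best_time) {
      best_time = t;
      best = param;
    }
  }
  return best;
}
```

In the real tuner this loop would presumably be nested with the existing block-size and shared-memory search, so that the cache preference is stored alongside the other tuned launch parameters for each kernel.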


Auto tuner should tune for best cache configuration #49
