Skip to content

Auto tuner should tune for best cache configuration #49

@maddyscientist

Description

@maddyscientist

Currently we set the cache configuration to 48K L1 and 16 K shared (Fermi). However, this isn't optimal for all kernels and the auto tuner can actually switch the default cache configuration if it requests more than 16K per SM.

The solution is expand the TuneParam class to include a member variable enum cudaFuncCache, which will be tuned per kernel. This shouldn't be too much work, adding it to the 0.4.1 milestone.....

Metadata

Metadata

Assignees

Type

No type
No fields configured for issues without a type.

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions