Maybe define a fixed-size array and increase its size to see at what point the compiler gives up on inlining.
Might have to look at nsys reports to see how much is being passed through.
Create a new data structure like LocalGeometry but containing only two fields, to see whether a simpler struct actually reduces an important GPU usage metric.
Look in pointwise and broadcasts--anywhere projections are involved.
Look at UnrolledUtilities tests for inspiration for writing the test.
Test cases
- Pointwise: loop over different complexities and levels of nesting, with tuples of N floats as inputs, to find when we fail to inline or fail to compile altogether
- Operations: +, *, /, log (something harder than divide), project
- Nonlocal operators with 1 vs. 2 nonlocal args, with and without LocalGeom: interpolate, weighted interpolate, upwinding (e.g., van Leer), div or curl
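
The test-case list above amounts to a small parameter sweep. A rough sketch of how that matrix could be enumerated follows. This is only a bookkeeping aid in Python, not the actual kernels, and every name and sweep value in it (tuple sizes, nesting depths, dictionary keys) is an assumption made for illustration.

```python
from itertools import product

# All labels and sweep values below are hypothetical placeholders for planning;
# none of them come from a real API.
POINTWISE_OPS = ["+", "*", "/", "log", "project"]
TUPLE_SIZES = [1, 2, 4, 8, 16]   # N floats per input (assumed sweep values)
NESTING_LEVELS = [1, 2, 3]       # assumed depths of nested expressions

NONLOCAL_OPS = [
    "interpolate",
    "weighted interpolate",
    "upwinding (van Leer)",
    "div",
    "curl",
]
NONLOCAL_ARG_COUNTS = [1, 2]     # 1 vs 2 nonlocal args
USES_LOCAL_GEOM = [False, True]  # is LocalGeom used?


def pointwise_cases():
    """One case per (operation, tuple size, nesting level) combination."""
    return [
        {"kind": "pointwise", "op": op, "n": n, "nesting": depth}
        for op, n, depth in product(POINTWISE_OPS, TUPLE_SIZES, NESTING_LEVELS)
    ]


def nonlocal_cases():
    """One case per (operator, nonlocal arg count, LocalGeom usage) combination."""
    return [
        {"kind": "nonlocal", "op": op, "nonlocal_args": k, "local_geom": geom}
        for op, k, geom in product(NONLOCAL_OPS, NONLOCAL_ARG_COUNTS, USES_LOCAL_GEOM)
    ]


def all_cases():
    """Full test matrix: pointwise cases followed by nonlocal cases."""
    return pointwise_cases() + nonlocal_cases()


if __name__ == "__main__":
    for case in all_cases():
        print(case)
```

Each case could then be recorded with its outcome (inlined, failed to inline, failed to compile) alongside runtime and register usage, so the point where inlining breaks down shows up as a boundary in the table.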
Check runtime of every example, but also register pressure, etc.
Links