This feature is certainly useless for non-CUDA backends. For CUDA, the performance benefit can come from:
- reducing the total size of the strides passed to the kernel along with the data;
- calculating offsets for several fields in one go.
The latter was checked once on the dycore; the result was that it doesn't affect performance.
Dropping support for this feature allows us to simplify the SID concept definition and the sid::composite implementation (which is the most complex metaprogramming part of our code base).
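
To make the two potential CUDA benefits listed above concrete, here is a minimal sketch in plain C++. It is not the library's actual API: `strides_t`, `separate_fields`, `shared_fields`, `sum_separate` and `sum_shared` are hypothetical names introduced only for illustration, and the assumption is that both fields have an identical layout.

```cpp
#include <cstddef>

// Hypothetical illustration only; none of these names come from the library.
struct strides_t {
    std::ptrdiff_t i, j, k;
};

// Linear offset of element (i, j, k) for the given strides.
inline std::ptrdiff_t offset(strides_t const &s, int i, int j, int k) {
    return i * s.i + j * s.j + k * s.k;
}

// Without sharing: every field carries its own strides,
// and an offset is computed once per field.
struct separate_fields {
    double const *a;
    strides_t a_strides;
    double const *b;
    strides_t b_strides;
};

inline double sum_separate(separate_fields const &f, int i, int j, int k) {
    return f.a[offset(f.a_strides, i, j, k)] + f.b[offset(f.b_strides, i, j, k)];
}

// With sharing: fields known to have the same layout use one strides object,
// so the argument pack is smaller and one offset serves both fields.
struct shared_fields {
    double const *a;
    double const *b;
    strides_t strides;
};

inline double sum_shared(shared_fields const &f, int i, int j, int k) {
    std::ptrdiff_t off = offset(f.strides, i, j, k);
    return f.a[off] + f.b[off];
}

// The shared variant is the smaller object to pass to a kernel.
static_assert(sizeof(shared_fields) < sizeof(separate_fields),
    "sharing strides shrinks the argument pack");
```

In this sketch the saving is one strides object per additional field and one offset computation per access. An optimizing compiler may already merge the duplicated offset arithmetic when it can see that the strides are equal, which would be consistent with the measurement showing no performance effect.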