Description
In #32 the single-target approach runs faster than the multi-target approach: we pre-factorize the source
array, then run all solves and all rsp functions target-by-target, accumulating the result.
This is possible because essentially everything is allocated up front, so each target reuses the same workspaces,
avoiding the thousands/millions of small allocations that kill the scaling.
Not materializing huge matrices means memory use is far smaller than before, and cache locality is probably
better too.
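
As a rough illustration of the pattern described above (not the actual code from #32), here is a minimal sketch: factorize once, allocate the workspaces once, then solve and accumulate one target at a time. The function name `accumulate_over_targets`, the tridiagonal test matrix, and the plain sum used as the "accumulated result" are all hypothetical stand-ins for the real ConScape operations.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical helper: solve A * x = e_t for each target t and accumulate the solutions.
function accumulate_over_targets(A, targets)
    n = size(A, 1)
    F = lu(A)          # factorize once, reuse for every target
    b = zeros(n)       # right-hand side workspace, reused
    x = zeros(n)       # solution workspace, reused
    acc = zeros(n)     # running result, updated in place
    for t in targets
        fill!(b, 0.0)
        b[t] = 1.0     # unit vector for this single target
        ldiv!(x, F, b) # solve into the existing buffer
        acc .+= x      # stand-in for the real per-target function
    end
    return acc
end

# Small, strictly diagonally dominant (hence nonsingular) test matrix; an assumption for the demo.
n = 50
A = spdiagm(-1 => fill(-0.3, n - 1), 0 => ones(n), 1 => fill(-0.3, n - 1))
targets = 1:5:n
acc = accumulate_over_targets(A, targets)
```

The multi-target equivalent would build a dense `n × length(targets)` right-hand side and solve it in one call, which is exactly the matrix this loop avoids materializing.
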
But for single targets we don't even need half of the arrays or array operations any more; we can just use scalars. This would
really clean up the code.
So the question for the future of ConScape is: do we switch all algorithms to a single-target approach?
LinearSolve.jl kind of works like this already, so things like #31 would be easy, although I'm not sure how the other
(non-ldiv!) operations will work on GPU, as they will be far smaller and likely inefficient to launch as GPU tasks.