Implement adjoint operators

Although slower, adjoint operators are mathematically "nicer", and many people prefer to have fully matched (adjoint) projection operators.

This basically just needs to reproduce the same kernels used for forward projection, but reading from the projection and storing into the image.

The kernel changes are minimal, but this requires considerably more changes on the memory management side of the CUDA code.
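
To make the "same kernel, opposite data direction" idea concrete, here is a minimal toy sketch in plain Python, not the actual CUDA code. Everything in it is an assumption made for illustration: the 1D geometry, the linear-interpolation footprint, and the names `ray_weights`, `forward`, `adjoint` and `adjoint_mismatch` are hypothetical. The forward operator gathers from the image into the projection; the adjoint walks the same rays with the same weights but scatters from the projection into the image. The usual dot-product check, `<A x, y> == <x, A^T y>`, then tells us whether the pair is really matched.

```python
import random


def ray_weights(ray, n_pixels):
    """Yield (pixel index, weight) pairs touched by one ray.

    Toy footprint: linear interpolation at a made-up sample position.
    Both operators share this function, which is what makes them matched.
    """
    pos = ray * 0.7 + 0.3  # hypothetical sample position for this ray
    i0 = int(pos)          # floor, since pos is never negative here
    frac = pos - i0
    for idx, w in ((i0, 1.0 - frac), (i0 + 1, frac)):
        if 0 <= idx < n_pixels:
            yield idx, w


def forward(image, n_rays):
    """Forward projection: read from the image, store in the projection."""
    proj = [0.0] * n_rays
    for ray in range(n_rays):
        for idx, w in ray_weights(ray, len(image)):
            proj[ray] += w * image[idx]
    return proj


def adjoint(proj, n_pixels):
    """Matched adjoint: same loop and weights, but read from the
    projection and accumulate into the image."""
    image = [0.0] * n_pixels
    for ray in range(len(proj)):
        for idx, w in ray_weights(ray, n_pixels):
            # Several rays can write to the same pixel; in a parallel GPU
            # kernel this accumulation would need something like atomicAdd.
            image[idx] += w * proj[ray]
    return image


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def adjoint_mismatch(n_pixels=16, n_rays=20, seed=0):
    """Return |<A x, y> - <x, A^T y>| for random x and y."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n_pixels)]
    y = [rng.random() for _ in range(n_rays)]
    return abs(dot(forward(x, n_rays), y) - dot(x, adjoint(y, n_pixels)))
```

Calling `adjoint_mismatch()` should give a value at the level of floating-point rounding when the two operators are matched. The same check, run against the real forward and adjoint kernels, would be a reasonable acceptance test for this feature.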