With the early stages of CUDA support making their way into qsim, we should investigate registering GPU kernels for our ops so that we can give users GPU-accelerated simulation when it makes sense to do so.
As with our existing ops, we may need two different parallelization schemes: one for small circuits, where we parallelize over the circuits in a batch, and one for larger circuits, where we parallelize across a single large wavefunction. Once the connections have been made, we should do an in-depth study to determine which scheme gives the best performance in which scenario. This is a large project and will roughly require the following:
- Upgrade our qsim dependency to the latest version containing CUDA support (once available)
- Add a Kokoro CI job that can build on GPUs, since GitHub Actions won't let us test or compile on real GPU hardware.
- For each of our core ops (`expectation`, `sampled_expectation`, `sample` and `state`), carry out performance benchmarks under the two GPU parallelization schemes, and provide new op hardware targets that give users the best possible performance.
- Carry out these same tests for the `math_ops`.
- Carry out these tests again for the `noise` ops.