CPU is most efficient for subsystems with very low qubit counts, and GPU is most efficient from around 20 qubits and up. The Intel HD seems faster than GPU simulation in the large-scale limit, but in practice it is slower than a discrete GPU; its efficiency crossover point might be lower than 20 qubits. (Tensor product operations via Compose() require allocating an entirely new state vector, and the Compose() implementation has been optimized for cross-device cases, among others.)
Is there a way we can holistically use the Intel HD for middle-range subsystem sizes, at practical advantage, like in QHybrid?
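
To make the question concrete, here is a minimal sketch of a three-tier, width-based device choice. It is not the QHybrid implementation. The names `Device` and `pick_device` are hypothetical, and so is the CPU cutoff of 12 qubits. Only the "around 20 qubits" GPU figure comes from the observations above, and the real crossover points would have to be benchmarked per machine.

```python
from enum import Enum


class Device(Enum):
    CPU = "cpu"
    INTEGRATED_GPU = "integrated_gpu"  # e.g. the Intel HD
    DISCRETE_GPU = "discrete_gpu"


# ASSUMPTION: placeholder cutoff for "very low numbers of qubits".
CPU_MAX_QUBITS = 12
# From the text: the discrete GPU wins at "around 20 qubits and up".
DISCRETE_GPU_MIN_QUBITS = 20


def pick_device(qubit_count: int,
                cpu_max: int = CPU_MAX_QUBITS,
                gpu_min: int = DISCRETE_GPU_MIN_QUBITS) -> Device:
    """Choose a device tier for a subsystem of the given qubit width."""
    if qubit_count < 1:
        raise ValueError("qubit_count must be at least 1")
    if qubit_count <= cpu_max:
        return Device.CPU
    if qubit_count < gpu_min:
        # Middle range: the band the question proposes for the Intel HD.
        return Device.INTEGRATED_GPU
    return Device.DISCRETE_GPU


if __name__ == "__main__":
    for width in (4, 16, 24):
        print(width, pick_device(width).value)
```

Since Compose() already allocates a new state vector, the point where two subsystems are combined seems like a natural place to re-evaluate the tier for the combined width.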