cuQuantum benchmark and integration #11

# TEP - cuQuantum benchmark and integration

| Author  | @refraction-ray |
| ------- | --------------- |
| Status  | Draft           |
| Created | 2025-02-04      |

## Abstract


This TEP proposes benchmarking the NVIDIA cuQuantum libraries against TensorCircuit and, if the results justify it, integrating them into TensorCircuit. The work involves comparing TensorCircuit and cuQuantum performance and developing interfaces in TensorCircuit that leverage cuQuantum's optimized functionality.


## Motivation and Scope


TensorCircuit (TC) currently provides a versatile platform for quantum circuit simulation, leveraging tensor network techniques for efficient computation. One key question is whether TC matches the performance of the optimized cuQuantum package. Answering it requires a set of systematic, carefully designed benchmarks on GPU. Note also that cuQuantum may offer only fragile support for the automatic differentiation (AD), vectorized map (VMAP), and just-in-time compilation (JIT) features in TC, which would be a significant weakness of the cuQuantum package.

This TEP aims to:

*   **Benchmark current TensorCircuit performance:**  Establish a baseline for performance comparison on relevant quantum circuit simulation tasks using existing TensorCircuit functionalities.
*   **Benchmark and integrate cuStateVec for state-vector simulations:**  Develop an interface within TensorCircuit to utilize cuStateVec for state-vector based circuit simulations. 
*   **Benchmark and integrate cuTensorNet for tensor network contraction:**  Explore and implement integration strategies for cuTensorNet to accelerate tensor network contraction within TensorCircuit. 
*   **Provide a user-friendly interface:**  Ensure that using cuQuantum is straightforward for TensorCircuit users.

## Usage and Impact



Users will be able to leverage cuQuantum acceleration in TensorCircuit by selecting a cuQuantum backend option when creating or running circuits.  The exact interface is subject to implementation details, but the goal is to make it as seamless as possible.

**Example Usage (Illustrative - API may change):**


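```python
import tensorcircuit as tc

# Create a circuit as usual
n_qubits = 20
c = tc.Circuit(n_qubits)

# Proposed entry point (not implemented yet): evaluate a Pauli-string
# expectation value through cuQuantum. `modes` and the extra keyword
# arguments are placeholders whose exact form is still to be decided.
kws = {}
print(tc.cuquantum.expectation_ps(c, z=[1], modes=..., **kws))
```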
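Since cuQuantum would be an optional dependency, one way to keep the interface seamless is to detect it at runtime and fall back to the existing code path when it is missing. The following is a minimal sketch under that assumption; the helper names `cuquantum_available` and `select_engine` and the engine labels `"auto"`, `"default"`, and `"cuquantum"` are hypothetical and are not part of TensorCircuit. The only external fact it relies on is that the cuQuantum Python bindings are imported as `cuquantum`.

```python
import importlib.util


def cuquantum_available() -> bool:
    """Return True if the `cuquantum` Python package can be found."""
    return importlib.util.find_spec("cuquantum") is not None


def select_engine(requested: str = "auto") -> str:
    """Pick a simulation engine name (hypothetical labels, for illustration).

    "auto" prefers cuQuantum when it is installed and otherwise falls back
    to the existing TensorCircuit code path, labeled "default" here.
    """
    if requested not in ("auto", "default", "cuquantum"):
        raise ValueError(f"unknown engine: {requested!r}")
    if requested == "auto":
        return "cuquantum" if cuquantum_available() else "default"
    if requested == "cuquantum" and not cuquantum_available():
        raise ImportError("cuQuantum was requested but is not installed")
    return requested
```

With this shape, users without cuQuantum installed would see no change in behavior, while an explicit request for cuQuantum fails loudly instead of silently falling back.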


## Backward compatibility



## Related Work



Some hints on performance from the NVIDIA side: https://thequantuminsider.com/2023/12/22/nvidia-cuquantum-23-10-accelerating-quantum-computing-with-enhanced-sdk/

## Implementation



Before commencing integration, a comprehensive benchmarking phase is essential to establish a performance baseline and accurately measure the impact of cuQuantum. Only if the performance gain is promising will we start the integration phase, where a key focus will be addressing potential compatibility issues with TensorCircuit's automatic differentiation (AD), just-in-time compilation (JIT), and vectorized map (VMAP) features. We acknowledge that cuQuantum's current capabilities may make seamless AD/JIT/VMAP integration challenging.
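A fair comparison needs to separate one-off compilation cost from steady-state runtime, because the first call of a jitted function includes tracing and compilation. The timing harness below is a minimal sketch of that idea using only the standard library; the function name `benchmark` and the reported keys are illustrative choices, not an existing TensorCircuit utility.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Time `fn`, reporting the first call separately from later calls."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    start = time.perf_counter()
    fn()  # first call: may include JIT tracing and compilation
    first_call = time.perf_counter() - start

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)

    return {
        "first_call_s": first_call,
        "steady_median_s": statistics.median(timings),
        "steady_min_s": min(timings),
    }


if __name__ == "__main__":
    # Stand-in workload; replace it with the simulation call under test.
    print(benchmark(lambda: sum(i * i for i in range(10_000)), repeats=3))
```

When timing GPU code, the callable passed in should wait for the device to finish before returning (for example by converting the result to a NumPy array, or by calling `block_until_ready()` on a JAX array), otherwise asynchronous dispatch can make the measured time misleadingly small.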

## Alternatives



## References
