This project explores using Deep Reinforcement Learning (PPO) to control quantum systems. We simulate the quantum physics using QuTiP and wrap it in a Gymnasium environment.
The core research question is: Can an AI agent learn to prepare complex entangled states where analytical solutions are difficult?
project/
├── main.py              # Unified Entry Point
├── src/                 # Source code
│   ├── simulation.py    # Physics (N-Qubit Lindblad Master Equation)
│   ├── environment.py   # Gym Wrapper (Curriculum, Dynamic Actions)
│   ├── train.py         # Training Logic (Curriculum Learning)
│   └── benchmark.py     # Visualization & Robustness Check
└── data/                # Artifacts
    ├── models/          # Trained PPO agents
    └── plots/           # Performance plots
The agent controls a system of $N$ qubits.
For $N = 2$, the agent outputs a 4D action vector at every time step:
- Amplitude 1 ($\Omega_1$)
- Phase 1 ($\phi_1$)
- Amplitude 2 ($\Omega_2$)
- Phase 2 ($\phi_2$)
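As a rough illustration of how such an action vector could be turned into physical drive parameters, the sketch below assumes the policy emits values normalized to $[-1, 1]$ and rescales them. The names `OMEGA_MAX`, `decode_action`, and `drive_quadratures` are hypothetical and not taken from the project. The split into $\sigma_x$ and $\sigma_y$ components follows a common rotating-frame convention, which may differ from the one used in simulation.py.

```python
import math

# Hypothetical cap on the drive amplitude (arbitrary units); not from the project.
OMEGA_MAX = 1.0


def decode_action(action, omega_max=OMEGA_MAX):
    """Map a 4D action in [-1, 1]^4 to (Omega_1, phi_1, Omega_2, phi_2)."""
    if len(action) != 4:
        raise ValueError("expected a 4D action vector")
    # Clip so that out-of-range policy outputs stay physical.
    a = [max(-1.0, min(1.0, float(x))) for x in action]
    omega_1 = 0.5 * (a[0] + 1.0) * omega_max  # amplitude in [0, omega_max]
    phi_1 = a[1] * math.pi                    # phase in [-pi, pi]
    omega_2 = 0.5 * (a[2] + 1.0) * omega_max
    phi_2 = a[3] * math.pi
    return omega_1, phi_1, omega_2, phi_2


def drive_quadratures(omega, phi):
    """Return the (sigma_x, sigma_y) coefficients of a drive with amplitude omega and phase phi."""
    return omega * math.cos(phi), omega * math.sin(phi)
```

For example, `decode_action([1, 0, -1, 0.5])` gives full amplitude and zero phase on the first drive, and zero amplitude with phase $\pi/2$ on the second.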
We also model Crosstalk (Leakage) and T1 Relaxation Noise.
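To make the T1 term concrete, here is a minimal single-qubit sketch that applies pure T1 relaxation to a density matrix over a time `dt`. It uses the closed-form solution of the Lindblad equation with a single decay operator and no drive, so it is not the project's QuTiP code, and it does not cover crosstalk. The function name `apply_t1_decay` is hypothetical.

```python
import math


def apply_t1_decay(rho, dt, t1):
    """Apply T1 relaxation for a time dt to a 2x2 density matrix.

    rho is [[rho00, rho01], [rho10, rho11]], with |0> as the ground state.
    """
    keep = math.exp(-dt / t1)  # fraction of excited-state population that survives
    coh = math.sqrt(keep)      # coherences decay at half the population rate
    (r00, r01), (r10, r11) = rho
    return [
        [r00 + (1.0 - keep) * r11, coh * r01],
        [coh * r10, keep * r11],
    ]
```

Starting from the excited state, the population left after a time `t1` is $e^{-1}$, and the trace stays equal to 1 because the lost population moves to the ground state.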
pip install -r requirements.txt
(Requires qutip, gymnasium, stable-baselines3, numpy, matplotlib, tensorboard)
The project uses a unified pipeline: a single command trains the agent and then generates the benchmark plots.
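An entry point of this kind might parse its options roughly as follows. Only the flag names `--n_qubits` and `--steps` come from the commands in this README. The defaults, help strings, and the `parse_args` helper are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the two command-line options used in this README."""
    parser = argparse.ArgumentParser(
        description="Train and benchmark a PPO pulse-control agent."
    )
    # Default of 1 qubit is an assumption, not taken from main.py.
    parser.add_argument("--n_qubits", type=int, default=1,
                        help="number of qubits to simulate")
    # Default step count is a placeholder, not taken from main.py.
    parser.add_argument("--steps", type=int, default=100_000,
                        help="total training time steps")
    return parser.parse_args(argv)
```

Calling `parse_args(["--n_qubits", "2", "--steps", "200000"])` returns a namespace with `n_qubits == 2` and `steps == 200000`.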
Task: Prepare Superposition
python main.py --n_qubits 1
Task: Prepare Entangled Bell State
python main.py --n_qubits 2 --steps 200000
Note: N=2 uses Curriculum Learning. It first learns to flip qubits ($|11\rangle$), then learns to entangle them ($|\Phi^+\rangle$). We recommend 200k+ steps for convergence.
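The two-stage curriculum described in the note can be sketched as a small scheduler that swaps the target state once recent episodes are good enough. This is a hypothetical illustration: the fidelity threshold, the averaging window, and the `Curriculum` class are assumptions, and the project may switch stages on a different criterion, such as a fixed step count.

```python
import math

# Two-qubit basis order: |00>, |01>, |10>, |11>.
KET_11 = [0.0, 0.0, 0.0, 1.0]
BELL_PHI_PLUS = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]


def fidelity(psi, target):
    """Pure-state fidelity |<target|psi>|^2."""
    overlap = sum(complex(t).conjugate() * complex(p) for t, p in zip(target, psi))
    return abs(overlap) ** 2


class Curriculum:
    """Move from the flip task to the entangling task once the agent is good enough."""

    def __init__(self, threshold=0.95, window=20):
        self.stages = [KET_11, BELL_PHI_PLUS]
        self.stage = 0
        self.threshold = threshold  # assumed mean fidelity needed to advance
        self.window = window        # assumed number of episodes to average over
        self.recent = []

    @property
    def target(self):
        return self.stages[self.stage]

    def record(self, episode_fidelity):
        """Store one episode's final fidelity and advance the stage if warranted."""
        self.recent = (self.recent + [episode_fidelity])[-self.window:]
        if self.stage >= len(self.stages) - 1 or len(self.recent) < self.window:
            return
        if sum(self.recent) / self.window >= self.threshold:
            self.stage += 1
            self.recent = []
```

The state $|11\rangle$ has fidelity 0.5 with $|\Phi^+\rangle$, so an agent that has mastered the first stage does not start the second one from zero reward.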
The pipeline automatically generates plots in data/plots/:
- n1_robustness.png: AI vs Analytical Pulse under T1 noise.
- n2_robustness.png: AI Entanglement Fidelity vs Noise.
- n2_pulse.png: The pulse shape discovered by the AI.
- Robustness: The RL agent learns pulses that tolerate noise, maintaining moderate fidelity even in high-decoherence regimes.
- Curriculum: Layering the learning process (Pulse Control $\rightarrow$ Interaction Timing) is crucial for converging on the Bell State.