This repository contains my simulation of the On-Policy and Off-Policy algorithms described in the paper *$H_{\infty}$ Control for Discrete-Time Multi-Player Systems via Off-Policy Q-Learning*.
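For context, below is a minimal model-based sketch of the underlying game (single-controller special case, not the paper's data-driven procedure, and with hypothetical system matrices chosen only for illustration). It iterates the quadratic Q-function of the zero-sum game until the feedback Nash equilibrium gains converge:

```python
# Hedged sketch: model-based Q-function value iteration for a discrete-time
# zero-sum game x_{k+1} = A x + B u + E w with stage cost x'Qx + u'Ru - gamma^2 w'w.
# The fixed point gives the feedback Nash equilibrium gains K (controller)
# and L (worst-case disturbance). System matrices below are assumptions,
# not the paper's example.
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
E = np.array([[0.1], [0.0]])
Q = np.eye(2)
R = np.eye(1)
gamma = 2.0

n, m, q = A.shape[0], B.shape[1], E.shape[1]
P = np.zeros((n, n))
for _ in range(200):
    # Quadratic Q-function matrix H for the current value matrix P:
    # Q(x, u, w) = [x; u; w]' H [x; u; w]
    AB = np.hstack([A, B, E])
    H = AB.T @ P @ AB
    H[:n, :n] += Q
    H[n:n + m, n:n + m] += R
    H[n + m:, n + m:] -= gamma**2 * np.eye(q)

    # Stationarity in (u, w) gives the coupled feedback gains.
    Huw = H[n:, n:]                     # block over the stacked inputs [u; w]
    Hux = H[n:, :n]
    KL = -np.linalg.solve(Huw, Hux)     # stacked gains [K; L]
    K, L = KL[:m, :], KL[m:, :]

    # Value update: plug the saddle-point policies back into the Q-function.
    Z = np.vstack([np.eye(n), K, L])
    P_new = Z.T @ H @ Z
    if np.linalg.norm(P_new - P) < 1e-10:
        P = P_new
        break
    P = P_new

print("K =", K)
print("L =", L)
```

The Q-learning algorithms in this repository estimate the same kind of quadratic Q-function matrix H from data instead of from (A, B, E).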
The corresponding feedback Nash equilibrium:
Using the On-Policy algorithm, I obtained the following control matrices:
Using the Off-Policy algorithm, I obtained the following control matrices:
With the Off-Policy algorithm, the probing noise does not affect the system's learned solution: the Nash equilibrium is recovered without bias.
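A minimal sketch of why this holds is given below (assumed system, target gains, and noise levels, not the paper's example). The Bellman identity for the target-policy Q-function, Q_pi(x_k, u_k, w_k) = r_k + Q_pi(x_{k+1}, K x_{k+1}, L x_{k+1}), holds for arbitrary applied inputs, so least squares on data collected with probing noise recovers the same Q-function matrix H that noise-free data would give:

```python
# Hedged sketch: one off-policy policy-evaluation step for a zero-sum LQ game.
# Behaviour inputs include probing noise, but the Bellman identity for the
# target policies (K, L) is exact for any applied (u, w), so the regression
# is unbiased. All matrices and gains below are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
E = np.array([[0.1], [0.0]])
Qx, R, gamma = np.eye(2), np.eye(1), 2.0
n, m, q = 2, 1, 1
d = n + m + q

# Target policies being evaluated (hypothetical stabilizing gains).
K = np.array([[0.0, -0.5]])
L = np.array([[0.0, 0.0]])

def features(z):
    """Quadratic features so that z' H z = features(z) @ theta,
    where theta stacks the upper-triangular entries of the symmetric H."""
    outer = np.outer(z, z)
    idx = np.triu_indices(d)
    scale = np.where(idx[0] == idx[1], 1.0, 2.0)  # off-diagonal terms appear twice
    return scale * outer[idx]

Phi, y = [], []
x = np.array([1.0, -1.0])
for k in range(400):
    # Behaviour policy = target policy + probing noise.
    u = (K @ x) + 0.5 * rng.standard_normal(m)
    w = (L @ x) + 0.5 * rng.standard_normal(q)
    r = x @ Qx @ x + u @ R @ u - gamma**2 * (w @ w)
    x_next = A @ x + B @ u + E @ w

    z = np.concatenate([x, u, w])                              # applied inputs
    z_next = np.concatenate([x_next, K @ x_next, L @ x_next])  # target inputs
    Phi.append(features(z) - features(z_next))
    y.append(r)
    x = x_next

theta, *_ = np.linalg.lstsq(np.asarray(Phi), np.asarray(y), rcond=None)

# Rebuild the symmetric Q-function matrix H from theta.
H = np.zeros((d, d))
H[np.triu_indices(d)] = theta
H = (H + H.T) - np.diag(np.diag(H))
print("Estimated H:\n", H)
```

Repeating this evaluation step and updating (K, L) from the blocks of H is the general pattern of the off-policy iteration; the probing noise only enters the data, never the Bellman equation itself.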
| Convergence of the optimal control matrix (On-Policy) | Convergence of the optimal control matrix (On-Policy) |
|---|---|
| ![]() | ![]() |
| Convergence of the optimal control matrix (On-Policy) | Convergence of the worst disturbance matrix (On-Policy) |
|---|---|
| ![]() | ![]() |
| Convergence of the optimal control matrix (Off-Policy) | Convergence of the optimal control matrix (Off-Policy) |
|---|---|
| ![]() | ![]() |
| Convergence of the optimal control matrix (Off-Policy) | Convergence of the worst disturbance matrix (Off-Policy) |
|---|---|
| ![]() | ![]() |
With my code, you can run:
- the On-Policy algorithm, via `OnPolicySolution.py`
- the Off-Policy algorithm, via `OffPolicySolution.m`
- an animation of the Off-Policy results, via `Animation.m`
A Dockerfile will be provided soon. Until then, the dependencies are:
- MATLAB
- Python 3.11
- numpy
- matplotlib