This is my simulation of the algorithm described in the paper "Data-Driven Nonzero-Sum Game for Discrete-Time Systems Using Off-Policy Reinforcement Learning".
I used a different original control matrix than the one in the paper.
During the data collection phase, probing noise is added to the behavior policy.
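As a rough illustration of this data collection step, here is a minimal MATLAB sketch for a two-player discrete-time linear system. It is not taken from `OffPolicyRLforNZSG.m`: the system matrices `A`, `B1`, `B2`, the gains `K1`, `K2`, the sample count `N`, and the sum-of-sinusoids noise are all hypothetical placeholders chosen only so the example runs.

```matlab
% Hypothetical system: x(k+1) = A*x(k) + B1*u1(k) + B2*u2(k).
% Every numerical value here is an illustrative assumption,
% not a value from the paper or from this repository.
A  = [0.9 0.1; 0 0.8];
B1 = [0; 1];
B2 = [1; 0];
K1 = [0.1 0.2];     % assumed stabilizing behavior gain, player 1
K2 = [0.05 0.1];    % assumed stabilizing behavior gain, player 2

N = 100;                    % number of samples to collect
x = zeros(2, N + 1);        % state trajectory
x(:, 1) = [1; -1];          % arbitrary initial state
u1 = zeros(1, N);           % recorded inputs of player 1
u2 = zeros(1, N);           % recorded inputs of player 2

for k = 1:N
    % Probing noise: sums of sinusoids at different frequencies,
    % used to keep the collected data sufficiently exciting.
    e1 = 0.5 * (sin(0.7 * k) + sin(2.3 * k) + cos(5.1 * k));
    e2 = 0.5 * (sin(1.1 * k) + cos(3.7 * k) + sin(4.3 * k));

    % Behavior policy = state feedback + probing noise.
    u1(k) = -K1 * x(:, k) + e1;
    u2(k) = -K2 * x(:, k) + e2;

    % Propagate the system and store the next state.
    x(:, k + 1) = A * x(:, k) + B1 * u1(k) + B2 * u2(k);
end
```

The recorded `x`, `u1`, and `u2` are the kind of data an off-policy learning step would consume. The noise amplitudes and frequencies above are arbitrary and would likely need tuning for a real system.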
The corresponding feedback Nash equilibrium
Using the off-policy algorithm, I found the following control matrices:
Figure: Convergence of the optimal control matrix (Off-Policy)
With my code, you can:
- Run the Off-Policy Algorithm by running `OffPolicyRLforNZSG.m`
- View the Off-Policy Algorithm Result Animation by running `Animation.m`
I will provide a Dockerfile soon.
- MATLAB

