Hi authors,
Thank you for the nice work. I noticed that the PPO equation in Section 3.2.2 may contain an error. In PPO, the probability ratio $r$ should be $\frac{\pi_{\theta}}{\pi_{\text{old}}}$, but the definition in that paragraph is $\frac{\pi_{\theta}}{\pi_{\text{ref}}}$. If I am correct, this could confuse readers.
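For reference, here is the standard clipped objective from the original PPO paper (Schulman et al., 2017). It is written in the usual notation ($s_t$, $a_t$, $\hat{A}_t$, $\epsilon$), which may differ from the symbols used in your paper:

$$
r_t(\theta) = \frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\text{old}}(a_t \mid s_t)}, \qquad
L^{\text{CLIP}}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right]
$$

Here $\pi_{\text{old}}$ is the policy that collected the current batch, so $r_t(\theta) = 1$ at the start of each update and the clip limits how far a single update can move. In RLHF-style setups, the frozen reference model $\pi_{\text{ref}}$ typically enters through a separate KL penalty rather than through this ratio.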