
In the function that returns the symmetrical board states, the line that updates the list l
appends (newB, list(newPi.ravel()) + [pi[-1]]).
Why do you append the last element of the original policy vector, pi[-1], to the new rotated/mirrored policy vector?
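
For context, here is a minimal sketch of the kind of function I mean, reconstructed around the line quoted above. Only the names l, newB, newPi and pi come from the original code; the function name get_symmetries, the helper pi_board, and the assumption that pi has n*n board entries plus one extra trailing entry are my guesses, so this may not match the actual implementation.

```python
import numpy as np

def get_symmetries(board, pi):
    """Return the 8 rotations/reflections of (board, pi).

    Assumption: board is an n x n array and pi has n*n + 1 entries,
    where the first n*n entries map to board squares and the last
    entry (pi[-1]) is not tied to any square.
    """
    n = board.shape[0]
    assert len(pi) == n * n + 1
    # Reshape only the board-aligned part of the policy.
    pi_board = np.reshape(pi[:-1], (n, n))
    l = []
    for i in range(1, 5):            # 90, 180, 270, 360 degree rotations
        for flip in (True, False):   # with and without a left-right mirror
            newB = np.rot90(board, i)
            newPi = np.rot90(pi_board, i)
            if flip:
                newB = np.fliplr(newB)
                newPi = np.fliplr(newPi)
            # The line in question: the transformed board policy is
            # flattened and pi[-1] is appended unchanged.
            l += [(newB, list(newPi.ravel()) + [pi[-1]])]
    return l
```

In this sketch every returned policy has the same length as pi, and its final entry is always the original pi[-1], whichever rotation or mirror is applied.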