@OsmanMutlu when I try to train from scratch, I don't seem to get the convergence behavior described in README.md. Can you try as well?