Open
Description
I tried to run probing tasks for different Atari environments, using the following command:
python -m scripts.run_probe --method infonce-stdim --env-name {env_name}
I did not change any code; I only tried different games, including PongNoFrameskip-v4, BowlingNoFrameskip-v4, BreakoutNoFrameskip-v4, and HeroNoFrameskip-v4.
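For reference, the four runs can be written out with a small script. This is only a sketch: it prints the command lines instead of launching them, and the helper name `build_probe_command` is made up for illustration. Only the command itself and the environment names come from the text above.

```python
import shlex

# The four environments listed above.
ENV_NAMES = [
    "PongNoFrameskip-v4",
    "BowlingNoFrameskip-v4",
    "BreakoutNoFrameskip-v4",
    "HeroNoFrameskip-v4",
]


def build_probe_command(env_name, method="infonce-stdim"):
    """Return the probing command for one environment as an argv list.

    Hypothetical helper: it only fills the {env_name} placeholder in the
    command quoted above and adds no other flags.
    """
    return [
        "python", "-m", "scripts.run_probe",
        "--method", method,
        "--env-name", env_name,
    ]


if __name__ == "__main__":
    # Print each command so it can be copied into a shell from the repo root.
    for env_name in ENV_NAMES:
        print(shlex.join(build_probe_command(env_name)))
```

Each printed line is the same command with a different `--env-name`, so any difference in F1 between games comes from the environment alone and not from a changed flag.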
However, only the F1 score for Pong matches the score reported in the paper. The F1 scores for the other three games are far worse than the scores shown in the paper (for Bowling, I got 0.22).
I checked the training loss logged in wandb, and it seems that training has not converged at all. See the figure below.
How can I get the F1 scores reported in the paper? Am I missing something?