Fixup for 'Training An Agent' page#1281

Merged
pseudo-rnd-thoughts merged 3 commits into Farama-Foundation:main from chr0nikler:train-agent-fix
Jan 6, 2025

Conversation

@chr0nikler (Contributor) commented Dec 23, 2024

Description

The code on the "Training An Agent" page does not match the visual outputs shown there. The short-term solution is to change the visualization code to match the outputs. As a long-term proposal, the page could be changed to be specifically about agent training, with users directed to the tutorial for Blackjack, or the page could be removed entirely. If the page is kept, it could be turned into a 2-arm/n-arm e-greedy bandit example.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • Documentation only change (no code changed)

Screenshots

Before: Screenshot 2024-12-23 at 4 23 44 PM (image)
After: Screenshot 2024-12-23 at 4 23 29 PM (image)

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@pseudo-rnd-thoughts (Member) left a comment

Thanks for the PR, sorry for the late review.
Quick question: why do two of the moving averages use "valid" while the final one uses "same"?

@chr0nikler (Contributor, Author) commented Jan 5, 2025

Yeah, the middle graph is affected by the windows at the start and end of the data when they aren't full. Centering with "same" (which is what the Python code in the Blackjack tutorial does) produces this:

Figure_1

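For reference: "valid" returns only the positions where the window fully overlaps the data, which removes boundary effects. "same" returns an output whose length equals that of the longer of the two input arrays, so the partially filled windows at the edges are kept.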
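
To make the difference concrete, here is a minimal sketch, assuming the moving average is computed with NumPy's `np.convolve` (whose `mode` argument accepts "valid" and "same"). The helper name `moving_average` and the toy `rewards` array are hypothetical and are not taken from the docs page.

```python
import numpy as np


def moving_average(values, window, mode="valid"):
    """Rolling mean via convolution with a uniform kernel (hypothetical helper)."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(values, dtype=float), kernel, mode=mode)


# Toy data: a constant signal, so any value below 1.0 is a boundary effect.
rewards = np.ones(10)

valid = moving_average(rewards, window=4, mode="valid")
same = moving_average(rewards, window=4, mode="same")

# "valid" keeps only fully overlapping windows: length 10 - 4 + 1 = 7.
print(len(valid), valid)
# "same" keeps the input length (10); the edge values are pulled toward zero
# because the partially filled windows are implicitly zero-padded.
print(len(same), same)
```

With a constant input, every "valid" value stays at 1.0, while the first and last "same" values drop below 1.0. That dip is the boundary effect seen at the edges of a plot when the window is not full.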

@pseudo-rnd-thoughts (Member) left a comment

Ok, I think that makes sense

@pseudo-rnd-thoughts pseudo-rnd-thoughts merged commit fc74bb8 into Farama-Foundation:main Jan 6, 2025
13 checks passed