Commit bb0ad77
Add Replay Buffer for MAGRPO, MAAC, IAC, and use env_step to record (#50)
* add rolloutbuffer for magrpo to align with maac
* Update requirements.txt
* Use env-step logging for MAAC/MAGRPO
* allow verbose output
* update default
* Update magrpo.py
* Update magrpo.py
* Revert "Update magrpo.py"
This reverts commit e0ce522.
* Revert "Update magrpo.py"
This reverts commit c665345.
* Reapply "Update magrpo.py"
This reverts commit bdb822b.
* fix magrpo
* remove hard constraint of buffer size of maac
* try to align magrpo with maac by having top-k null and same prompt
* update magrpo log
* ud wandb's problem
* fix logging
* Revert "fix logging"
This reverts commit 2a1653b.
* Revert "ud wandb's problem"
This reverts commit 1a27c31.
* Revert "update magrpo log"
This reverts commit e99796b.
* get Iac setup with buffer and remove unused return norm in maac
* add eval to iac
* allow multi-turn iac
* update log of iac
* better align
* Update iac.py
* Update changelog.md
1 parent 70d9662 commit bb0ad77
7 files changed
Lines changed: 672 additions & 276 deletions
File tree
- comlrl/trainers
- docs/content/docs
- dev
- user-guide
- examples