Question
Hi, please state clearly in the documentation and the dataset definition whether, within a time step, the reward "r_0" is a consequence of the action "a_0".
With previous offline RL libraries, there has been some confusion in this respect.
Since the standard in RL is the (s, a, r, s') tuple, one assumes that r is a consequence of applying action a in state s.
If that is not the case, please state it clearly, because then the reward r(s, a) for the first step would be r_1, not r_0.
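To make the ambiguity concrete, here is a minimal Python sketch. It assumes a hypothetical episode layout with per-step arrays `observations`, `actions`, and `rewards`. These names and shapes are only illustrations and do not describe any specific library's format. The sketch builds (s, a, r, s') tuples under both conventions, and the two give the same tuples only if the reward index is shifted by one.

```python
# Hypothetical episode layout (an assumption, not any library's actual format):
# observations holds s_0 .. s_T (T + 1 entries), actions holds a_0 .. a_{T-1} (T entries).

def transitions_aligned(observations, actions, rewards):
    """Convention A: rewards[t] results from taking actions[t] in observations[t].

    rewards has T entries and each tuple is (s_t, a_t, r_t, s_{t+1}).
    """
    assert len(observations) == len(actions) + 1
    assert len(rewards) == len(actions)
    return [(observations[t], actions[t], rewards[t], observations[t + 1])
            for t in range(len(actions))]


def transitions_shifted(observations, actions, rewards):
    """Convention B: rewards[t] is received on arriving at observations[t].

    rewards has T + 1 entries (rewards[0] comes before any action), so the
    reward for taking a_t in s_t is rewards[t + 1].
    """
    assert len(observations) == len(actions) + 1
    assert len(rewards) == len(observations)
    return [(observations[t], actions[t], rewards[t + 1], observations[t + 1])
            for t in range(len(actions))]


obs = ["s0", "s1", "s2"]
acts = ["a0", "a1"]
# Both calls produce [("s0", "a0", 1.0, "s1"), ("s1", "a1", 2.0, "s2")].
print(transitions_aligned(obs, acts, [1.0, 2.0]))
print(transitions_shifted(obs, acts, [0.0, 1.0, 2.0]))
```

If a user reads a Convention B dataset with the Convention A code, every transition is paired with the reward of the previous step, which is the silent off-by-one this issue asks the documentation to rule out.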
Thanks!