Description
Hi everyone,
It has been cool to see the recent flurry of contributions to this package, especially by @jeremiahpslewis. In a recent discussion, someone asked what would facilitate cooperation between the POMDPs.jl and JuliaRL communities. I was thinking about this a bit more and came to the conclusion:
Separating out the environment interface would be the most helpful change for expanding collaboration.
There are a few reasons for this:
- There are many different reasons for writing RL algorithms. I assign homework where students write RL algorithms ranging from tabular SARSA to DQN or policy gradient; someone else might want a single very-high-performance PPO implementation to reliably deploy to a web service; another person might want a library of research-quality algorithms to compare against; another person might want a CleanRL-style set of implementations that maximize readability. These should not all be in the same package, but they should all use the same environment interface.
- Since this environment interface will have many stakeholders, there must be a way for all of the stakeholders to monitor and weigh in on interface design decisions. Currently, any discussion about the environment interface will also be mixed in with discussion about GPUs, hooks, etc.
- Let's say that I write a package that uses the environment interface in RLCore.jl, but not the policy interface. If I say I depend on RLCore.jl, it is unclear whether I am committing to just the environment interface or also the policy interface. Moreover, if a user wants to write an environment, they will find the RL.jl documentation and could be very distracted by all of the information about experiments, agents, etc., which my package does not use.
- It would be easier to understand the environment interface if it and its documentation were separated from the RL.jl documentation. (Though the current environment interface documentation has improved a lot already!)
- In the successful Python RL ecosystem, the environment interface in gym/gymnasium/pettingzoo is separated from the packages that implement learning agents.
If the environment interface is separated out (and is sufficiently flexible), I would probably convert some important packages like MCTS and POMCP to use it. Then, they could be much more compatible with RL.jl.
A final note: In principle, CommonRLInterface could be a candidate for a separated-out environment interface, but I do not think it can be successful unless RL.jl chooses to use it directly. To be clear, I would vigorously advocate for this, and I am happy to discuss why, but I recognize that I am biased here since I wrote most of that package.
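To make the idea concrete, here is a rough sketch of what a minimal environment looks like against CommonRLInterface's required functions (`reset!`, `actions`, `observe`, `act!`, `terminated`). The `LineWorld` environment itself is hypothetical, invented just for illustration:

```julia
# Sketch of an environment implementing CommonRLInterface.jl's required interface.
# LineWorld is a made-up example: an agent on positions 1..10, goal at 10.
using CommonRLInterface

mutable struct LineWorld <: AbstractEnv
    pos::Int
end

CommonRLInterface.reset!(env::LineWorld) = (env.pos = 1; nothing)
CommonRLInterface.actions(env::LineWorld) = (-1, 1)           # step left or right
CommonRLInterface.observe(env::LineWorld) = env.pos
CommonRLInterface.terminated(env::LineWorld) = env.pos >= 10

function CommonRLInterface.act!(env::LineWorld, a)
    env.pos = clamp(env.pos + a, 1, 10)
    return env.pos >= 10 ? 1.0 : 0.0                          # reward only at the goal
end

# Any solver written against the interface can drive the environment
# without knowing anything about LineWorld specifically:
function random_rollout!(env::AbstractEnv)
    reset!(env)
    total = 0.0
    while !terminated(env)
        total += act!(env, rand(actions(env)))
    end
    return total
end

random_rollout!(LineWorld(1))
```

The point of the sketch is the last function: a planner or learner only touches the five interface functions, so the same rollout code works for any environment, whichever package it comes from.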