Recurrent PPO to solve Doom purely in C++ (yuck). This is a self-learning project. I feel like it will be messy, but it is what it is, I guess. The sources I followed:
- Cleanrl : https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari_lstm.py#L310
- StableBaselines3 : https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/ppo/ppo.py
- and SKRL for guidance and pseudo-code : https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
- My dear colleague's old medium post : https://medium.com/@mihai.anca13/exploring-opengl-physx-and-pytorch-all-in-c-8d8308a89cc6
```sh
sudo apt-get install clang clang++ gcc g++ cmake boost sdl2 openal-soft
```

or

```sh
brew install clang clang++ gcc g++ cmake boost sdl2 openal-soft
```

Download libtorch and unzip it in this project.
LINK FOR MACOS: https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.9.1.zip
All the libtorch dists: https://download.pytorch.org/libtorch/cpu/
ViZDoom (good luck with the dependencies on macOS):

```sh
git clone https://github.com/Farama-Foundation/ViZDoom.git
cd ViZDoom
mkdir build && cd build
cmake .. \
  -DCMAKE_BUILD_TYPE=Release \
  -DBUILD_ENGINE=ON \
  -DBUILD_PYTHON=OFF \
  -DCMAKE_OSX_ARCHITECTURES=arm64
make -j$(sysctl -n hw.ncpu)
cd ../..
```

DO NOT FORGET TO RUN THIS FROM THE MAIN FOLDER OF THE PROJECT.
```sh
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release .. && cmake --build . -v
```

You may want to modify `src/main.cpp` if you want to change the environment. Search for the `train_ppo_run`/`train_ppo_rnn_run` functions. Similar to StableBaselines.
```sh
./recurrent_ppo_cpp --operation train
```

or

```sh
./recurrent_ppo_cpp --operation train_rnn
```
Training takes around 400-2000 episodes with vanilla PPO, depending on the scenario (20-50 updates with PPO-RNN). Start with simple.wad; it is the default.
IMPORTANT: If you change something in `train_ppo_run`/`train_ppo_rnn_run` related to the agent or environment, make the same changes in `play_doom_ai_run`/`play_doom_ai_rnn_run`. Otherwise you will be squeezing apples to get orange juice, and you will be very, very sad.
```sh
./recurrent_ppo_cpp --operation play_ai
```

or

```sh
./recurrent_ppo_cpp --operation play_ai_rnn
```
`.vscode/c_cpp_properties.json`:

```jsonc
{
    "configurations": [
        {
            "name": "Mac",
            "includePath": [
                "${workspaceFolder}/**",
                "/Library/Developer/CommandLineTools/usr/include/c++/v1", // C++ standard headers
                "/usr/local/include",
                "/usr/include",
                "CUSTOM_PATH_TO_PARENT/RecurrentPPO-Cpp/libtorch/include"
            ],
            "compilerPath": "/usr/bin/clang++", // Use Clang
            "cStandard": "c17",
            "cppStandard": "c++17",
            "intelliSenseMode": "macos-clang-arm64"
        }
    ],
    "version": 4
}
```
- Start by running it. If you can train a vanilla PPO on basic.wad and run inference without any issues, then you are ready for the next step.
- Check `envs.hpp`. I am not proud of that code, but it is very similar to ViZDoom's C++ examples. Switch to another environment, for example `deadly_corridor` or `defend_the_center`, and try to train Recurrent PPO this time.
- Experimentation is the key: change hyper-parameters and environment rewards. Start with high-level engineering instead of playing with the algorithm. Add some extra metrics to the environment, since it interacts with the agents directly.
- Let's say you can train decent agents; now start playing with the neural networks. Add or remove layers. Switch to GRU. Try continuous actions or sigmoid outputs. Don't forget, RecurrentPPO doesn't support separate networks, so `a2c.hpp` is your friend. If you want to challenge yourself later, go ahead and implement the separate-network version. I was just lazy. If you can update the network, then you already know your shit!
- Go ahead and check `ppo.hpp` and `ppo_rnn` and play around. A good practice that I couldn't get to is implementing a `KL_Adaptive_Scheduler` or `entropy_annealing`, which is constant and a little annoying at the moment.
- If you are still good and feeling resilient enough to stay in the C++ realm, then you can do whatever you want. Fork the repo or make a PR with your changes. I have no tests; I will probably test it by trying your updates. Ahh, if you want to add tests to the repo, you are always welcome.
Install only the arm64 versions of everything on macOS. Also, you will have to deal with CMake. I had to use it because libtorch and ViZDoom are crying for CMake. If I could avoid it, I would, but that would just make the process 10x longer. I am not a CMake master and I despise it, to be very frank. Maybe one day I will port everything to NOB and be happy.
- [ ] Model-based training
- [ ] Prioritized replay on segments instead of throwing everything away (it is a waste, just FYI)
- [ ] Robust Policy Optimization
- [ ] KL-adaptive learning rate and entropy scheduling
- [ ] Multiple environments
- [ ] Multi-source inputs and structured outputs
- [ ] Action masking
- [ ] Adaptive reward normalization
- [ ] Using pre-trained vision encoders
- [ ] Visual graphs and metrics instead of prints

