A Single-Loop Robust Policy Gradient Method for Robust Markov Decision Processes

This repository contains the code for the robust Markov decision process (RMDP) experiments in the paper "A Single-loop Robust Policy Gradient Method for Robust Markov Decision Processes".

All code is modified from repository.

There are two folders: one for the Garnet problem and one for the Inventory problem.

Garnet Problem:
- C++ code for generating Garnet problems of different sizes
- SRPG and DRPG implementations
- If you are looking for robust value iteration as another benchmark, please refer to repository. It is not included here because our paper considers gradient-based methods.
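
To illustrate what generating a Garnet problem of a given size involves, here is a minimal Python sketch. It is not the C++ generator in this repository: the function name `make_garnet`, the Dirichlet sampling of probabilities, and the uniform cost range are all assumptions made for illustration. Each state-action pair gets a fixed number of randomly chosen successor states.

```python
import numpy as np

def make_garnet(n_states, n_actions, n_branches, seed=0):
    """Sample a random Garnet-style MDP (hypothetical helper, for illustration only)."""
    rng = np.random.default_rng(seed)
    P = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            # Pick the successor states reachable from (s, a).
            nxt = rng.choice(n_states, size=n_branches, replace=False)
            # Assumption: probabilities over the successors are Dirichlet-distributed.
            P[s, a, nxt] = rng.dirichlet(np.ones(n_branches))
    # Assumption: costs are drawn uniformly from [0, 1).
    c = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
    return P, c

# Problem size is controlled by the three integer arguments.
P, c = make_garnet(n_states=5, n_actions=3, n_branches=2)
```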

Inventory Problem:
- Python code for generating the Inventory problem
- Python code for generating an inventory problem with parameterized transitions and solving it with DRPG and SRPG
- Python code for generating an inventory problem with parameterized transitions and a parameterized policy and solving it with DRPG and SRPG
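
For readers unfamiliar with inventory problems, the sketch below builds the nominal transition kernel of a toy inventory MDP. It is not the code in this repository and does not include the parameterized transitions or policy used in the paper. The function name `make_inventory`, the lost-sales assumption, and the capacity cap are all illustrative assumptions.

```python
import numpy as np

def make_inventory(max_stock, demand_probs):
    """Build transition probabilities for a toy inventory MDP (hypothetical helper).

    State s = units in stock, action a = units ordered,
    demand d occurs with probability demand_probs[d].
    """
    n = max_stock + 1
    P = np.zeros((n, n, n))
    for s in range(n):
        for a in range(n):
            # Assumption: orders that exceed capacity are discarded.
            stocked = min(s + a, max_stock)
            for d, p in enumerate(demand_probs):
                # Assumption: unmet demand is lost rather than backlogged.
                nxt = max(stocked - d, 0)
                P[s, a, nxt] += p
    return P

P_inv = make_inventory(max_stock=3, demand_probs=[0.2, 0.5, 0.3])
```

Each slice `P_inv[s, a]` is a probability distribution over the next stock level, which is the nominal kernel that an uncertainty set would be built around in a robust formulation.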

Reference:

[1] Wang, Qiuhao, Chin Pang Ho, and Marek Petrik. "Policy gradient in robust MDPs with global convergence guarantee." In International Conference on Machine Learning, pp. 35763-35797. PMLR, 2023.
