
MQR

We propose Most overestimated Q-value Regularization (MQR), a novel offline reinforcement learning algorithm that penalizes the action with the most overestimated Q-value, effectively mitigating overestimation in high-dimensional discrete action spaces. This repository provides the MuJoCo and PyTorch code for training and testing MQR.

The Most Overestimated Q-value Regularization in High-dimensional Discrete Action Spaces for Offline Reinforcement Learning

Paper: https://ieeexplore.ieee.org/abstract/document/11304592

  • Overview of the proposed MQR framework.

    Image

  • For example, robotic pushing and grasping tasks require precise decision-making within high-dimensional discrete action spaces.

    Image
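
To make the regularization concrete, below is a minimal PyTorch sketch of the idea, assuming the penalty pushes down the Q-value of the single most overestimated (greedy) action while the dataset action is trained toward its TD target. The function and variable names (mqr_loss, q_net, alpha) are illustrative and do not correspond to the repository's code; see the repository for the exact loss.

# Minimal sketch of the MQR idea (illustrative only, not the repository's exact loss).
import torch
import torch.nn.functional as F

def mqr_loss(q_net, target_q_net, batch, gamma=0.99, alpha=1.0):
    s, a, r, s_next, done = batch                           # a: dataset actions (int64), done: float flags
    q_all = q_net(s)                                        # (batch, num_actions)
    q_data = q_all.gather(1, a.unsqueeze(1)).squeeze(1)     # Q-value of the dataset action

    with torch.no_grad():                                   # standard TD target from the target network
        q_next = target_q_net(s_next).max(dim=1).values
        td_target = r + gamma * (1.0 - done) * q_next

    td_loss = F.mse_loss(q_data, td_target)

    # Penalize only the most overestimated Q-value (the greedy action's),
    # relative to the Q-value of the action actually stored in the offline dataset.
    regularizer = (q_all.max(dim=1).values - q_data).mean()
    return td_loss + alpha * regularizer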

1️⃣ Installation

This code has been tested with Python 3.9, PyTorch 1.12.0, NVIDIA driver 470, CUDA 11.3, and cuDNN 8.2.1 on Ubuntu 20.04.6 LTS.

Create the conda environment and install the necessary Python libraries.

conda env create -f mqr.yaml
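
After the environment is created, activate it with conda activate before running the commands below (the environment name is defined in mqr.yaml).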

2️⃣ Collecting the offline dataset

Collect an offline dataset in a simulation environment.

python generate_dataset_sim.py
  • The offline dataset collected in the simulation is stored in the "MQR/logs/offline_dataset_sim/" directory.

  • The data folder contains state information (RGB-D heightmap), while the models folder stores the parameters of the online VPG model.

  • In addition, the transitions folder contains the action (executed_action.txt) and reward (reward_value.txt) data; a short loading sketch is given after this list.

  • If you want to modify the settings for offline dataset collection, please refer to the "generate_dataset_sim.yaml" file in the conf directory.

  • The image below illustrates the process of collecting an offline dataset in the simulation environment.

    Image
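
To sanity-check a collected dataset, the transitions can be inspected with a few lines of Python. This is an illustrative sketch only: it assumes the two .txt files are plain whitespace-separated numeric logs and that the dataset sits directly under "MQR/logs/offline_dataset_sim/"; the repository's own loaders may organize and parse these files differently.

# Illustrative inspection of the collected transitions (file layout assumed from the list above).
import os
import numpy as np

log_dir = "MQR/logs/offline_dataset_sim"  # adjust to the directory produced by your run
actions = np.loadtxt(os.path.join(log_dir, "transitions", "executed_action.txt"))
rewards = np.loadtxt(os.path.join(log_dir, "transitions", "reward_value.txt"))
print(f"{len(rewards)} transitions, mean reward: {rewards.mean():.3f}")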

Collect an offline dataset in a real environment.

python generate_dataset_real.py
  • The offline dataset collected in the real world is stored in the "MQR/logs/offline_dataset_real/" directory.
  • If you want to modify the settings for offline dataset collection, please refer to the "generate_dataset_real.yaml" file in the conf directory.

3️⃣ Training the offline RL policy

python train.py
  • If you want to modify the settings for training offline RL policy, please refer to the "train.yaml" file in the conf directory.
  • You must write the path to the offline dataset in the "log_directory" variable of the train.yaml file (an optional path check is sketched below).
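
As an optional sanity check before launching training, a snippet like the one below can confirm that the configured dataset path exists. It is only a sketch: it assumes "log_directory" is a top-level key in conf/train.yaml and uses PyYAML, which is not necessarily part of the repository's workflow.

# Optional check (illustrative): verify the offline dataset path configured in conf/train.yaml.
import os
import yaml  # PyYAML

with open("conf/train.yaml") as f:
    cfg = yaml.safe_load(f)

dataset_dir = cfg["log_directory"]  # path to the collected offline dataset
assert os.path.isdir(dataset_dir), f"offline dataset not found: {dataset_dir}"
print("train.py will read the offline dataset from:", dataset_dir)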

4️⃣ Testing the offline RL policy

Testing the policy in the simulation environment with random arrangements.

python test_sim_random.py
  • If you want to modify the settings for testing sim policy, please refer to the "test_sim_random.yaml" file in the conf directory.

  • You must write the path to the weight file for the offline RL policy in the "model_weight" variable of the test_sim_random.yaml file (a short checkpoint check is sketched after this list).

  • Below is an example video for test_sim_random.

    Image
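
Before running the test script, a similar optional check can confirm that the configured checkpoint loads. This is a sketch under assumptions: "model_weight" is taken to be a top-level key in conf/test_sim_random.yaml and the weight file a standard PyTorch checkpoint (a dict of tensors), which may not match the repository's exact loading code.

# Optional check (illustrative): verify the policy checkpoint configured for test_sim_random.
import torch
import yaml  # PyYAML

with open("conf/test_sim_random.yaml") as f:
    cfg = yaml.safe_load(f)

checkpoint = torch.load(cfg["model_weight"], map_location="cpu")
print("Loaded checkpoint; top-level keys:", list(checkpoint)[:5])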

Testing the policy in the simulation environment with a dense arrangement.

python test_sim_dense.py
  • If you want to modify the settings for testing sim_dense policy, please refer to the "test_sim_dense.yaml" file in the conf directory.

  • You must write the path to the weight file for offline RL policy in the "model_weight" variable of the test_sim_dense.yaml file.

  • Below is an example video for test_sim_dense.

    Image

Testing the policy in the simulation environment with unknown objects.

python test_sim_unknown.py
  • If you want to modify the settings for testing the sim_unknown policy, please refer to the "test_sim_unknown.yaml" file in the conf directory.

  • You must write the path to the weight file for offline RL policy in the "model_weight" variable of the test_sim_unknown.yaml file.

  • Below is an example video for test_sim_unknown.

    Image

Testing the policy in the real environment.

python test_real.py
  • If you want to modify the settings for testing real policy, please refer to the "test_real.yaml" file in the conf directory.

  • You must write the path to the weight file for offline RL policy in the "model_weight" variable of the test_real.yaml file.

  • Below is an example video for test_real_dense.

    Image

  • Below is an example video for test_real_challenge.

    Image

  • Below is an example video for test_real_unknown.

    Image

Note: if you use the MQR framework in your work, please cite the following paper:

@article{yu2025most,
  title={The Most Overestimated Q Value Regularization in High-Dimensional Discrete Action Spaces for Offline Reinforcement Learning},
  author={Yu, Seunghwan and Park, Homin and Ko, Byungjin and Shin, Jisub and Hong, Yoonki and Park, Taejoon and Yoon, Jong-Wan},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  year={2025},
  publisher={IEEE}
}
