The Most Overestimated Q-value Regularization in High-dimensional Discrete Action Spaces for Offline Reinforcement Learning

Paper: https://ieeexplore.ieee.org/abstract/document/11304592

We propose Most Overestimated Q-value Regularization (MQR), a novel offline reinforcement learning algorithm that penalizes the action with the most overestimated Q-value, effectively mitigating overestimation in high-dimensional discrete action spaces. For example, robotic pushing and grasping tasks require precise decision-making within high-dimensional discrete action spaces. This repository provides MuJoCo and PyTorch code for training and testing MQR.
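The sketch below is a rough, minimal illustration of this idea and not the exact implementation in this repository: in addition to a standard TD loss on the dataset actions, it penalizes the single most overestimated Q-value over the discrete action space. All tensor names, shapes, and the penalty weight alpha are assumptions.

```python
import torch
import torch.nn.functional as F

def mqr_loss(q_values, actions, rewards, next_q_values, gamma=0.99, alpha=1.0):
    """Hedged sketch of an MQR-style objective (names and shapes are assumptions).

    q_values:      (B, A) Q-estimates of the current network over all discrete actions
    actions:       (B,)   long tensor of actions stored in the offline dataset
    rewards:       (B,)   rewards stored in the offline dataset
    next_q_values: (B, A) target-network Q-estimates for the next states
    """
    # Standard one-step TD loss on the actions actually taken in the dataset.
    td_target = rewards + gamma * next_q_values.max(dim=1).values
    q_taken = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
    td_loss = F.mse_loss(q_taken, td_target.detach())

    # MQR-style regularizer: push down the most overestimated Q-value, i.e. the
    # maximum Q over the high-dimensional discrete action space, relative to the
    # Q-value of the dataset action.
    most_overestimated_q = q_values.max(dim=1).values
    penalty = (most_overestimated_q - q_taken).mean()

    return td_loss + alpha * penalty
```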

This code has been tested with Python 3.9, PyTorch 1.12.0, NVIDIA driver 470, CUDA 11.3, and cuDNN 8.2.1 on Ubuntu 20.04.6 LTS.

Create the conda environment and install the necessary Python libraries:

conda env create -f mqr.yaml

- You can download the offline dataset files (11.18 GB) from the Google Drive link: https://drive.google.com/file/d/1I0putLuqAi4DDrPeQHdzo8l-_EERKMWD/view?usp=sharing
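With the environment created, an optional quick check (a convenience sketch, not a repository script) that the installed versions match the tested setup:

```python
# Optional: confirm the environment matches the versions listed above.
import sys
import torch

print(sys.version)                # expect Python 3.9.x
print(torch.__version__)          # expect 1.12.0
print(torch.version.cuda)         # expect 11.3
print(torch.cuda.is_available())  # expect True with NVIDIA driver 470 installed
```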
python generate_dataset_sim.py

- The offline dataset collected in the simulation is stored in the "MQR/logs/offline_dataset_sim/" directory.
- The data folder contains state information (RGB-D heightmaps), while the models folder stores the parameters of the online VPG model.
- In addition, the transitions folder includes the action (executed_action.txt) and reward (reward_value.txt) data; a loading sketch is shown after this list.
- If you want to modify the settings for offline dataset collection, please refer to the "generate_dataset_sim.yaml" file in the conf directory.
- The image below illustrates the process of collecting an offline dataset in the simulation environment.
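For orientation, here is one way to peek at a collected dataset. The transition file names come from the description above; the exact text format of those files and the layout of the heightmaps inside the data folder are assumptions and may differ from what the scripts actually write.

```python
import os
import numpy as np

# Hedged sketch: inspect a collected offline dataset (file formats assumed).
dataset_dir = "MQR/logs/offline_dataset_sim"

actions = np.loadtxt(os.path.join(dataset_dir, "transitions", "executed_action.txt"))
rewards = np.loadtxt(os.path.join(dataset_dir, "transitions", "reward_value.txt"))
heightmap_files = sorted(os.listdir(os.path.join(dataset_dir, "data")))

print("actions:", actions.shape)
print("rewards:", rewards.shape)
print("state files in data/:", len(heightmap_files))
```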
python generate_dataset_real.py

- The offline dataset collected in the real world is stored in the "MQR/logs/offline_dataset_real/" directory.
- If you want to modify the settings for offline dataset collection, please refer to the "generate_dataset_real.yaml" file in the conf directory.
python train.py

- If you want to modify the settings for training the offline RL policy, please refer to the "train.yaml" file in the conf directory.
- You must write the path to the offline dataset in the "log_directory" variable of the train.yaml file.
- You can download the model weight files (233.6 MB) from the Google Drive link: https://drive.google.com/file/d/1_XPfSjmlC970fYzVtDx41ZBjn-56v2oJ/view?usp=sharing
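The downloaded weight file is what the test configs below point to via their model_weight variable. As a minimal sketch, assuming the download is a standard PyTorch checkpoint and using a placeholder filename, you can inspect it like this:

```python
import torch

# Hedged sketch: the filename is a placeholder; use the actual downloaded file.
checkpoint = torch.load("mqr_model_weights.pth", map_location="cpu")

# If the file stores a plain state_dict, list the parameter names and shapes.
if isinstance(checkpoint, dict):
    for name, value in checkpoint.items():
        info = tuple(value.shape) if torch.is_tensor(value) else type(value).__name__
        print(name, info)
```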
python test_sim_random.py

- If you want to modify the settings for testing the sim policy, please refer to the "test_sim_random.yaml" file in the conf directory.
- You must write the path to the weight file for the offline RL policy in the "model_weight" variable of the test_sim_random.yaml file.
- Below is an example video for test_sim_random.
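Each test script executes the greedy action under the learned Q-function. The sketch below assumes a VPG-style pixel-wise Q map (one map per motion primitive over the input heightmap); this action parameterization is an assumption for illustration, not a statement of the repository's exact interface.

```python
import torch

def select_greedy_action(q_maps):
    """Hedged sketch of greedy action selection at test time.

    q_maps: tensor of shape (num_primitives, H, W), e.g. one Q map for pushing and
            one for grasping over the input heightmap (assumed layout).
    """
    flat_index = torch.argmax(q_maps).item()          # index of the highest Q-value
    num_primitives, height, width = q_maps.shape
    primitive = flat_index // (height * width)
    row = (flat_index % (height * width)) // width
    col = flat_index % width
    return primitive, row, col

# Example with a dummy Q map: 2 primitives over a 224x224 heightmap.
print(select_greedy_action(torch.rand(2, 224, 224)))
```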
python test_sim_dense.py

- If you want to modify the settings for testing the sim_dense policy, please refer to the "test_sim_dense.yaml" file in the conf directory.
- You must write the path to the weight file for the offline RL policy in the "model_weight" variable of the test_sim_dense.yaml file.
- Below is an example video for test_sim_dense.
python test_sim_unknown.py

- If you want to modify the settings for testing the sim_unknown policy, please refer to the "test_sim_unknown.yaml" file in the conf directory.
- You must write the path to the weight file for the offline RL policy in the "model_weight" variable of the test_sim_unknown.yaml file.
- Below is an example video for test_sim_unknown.
python test_real.py

- If you want to modify the settings for testing the real policy, please refer to the "test_real.yaml" file in the conf directory.
- You must write the path to the weight file for the offline RL policy in the "model_weight" variable of the test_real.yaml file.
- Below is an example video for test_real_dense.
- Below is an example video for test_real_challenge.
- Below is an example video for test_real_unknown.
Note: if you use the MQR framework in your work, please cite the following paper:
@article{yu2025most,
title={The Most Overestimated Q Value Regularization in High-Dimensional Discrete Action Spaces for Offline Reinforcement Learning},
author={Yu, Seunghwan and Park, Homin and Ko, Byungjin and Shin, Jisub and Hong, Yoonki and Park, Taejoon and Yoon, Jong-Wan},
journal={IEEE Transactions on Neural Networks and Learning Systems},
year={2025},
publisher={IEEE}
}





