The Most Overestimated Q-value Regularization in High-dimensional Discrete Action Spaces for Offline Reinforcement Learning

Paper: https://ieeexplore.ieee.org/abstract/document/11304592

We propose Most Overestimated Q-value Regularization (MQR), a novel offline reinforcement learning algorithm that penalizes the action with the most overestimated Q-value, effectively mitigating overestimation in high-dimensional discrete action spaces. For example, robotic pushing and grasping tasks require precise decision-making within high-dimensional discrete action spaces. This repository provides MuJoCo and PyTorch code for training and testing MQR.
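The sketch below is a rough, minimal illustration of this idea and not the exact implementation in this repository: in addition to a standard TD loss on the dataset actions, it penalizes the single most overestimated Q-value over the discrete action space. All tensor names, shapes, and the penalty weight alpha are assumptions.

```python
import torch
import torch.nn.functional as F

def mqr_loss(q_values, actions, rewards, next_q_values, gamma=0.99, alpha=1.0):
    """Hedged sketch of an MQR-style objective (names and shapes are assumptions).

    q_values:      (B, A) Q-estimates of the current network over all discrete actions
    actions:       (B,)   long tensor of actions stored in the offline dataset
    rewards:       (B,)   rewards stored in the offline dataset
    next_q_values: (B, A) target-network Q-estimates for the next states
    """
    # Standard one-step TD loss on the actions actually taken in the dataset.
    td_target = rewards + gamma * next_q_values.max(dim=1).values
    q_taken = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
    td_loss = F.mse_loss(q_taken, td_target.detach())

    # MQR-style regularizer: push down the most overestimated Q-value, i.e. the
    # maximum Q over the high-dimensional discrete action space, relative to the
    # Q-value of the dataset action.
    most_overestimated_q = q_values.max(dim=1).values
    penalty = (most_overestimated_q - q_taken).mean()

    return td_loss + alpha * penalty
```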

This code has been tested with Python 3.9, PyTorch 1.12.0, NVIDIA driver 470, CUDA 11.3, and cuDNN 8.2.1 on Ubuntu 20.04.6 LTS.

Create the conda environment and install the necessary Python libraries:

conda env create -f mqr.yaml

- You can download the offline dataset files (11.18 GB) from the Google Drive link: https://drive.google.com/file/d/1I0putLuqAi4DDrPeQHdzo8l-_EERKMWD/view?usp=sharing
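With the environment created, an optional quick check (a convenience sketch, not a repository script) that the installed versions match the tested setup:

```python
# Optional: confirm the environment matches the versions listed above.
import sys
import torch

print(sys.version)                # expect Python 3.9.x
print(torch.__version__)          # expect 1.12.0
print(torch.version.cuda)         # expect 11.3
print(torch.cuda.is_available())  # expect True with NVIDIA driver 470 installed
```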
python generate_dataset_sim.py

- The offline dataset collected in the simulation is stored in the "MQR/logs/offline_dataset_sim/" directory.
- The data folder contains state information (RGB-D heightmaps), while the models folder stores the parameters of the online VPG model.
- In addition, the transitions folder includes the action (executed_action.txt) and reward (reward_value.txt) data; a loading sketch is shown after this list.
- If you want to modify the settings for offline dataset collection, please refer to the "generate_dataset_sim.yaml" file in the conf directory.
- The image below illustrates the process of collecting an offline dataset in the simulation environment.
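For orientation, here is one way to peek at a collected dataset. The transition file names come from the description above; the exact text format of those files and the layout of the heightmaps inside the data folder are assumptions and may differ from what the scripts actually write.

```python
import os
import numpy as np

# Hedged sketch: inspect a collected offline dataset (file formats assumed).
dataset_dir = "MQR/logs/offline_dataset_sim"

actions = np.loadtxt(os.path.join(dataset_dir, "transitions", "executed_action.txt"))
rewards = np.loadtxt(os.path.join(dataset_dir, "transitions", "reward_value.txt"))
heightmap_files = sorted(os.listdir(os.path.join(dataset_dir, "data")))

print("actions:", actions.shape)
print("rewards:", rewards.shape)
print("state files in data/:", len(heightmap_files))
```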
python generate_dataset_real.py

- The offline dataset collected in the real world is stored in the "MQR/logs/offline_dataset_real/" directory.
- If you want to modify the settings for offline dataset collection, please refer to the "generate_dataset_real.yaml" file in the conf directory.
python train.py

- If you want to modify the settings for training the offline RL policy, please refer to the "train.yaml" file in the conf directory.
- You must write the path to the offline dataset in the "log_directory" variable of the train.yaml file.
- You can download the model weight files (233.6 MB) from the Google Drive link: https://drive.google.com/file/d/1_XPfSjmlC970fYzVtDx41ZBjn-56v2oJ/view?usp=sharing
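The downloaded weight file is what the test configs below point to via their model_weight variable. As a minimal sketch, assuming the download is a standard PyTorch checkpoint and using a placeholder filename, you can inspect it like this:

```python
import torch

# Hedged sketch: the filename is a placeholder; use the actual downloaded file.
checkpoint = torch.load("mqr_model_weights.pth", map_location="cpu")

# If the file stores a plain state_dict, list the parameter names and shapes.
if isinstance(checkpoint, dict):
    for name, value in checkpoint.items():
        info = tuple(value.shape) if torch.is_tensor(value) else type(value).__name__
        print(name, info)
```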
python test_sim_random.py

- If you want to modify the settings for testing the sim policy, please refer to the "test_sim_random.yaml" file in the conf directory.
- You must write the path to the weight file for the offline RL policy in the "model_weight" variable of the test_sim_random.yaml file.
- Below is an example video for test_sim_random.
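Each test script executes the greedy action under the learned Q-function. The sketch below assumes a VPG-style pixel-wise Q map (one map per motion primitive over the input heightmap); this action parameterization is an assumption for illustration, not a statement of the repository's exact interface.

```python
import torch

def select_greedy_action(q_maps):
    """Hedged sketch of greedy action selection at test time.

    q_maps: tensor of shape (num_primitives, H, W), e.g. one Q map for pushing and
            one for grasping over the input heightmap (assumed layout).
    """
    flat_index = torch.argmax(q_maps).item()          # index of the highest Q-value
    num_primitives, height, width = q_maps.shape
    primitive = flat_index // (height * width)
    row = (flat_index % (height * width)) // width
    col = flat_index % width
    return primitive, row, col

# Example with a dummy Q map: 2 primitives over a 224x224 heightmap.
print(select_greedy_action(torch.rand(2, 224, 224)))
```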
python test_sim_dense.py

- If you want to modify the settings for testing the sim_dense policy, please refer to the "test_sim_dense.yaml" file in the conf directory.
- You must write the path to the weight file for the offline RL policy in the "model_weight" variable of the test_sim_dense.yaml file.
- Below is an example video for test_sim_dense.
python test_sim_unknown.py

- If you want to modify the settings for testing the sim_unknown policy, please refer to the "test_sim_unknown.yaml" file in the conf directory.
- You must write the path to the weight file for the offline RL policy in the "model_weight" variable of the test_sim_unknown.yaml file.
- Below is an example video for test_sim_unknown.
python test_real.py

- If you want to modify the settings for testing the real policy, please refer to the "test_real.yaml" file in the conf directory.
- You must write the path to the weight file for the offline RL policy in the "model_weight" variable of the test_real.yaml file.
- Below is an example video for test_real_dense.
- Below is an example video for test_real_challenge.
- Below is an example video for test_real_unknown.
Note: if you use the MQR framework in your work, please cite the following paper:
@article{yu2025most,
title={The Most Overestimated Q Value Regularization in High-Dimensional Discrete Action Spaces for Offline Reinforcement Learning},
author={Yu, Seunghwan and Park, Homin and Ko, Byungjin and Shin, Jisub and Hong, Yoonki and Park, Taejoon and Yoon, Jong-Wan},
journal={IEEE Transactions on Neural Networks and Learning Systems},
year={2025},
publisher={IEEE}
}





