Authors: Siyan Zhao, Aditya Grover
Reinforcement learning provides a compelling approach for tackling various aspects of sequential decision making, such as defining complex goals, planning future actions and observations, and evaluating their utilities. However, effectively integrating these capabilities while maintaining both expressive power and flexibility in modeling choices poses significant algorithmic challenges for efficient learning and inference. In this work, we introduce Decision Stacks, a generative framework that decomposes goal-conditioned policy agents into three distinct generative modules. These modules utilize independent generative models to simulate the temporal evolution of observations, rewards, and actions, enabling parallel learning through teacher forcing. Our framework ensures both expressivity and flexibility by allowing designers to tailor individual modules to incorporate architectural bias, optimization objectives, dynamics, domain transferability, and inference speed. Through extensive empirical evaluations, we demonstrate the effectiveness of Decision Stacks in offline policy optimization across various Markov Decision Processes (MDPs) and Partially Observable Markov Decision Processes (POMDPs), outperforming existing methods and facilitating flexible generative decision making.
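As a rough orientation to the decomposition described above (not the actual implementation), the three modules can be viewed as a chained generative model over observations, rewards, and actions. The sketch below uses hypothetical placeholder samplers and an assumed chaining order in which rewards are predicted from the generated observations and actions from both:

```python
import numpy as np

# Hypothetical stubs for the three generative modules; the real codebase uses a
# diffusion observation model, an MLP reward model, and a transformer action model.
def sample_observations(history, horizon, obs_dim=11):
    return np.zeros((horizon, obs_dim))            # stand-in for diffusion sampling

def sample_rewards(observations):
    return np.zeros(len(observations))             # stand-in for the reward module

def sample_actions(observations, rewards, act_dim=3):
    return np.zeros((len(observations), act_dim))  # stand-in for the action module

def plan(history, horizon=32):
    # Chain the modules: p(obs, rew, act) = p(obs) * p(rew | obs) * p(act | obs, rew)
    observations = sample_observations(history, horizon)
    rewards = sample_rewards(observations)
    actions = sample_actions(observations, rewards)
    return observations, rewards, actions
```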
To set up the environment required for running the project, follow the steps below:
- Clone the repository.
- Dependencies are listed in `env.yml`. Create and activate the environment with:
```bash
conda env create -f env.yml
conda activate decisionstacks
```
- Install D4RL:
```bash
git clone https://github.com/Farama-Foundation/d4rl.git
cd d4rl
pip install -e .
```
Or, alternatively:
```bash
pip install git+https://github.com/Farama-Foundation/d4rl@master#egg=d4rl
```
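To verify the D4RL installation, you can load any offline dataset (a minimal check; `hopper-medium-v2` is just an example, and the dataset is downloaded on first use):
```python
import gym
import d4rl  # noqa: F401 -- importing d4rl registers the offline environments with gym

# Load an example offline dataset to confirm the installation works.
env = gym.make("hopper-medium-v2")
dataset = d4rl.qlearning_dataset(env)
print(dataset["observations"].shape, dataset["actions"].shape)
```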
- First, change the path in `train_decision_stacks_mdp.py#L139`.
- Example training scripts are located in `code/scripts/`. For instance, you can run the following commands to train the observation, reward, and action models independently:

Train a diffusion-based observation model:
```bash
bash ds_train_state.sh
```
Train a transformer-based action model:
```bash
bash ds_train_act.sh
```
Train an MLP-based reward model:
```bash
bash ds_train_rew.sh
```
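These three scripts optimize their modules independently: during training, each module conditions on ground-truth trajectories from the offline dataset (teacher forcing) rather than on another module's predictions. The self-contained sketch below illustrates this idea only; the module stand-ins, dimensions, and batch layout are hypothetical and do not mirror the actual codebase.

```python
import torch
import torch.nn as nn

obs_dim, act_dim, horizon, batch = 11, 3, 32, 8

# Hypothetical stand-ins for the diffusion / MLP / transformer modules above.
obs_model = nn.Linear(obs_dim, obs_dim)        # predicts the next observation
rew_model = nn.Linear(obs_dim, 1)              # predicts reward from an observation
act_model = nn.Linear(obs_dim + 1, act_dim)    # predicts action from (obs, reward)

# Fake offline batch standing in for a D4RL trajectory segment.
obs = torch.randn(batch, horizon, obs_dim)
rew = torch.randn(batch, horizon, 1)
act = torch.randn(batch, horizon, act_dim)

# Teacher forcing: every module conditions on ground-truth inputs rather than on
# another module's samples, so the three losses are independent and the modules
# can be trained in parallel (as the three scripts do).
loss_obs = nn.functional.mse_loss(obs_model(obs[:, :-1]), obs[:, 1:])
loss_rew = nn.functional.mse_loss(rew_model(obs), rew)
loss_act = nn.functional.mse_loss(act_model(torch.cat([obs, rew], dim=-1)), act)
print(loss_obs.item(), loss_rew.item(), loss_act.item())
```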
- Evaluate with:
```bash
python eval_mdp.py
```
or
```bash
python eval_pomdp.py
```
If you have any questions regarding this codebase, please reach out to siyanz@g.ucla.edu.
This codebase is derived from the Decision Diffuser codebase.

