
# README

------------------------------------------------------------

## 1. Project Overview

Project Title: Sequence Modeling for Offline Reinforcement Learning

Model Type:
Transformer (Causal GPT-style)
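In a causal (GPT-style) transformer, each token may attend only to itself and to earlier tokens, so the prediction at step t never sees the future. The minimal single-head NumPy sketch below illustrates that mask. It is not the attention code in `src/model.py`, and the function name `causal_attention` is hypothetical.

```python
import numpy as np


def causal_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v have shape (T, d); position t attends only to positions 0..t.
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)                       # (T, T) attention logits
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True strictly above the diagonal
    scores = np.where(future, -np.inf, scores)          # forbid attending to later tokens
    scores -= scores.max(axis=-1, keepdims=True)        # stabilize the softmax
    weights = np.exp(scores)                            # exp(-inf) == 0 for masked entries
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ v
```

Because row 0 can only attend to itself, `causal_attention(x, x, x)[0]` equals `x[0]`, and perturbing the last token leaves all earlier outputs unchanged.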

Objective:
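Offline reinforcement learning via sequence modeling: RL is framed as a
conditional sequence prediction problem, in which a causal transformer predicts
actions from a window of past trajectory steps (conditioned on returns-to-go
for DT and ODT).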
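For the return-conditioned models, the Decision Transformer paper defines the return-to-go at step t as the undiscounted sum of the remaining rewards, R_t = r_t + r_{t+1} + ... + r_T, and feeds the model one token per modality in the order (R_1, s_1, a_1, R_2, s_2, a_2, ...). The NumPy sketch below computes returns-to-go and interleaves per-modality embeddings in that order. The function names are hypothetical, and whether this repository rescales or discounts returns is not stated in this README.

```python
import numpy as np


def returns_to_go(rewards) -> np.ndarray:
    """R_t = r_t + r_{t+1} + ... + r_T (undiscounted), one value per timestep."""
    rewards = np.asarray(rewards, dtype=float)
    return np.cumsum(rewards[::-1])[::-1]


def interleave_tokens(rtg_emb: np.ndarray, state_emb: np.ndarray,
                      action_emb: np.ndarray) -> np.ndarray:
    """Stack three (T, H) embedding arrays into one (3T, H) sequence
    ordered R_1, s_1, a_1, R_2, s_2, a_2, ..."""
    stacked = np.stack([rtg_emb, state_emb, action_emb], axis=1)  # (T, 3, H)
    return stacked.reshape(-1, rtg_emb.shape[-1])                  # (3T, H)
```

For example, `returns_to_go([1.0, 2.0, 3.0])` gives `[6., 5., 3.]`.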

Models Implemented:
  1. Behavior Cloning (BC) — baseline transformer that imitates dataset actions
  2. Decision Transformer (DT) — return-conditioned sequence model
  3. Online Decision Transformer (ODT) — stochastic DT with online fine-tuning
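The three models differ mainly in their training objective. In the original papers, BC and DT regress the dataset action with a mean-squared error, while ODT's stochastic policy outputs a Gaussian over actions and minimizes its negative log-likelihood minus an entropy bonus weighted by a temperature (learned in the ODT paper, fixed in this sketch). The NumPy sketch below shows both objectives, assuming a diagonal Gaussian without any action squashing a real implementation may add. All names are hypothetical, and this is not the loss code in `src/model.py` or `train.py`.

```python
import numpy as np

LOG_2PI = np.log(2.0 * np.pi)


def mse_action_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Deterministic action regression loss (BC and DT)."""
    return float(np.mean((pred - target) ** 2))


def gaussian_nll(action, mean, log_std) -> float:
    """-log N(action | mean, diag(exp(log_std)^2)), summed over action dims,
    averaged over the batch."""
    per_dim = 0.5 * ((action - mean) / np.exp(log_std)) ** 2 + log_std + 0.5 * LOG_2PI
    return float(per_dim.sum(axis=-1).mean())


def gaussian_entropy(log_std) -> float:
    """Entropy of a diagonal Gaussian: sum_i (log sigma_i + 0.5 * log(2*pi*e))."""
    return float((log_std + 0.5 * (LOG_2PI + 1.0)).sum(axis=-1).mean())


def odt_policy_loss(action, mean, log_std, temperature: float) -> float:
    """Negative log-likelihood minus a temperature-weighted entropy bonus."""
    return gaussian_nll(action, mean, log_std) - temperature * gaussian_entropy(log_std)
```

The entropy term rewards keeping the policy's spread wide, which is what lets ODT keep exploring once it starts collecting its own online data.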

Dataset Used:
D4RL hopper-medium-replay-v2
http://rail.eecs.berkeley.edu/datasets/offline_rl/gym_mujoco_v2/hopper_medium_replay-v2.hdf5

Expected result (sanity check): the Decision Transformer should reach a D4RL
normalized score of roughly 60-80 on test evaluation.
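D4RL reports normalized scores as 100 * (R - R_random) / (R_expert - R_random), where R is the raw episode return and the two reference returns come from a random and an expert policy. The sketch below hard-codes the Hopper reference values as we recall them from D4RL's `infos.py` (-20.272305 and 3234.3). Treat them as an assumption and check them against the installed `d4rl` package.

```python
HOPPER_RANDOM_RETURN = -20.272305  # assumed D4RL reference return (random policy) for Hopper
HOPPER_EXPERT_RETURN = 3234.3      # assumed D4RL reference return (expert policy) for Hopper


def d4rl_normalized_score(raw_return: float,
                          ref_min: float = HOPPER_RANDOM_RETURN,
                          ref_max: float = HOPPER_EXPERT_RETURN) -> float:
    """Map a raw episode return onto the D4RL scale: 0 = random, 100 = expert."""
    return 100.0 * (raw_return - ref_min) / (ref_max - ref_min)
```

Under these reference values, a normalized score of 60 to 80 corresponds to a raw Hopper return of roughly 1,930 to 2,580.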

------------------------------------------------------------

## 2. Repository Structure

```
Main/
  train.py                  # Training script (all 3 models)
  test.py                   # Evaluation script
  dataset_setup.py          # Downloads D4RL dataset
  requirements.txt          # Python dependencies
  README.txt                # This file
  src/
    __init__.py
    model.py                # BC, DT, ODT model architectures
    dataloader.py           # Dataset loading and PyTorch Datasets
    utils.py                # Evaluation and utility functions
  models/                   # Model checkpoints go here
  data/                     # Downloaded dataset goes here
  outputs/                  # Training outputs (losses, plots, checkpoints)
```
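The listing places dataset loading in `src/dataloader.py`. As a hypothetical sketch of one way to read the D4RL file (not the repository's code): D4RL `.hdf5` files store flat per-step arrays under keys such as `observations`, `actions`, `rewards`, `terminals` and `timeouts`, and episodes can be recovered by cutting wherever either end flag is set. Reading the file assumes `h5py` is installed; the splitting function only needs NumPy.

```python
import numpy as np

FIELDS = ("observations", "actions", "rewards")


def load_d4rl_hdf5(path: str) -> dict:
    """Read the flat per-step arrays from a D4RL .hdf5 file (requires h5py)."""
    import h5py  # imported here so split_trajectories can be used without h5py
    keys = FIELDS + ("terminals", "timeouts")
    with h5py.File(path, "r") as f:
        return {k: f[k][()] for k in keys}


def split_trajectories(data: dict) -> list:
    """Cut the flat arrays into episodes ending where terminals or timeouts is set."""
    ends = np.logical_or(data["terminals"], data["timeouts"])
    trajectories, start = [], 0
    for end in np.flatnonzero(ends):
        trajectories.append({k: data[k][start:end + 1] for k in FIELDS})
        start = end + 1
    if start < len(ends):  # trailing steps without an end flag form a partial episode
        trajectories.append({k: data[k][start:] for k in FIELDS})
    return trajectories
```

Typical use: `trajs = split_trajectories(load_d4rl_hdf5("data/hopper-medium-replay-v2.hdf5"))`.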

------------------------------------------------------------

## 3. Dataset (OPTION A — PUBLIC DATASET SPLITS)

Dataset Link:
http://rail.eecs.berkeley.edu/datasets/offline_rl/gym_mujoco_v2/hopper_medium_replay-v2.hdf5

Where to place the downloaded dataset:
```
data/
  hopper-medium-replay-v2.hdf5
```

Download command:
```
python dataset_setup.py --output data/
```
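If `dataset_setup.py` cannot be used, the file can also be fetched directly from the D4RL URL above with the Python standard library. This is a hypothetical fallback, not part of the repository. The remote file name uses underscores, so the sketch saves it under the hyphenated name this README expects.

```python
import os
import urllib.request

DATASET_URL = ("http://rail.eecs.berkeley.edu/datasets/offline_rl/"
               "gym_mujoco_v2/hopper_medium_replay-v2.hdf5")


def dataset_path(out_dir: str = "data") -> str:
    """Local path the README expects: data/hopper-medium-replay-v2.hdf5."""
    return os.path.join(out_dir, "hopper-medium-replay-v2.hdf5")


def download_dataset(out_dir: str = "data") -> str:
    """Download the dataset once and return its local path."""
    path = dataset_path(out_dir)
    if os.path.exists(path):
        return path  # already present; skip the download
    os.makedirs(out_dir, exist_ok=True)
    tmp_path = path + ".part"
    urllib.request.urlretrieve(DATASET_URL, tmp_path)  # remote name uses underscores
    os.replace(tmp_path, path)  # rename only after a complete download
    return path
```

Calling `download_dataset("data")` from the repository root places the file where `train.py` and `test.py` are documented to look for it. Writing to a `.part` file first avoids leaving a truncated `.hdf5` behind if the download is interrupted.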

------------------------------------------------------------

## 4. Model Checkpoint

Where to place the checkpoints after downloading:
```
models/
  best_model_bc.pth
  best_model_dt.pth
  best_model_odt.pth
  best_model_odt_online.pth
```
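The four file names follow a `best_model_<model>[_online].pth` pattern, with `_online` used only for the ODT checkpoint after online fine-tuning. The hypothetical helper below resolves and loads them. The loader assumes the files were written with `torch.save` and returns whatever object was saved; this README does not say whether that is a bare `state_dict`.

```python
import os

MODELS = ("bc", "dt", "odt")


def checkpoint_path(model: str, online: bool = False, ckpt_dir: str = "models") -> str:
    """Resolve models/best_model_<model>[_online].pth for a --model value."""
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {MODELS}")
    if online and model != "odt":
        raise ValueError("only ODT has an online fine-tuned checkpoint")
    suffix = "_online" if online else ""
    return os.path.join(ckpt_dir, f"best_model_{model}{suffix}.pth")


def load_checkpoint(path: str, device: str = "cpu"):
    """Load a saved checkpoint onto `device` (requires PyTorch)."""
    import torch  # imported here so checkpoint_path works without PyTorch
    if not os.path.isfile(path):
        raise FileNotFoundError(f"checkpoint not found: {path}")
    # Recent PyTorch releases default to weights_only=True, which rejects pickled
    # non-tensor objects; only relax that for checkpoints from a trusted source.
    return torch.load(path, map_location=device)
```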

------------------------------------------------------------

## 5. Requirements (Dependencies)

Python Version: 3.12

How to install all dependencies:

Using pip:
```
pip install -r requirements.txt
```

Using conda:
```
conda create -n dlproj python=3.12
conda activate dlproj
pip install torch --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
```

Note: GPU acceleration (`--device cuda:0`) requires a CUDA-enabled PyTorch build;
the `cu124` index URL above installs one built for CUDA 12.4.
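Before passing `--device cuda:0`, it can help to confirm that the installed PyTorch build actually sees a GPU. The small hypothetical helper below falls back to the CPU when CUDA (or PyTorch itself) is unavailable. Whether `train.py` and `test.py` accept `--device cpu` is an assumption not confirmed by this README.

```python
def pick_device(requested: str = "cuda:0") -> str:
    """Return `requested` if it is usable, otherwise "cpu"."""
    if not requested.startswith("cuda"):
        return requested
    try:
        import torch
    except ImportError:  # PyTorch is not installed in this environment
        return "cpu"
    # is_available() is False for CPU-only builds or when no GPU/driver is present.
    return requested if torch.cuda.is_available() else "cpu"
```

From the shell, `python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"` gives the same information; a CPU-only build reports `None` for `torch.version.cuda`.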

------------------------------------------------------------

## 6. Running the Test Script

Commands to evaluate each model (ODT is evaluated with its online fine-tuned checkpoint):
```
python test.py --model dt --ckpt models/best_model_dt.pth --device cuda:0 --num_episodes 20

python test.py --model bc --ckpt models/best_model_bc.pth --device cuda:0 --num_episodes 20

python test.py --model odt --ckpt models/best_model_odt_online.pth --device cuda:0 --num_episodes 20
```
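This README does not say how `test.py` aggregates the `--num_episodes` runs. One common way to report them, sketched here as an assumption, is the mean per-episode score with its standard error, which shrinks roughly as 1/sqrt(num_episodes). The function name is hypothetical.

```python
import math
import statistics


def summarize_scores(scores) -> tuple:
    """Return (mean, standard error of the mean) over per-episode scores."""
    scores = list(scores)
    if not scores:
        raise ValueError("need at least one episode score")
    mean = statistics.fmean(scores)
    # The sample standard deviation needs two or more episodes.
    sem = statistics.stdev(scores) / math.sqrt(len(scores)) if len(scores) > 1 else 0.0
    return mean, sem
```

For example, two episodes scoring 60 and 80 give a mean of 70 and a standard error of about 10.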

------------------------------------------------------------

## 7. Running the Training Script

Train Behavior Cloning (baseline):
```
python train.py --model bc --epochs 50 --batch_size 64 --lr 1e-4 --device cuda:0 --out_dir outputs/
```

Train Decision Transformer:
```
python train.py --model dt --epochs 50 --batch_size 64 --lr 1e-4 --device cuda:0 --out_dir outputs/
```

Train Online Decision Transformer (offline pretraining followed by online fine-tuning):
```
python train.py --model odt --epochs 50 --online_epochs 10 --batch_size 64 --lr 1e-4 --device cuda:0 --out_dir outputs/
```
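During the online phase, the ODT paper stores whole trajectories in a replay buffer that is seeded with the highest-return offline trajectories, evicts the oldest trajectories first, and relabels each new rollout with the returns-to-go it actually achieved (hindsight return relabeling). The simplified NumPy sketch below also samples trajectories with probability proportional to their length. The class name, the fields, the length-proportional sampling, and the weakest-first seeding order are all assumptions of this sketch, not details of `train.py`.

```python
from collections import deque

import numpy as np


def _with_rtg(traj: dict) -> dict:
    """Attach undiscounted returns-to-go computed from the rewards actually received."""
    rewards = np.asarray(traj["rewards"], dtype=float)
    return {**traj, "rewards": rewards, "rtg": np.cumsum(rewards[::-1])[::-1]}


class TrajectoryReplayBuffer:
    """Trajectory-level FIFO replay buffer for online fine-tuning."""

    def __init__(self, capacity: int, offline_trajectories=()):
        self.buffer = deque(maxlen=capacity)  # the oldest trajectory is evicted first
        best = sorted(offline_trajectories,
                      key=lambda tr: float(np.sum(tr["rewards"])), reverse=True)[:capacity]
        for traj in reversed(best):  # weakest seed appended first, so it is evicted first
            self.buffer.append(_with_rtg(traj))

    def add_rollout(self, observations, actions, rewards) -> None:
        """Store an online episode, relabeled with the return it actually achieved."""
        self.buffer.append(_with_rtg({"observations": np.asarray(observations),
                                      "actions": np.asarray(actions),
                                      "rewards": rewards}))

    def sample(self, batch_size: int, rng: np.random.Generator) -> list:
        """Draw trajectories with probability proportional to their length."""
        lengths = np.array([len(tr["rewards"]) for tr in self.buffer], dtype=float)
        idx = rng.choice(len(self.buffer), size=batch_size, p=lengths / lengths.sum())
        return [self.buffer[int(i)] for i in idx]
```

Relabeling with the achieved return means every stored rollout is a correct example of "how to obtain this return", even when it fell short of the target it was conditioned on.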

Optional arguments:
- `--seed 42`          (reproducibility)
- `--save_every 10`    (checkpoint frequency)
- `--eval_every 5`     (environment evaluation frequency)
- `--target_return 3600` (target return for DT/ODT)
- `--context_len 20`   (transformer context window)
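`--target_return` and `--context_len` control how the return-conditioned models act at evaluation time. In the Decision Transformer paper, the model is first conditioned on the target return, each observed reward is subtracted from it, and only the most recent K timesteps are fed back as context. The framework-free sketch below shows that loop. `env_reset`, `env_step` and `policy` are hypothetical callables standing in for the Gym environment and the trained model, and this is not the evaluation code in `src/utils.py`.

```python
from collections import deque


def rollout(env_reset, env_step, policy, target_return: float,
            context_len: int = 20, max_steps: int = 1000) -> float:
    """Run one return-conditioned episode and return its total reward.

    env_step(action) must return (next_state, reward, done);
    policy(rtgs, states, actions) must return the action for the last timestep.
    """
    state = env_reset()
    rtg, total = float(target_return), 0.0
    # Bounded deques keep only the last `context_len` timesteps as model input.
    rtgs, states, actions = (deque(maxlen=context_len) for _ in range(3))
    for _ in range(max_steps):
        rtgs.append(rtg)
        states.append(state)
        actions.append(None)  # placeholder for the action about to be predicted
        action = policy(list(rtgs), list(states), list(actions))
        actions[-1] = action
        state, reward, done = env_step(action)
        total += reward
        rtg -= reward  # condition the next step on the return still to be collected
        if done:
            break
    return total
```

Appending a placeholder action keeps the three deques the same length, so evicting the oldest timestep drops its return, state and action together.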

------------------------------------------------------------

## 8. Submission Checklist

- [x] Dataset provided using Option A and placed correctly.
- [x] Model checkpoint instructions included.
- [x] requirements.txt generated and Python version specified.
- [x] Test command works.
- [x] Train command works.

------------------------------------------------------------