# README
------------------------------------------------------------
## 1. Project Overview
Project Title: Sequence Modeling for Offline Reinforcement Learning
Model Type:
Transformer (Causal GPT-style)
Objective:
Offline reinforcement learning via sequence modeling: RL is framed as a
conditional sequence prediction problem and solved with transformer architectures.
Models Implemented:
1. Behavior Cloning (BC) — baseline transformer that imitates dataset actions
2. Decision Transformer (DT) — return-conditioned sequence model
3. Online Decision Transformer (ODT) — stochastic DT with online fine-tuning
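Return conditioning in DT relies on returns-to-go: at each timestep, the sum of rewards from that step to the end of the trajectory. The sketch below is a minimal illustration, not the code in src/dataloader.py; the function name `returns_to_go` and the undiscounted default (`gamma=1.0`) are assumptions.

```python
def returns_to_go(rewards, gamma=1.0):
    """Return R_t = sum_{k >= t} gamma^(k - t) * r_k for every timestep t."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the suffix sum already computed.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


print(returns_to_go([1.0, 2.0, 3.0]))  # [6.0, 5.0, 3.0]
```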
Dataset Used:
D4RL hopper-medium-replay-v2
http://rail.eecs.berkeley.edu/datasets/offline_rl/gym_mujoco_v2/hopper_medium_replay-v2.hdf5
Sanity check: evaluating the DT checkpoint should give a D4RL normalized score of roughly 60-80.
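D4RL normalized scores rescale the raw episode return so that 0 corresponds to a random policy and 100 to an expert policy. A minimal sketch of that conversion follows. The two Hopper reference returns are the values from D4RL's reference-score table as recalled here, so verify them against the installed d4rl package before relying on the numbers; the function name is hypothetical.

```python
# Hopper reference returns from D4RL's reference-score table (assumed here;
# verify against d4rl.infos in your installed version).
HOPPER_RANDOM_RETURN = -20.272305
HOPPER_EXPERT_RETURN = 3234.3


def normalized_score(raw_return):
    """Map a raw Hopper episode return to the D4RL 0-100 scale."""
    span = HOPPER_EXPERT_RETURN - HOPPER_RANDOM_RETURN
    return 100.0 * (raw_return - HOPPER_RANDOM_RETURN) / span
```

Under these reference values, the 60-80 range corresponds to raw returns of roughly 1930 to 2580.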
------------------------------------------------------------
## 2. Repository Structure
```
Main/
train.py # Training script (all 3 models)
test.py # Evaluation script
dataset_setup.py # Downloads D4RL dataset
requirements.txt # Python dependencies
README.txt # This file
src/
__init__.py
model.py # BC, DT, ODT model architectures
dataloader.py # Dataset loading and PyTorch Datasets
utils.py # Evaluation and utility functions
models/ # Model checkpoints go here
data/ # Downloaded dataset goes here
outputs/ # Training outputs (losses, plots, checkpoints)
```
------------------------------------------------------------
## 3. Dataset (OPTION A — PUBLIC DATASET SPLITS)
Dataset Link:
http://rail.eecs.berkeley.edu/datasets/offline_rl/gym_mujoco_v2/hopper_medium_replay-v2.hdf5
Where to place the downloaded dataset:
```
data/
hopper-medium-replay-v2.hdf5
```
Download command:
```
python dataset_setup.py --output data/
```
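D4RL HDF5 files store transitions as flat arrays (keys such as `observations`, `actions`, `rewards`, `terminals`, `timeouts`), so a loader has to cut them back into episodes before building sequences. The sketch below shows one way to find episode boundaries from the two flag arrays. It uses plain lists for illustration, and `split_episodes` is a hypothetical helper, not necessarily what src/dataloader.py does.

```python
def split_episodes(terminals, timeouts):
    """Return (start, end) index pairs, end exclusive, one per episode."""
    episodes = []
    start = 0
    for i, (done, timed_out) in enumerate(zip(terminals, timeouts)):
        if done or timed_out:
            episodes.append((start, i + 1))
            start = i + 1
    # Keep a trailing partial episode that never hit a terminal or timeout flag.
    if start < len(terminals):
        episodes.append((start, len(terminals)))
    return episodes


terminals = [0, 0, 1, 0, 0, 0]
timeouts = [0, 0, 0, 0, 1, 0]
print(split_episodes(terminals, timeouts))  # [(0, 3), (3, 5), (5, 6)]
```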
------------------------------------------------------------
## 4. Model Checkpoint
Where to place the checkpoints after downloading:
```
models/
best_model_bc.pth
best_model_dt.pth
best_model_odt.pth
best_model_odt_online.pth
```
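Before running test.py, it can help to confirm that the expected checkpoint files are in place. The following is a small sketch using only the standard library; the helper name `missing_checkpoints` is hypothetical, and the file list is copied from the layout above.

```python
from pathlib import Path

# File names copied from the checkpoint layout in this README.
EXPECTED = [
    "best_model_bc.pth",
    "best_model_dt.pth",
    "best_model_odt.pth",
    "best_model_odt_online.pth",
]


def missing_checkpoints(models_dir="models"):
    """Return the expected checkpoint names not found in `models_dir`."""
    root = Path(models_dir)
    return [name for name in EXPECTED if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    print("All checkpoints found." if not missing else f"Missing: {missing}")
```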
------------------------------------------------------------
## 5. Requirements (Dependencies)
Python Version: 3.12
How to install all dependencies:
Using pip:
```
pip install -r requirements.txt
```
Using conda:
```
conda create -n dlproj python=3.12
conda activate dlproj
pip install torch --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
```
Note: GPU acceleration requires a CUDA-enabled PyTorch build; the conda steps above install the CUDA 12.4 (cu124) wheels.
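To check whether the installed PyTorch build can see a GPU before passing `--device cuda:0`, a sketch like the following can be used. Whether train.py and test.py accept `--device cpu` as a fallback is an assumption, and `pick_device` is a hypothetical helper.

```python
def pick_device(preferred="cuda:0"):
    """Return `preferred` if a CUDA-enabled PyTorch is usable, else "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed at all
    return preferred if torch.cuda.is_available() else "cpu"


print(pick_device())
```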
------------------------------------------------------------
## 6. Running the Test Script
Commands to run testing (one per model):
```
python test.py --model dt --ckpt models/best_model_dt.pth --device cuda:0 --num_episodes 20
python test.py --model bc --ckpt models/best_model_bc.pth --device cuda:0 --num_episodes 20
python test.py --model odt --ckpt models/best_model_odt_online.pth --device cuda:0 --num_episodes 20
```
------------------------------------------------------------
## 7. Running the Training Script
Train Behavior Cloning (baseline):
```
python train.py --model bc --epochs 50 --batch_size 64 --lr 1e-4 --device cuda:0 --out_dir outputs/
```
Train Decision Transformer:
```
python train.py --model dt --epochs 50 --batch_size 64 --lr 1e-4 --device cuda:0 --out_dir outputs/
```
Train Online Decision Transformer (offline + online):
```
python train.py --model odt --epochs 50 --online_epochs 10 --batch_size 64 --lr 1e-4 --device cuda:0 --out_dir outputs/
```
Optional arguments:
- `--seed 42` (reproducibility)
- `--save_every 10` (checkpoint frequency)
- `--eval_every 5` (environment evaluation frequency)
- `--target_return 3600` (target return for DT/ODT)
- `--context_len 20` (transformer context window)
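In the original Decision Transformer formulation, `--target_return` seeds the return-to-go token at the start of an evaluation episode and is reduced by each observed reward, while `--context_len` caps how many recent timesteps are fed to the model. The sketch below illustrates that bookkeeping with plain Python. It assumes this repository follows the same scheme, and the helper names are hypothetical.

```python
def update_target(target_return, reward):
    """Return-to-go for the next step after observing `reward`."""
    return target_return - reward


def last_k(history, context_len):
    """Keep only the most recent `context_len` entries (context_len >= 1)."""
    return history[-context_len:]


rtg_history = [3600.0]
for reward in [1.0, 2.5, 3.0]:  # made-up rewards for illustration
    rtg_history.append(update_target(rtg_history[-1], reward))

print(rtg_history)             # [3600.0, 3599.0, 3596.5, 3593.5]
print(last_k(rtg_history, 2))  # [3596.5, 3593.5]
```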
------------------------------------------------------------
## 8. Submission Checklist
- [x] Dataset provided using Option A and placed correctly.
- [x] Model checkpoint instructions included.
- [x] requirements.txt generated and Python version specified.
- [x] Test command works.
- [x] Train command works.
------------------------------------------------------------