ArmaanSethi/Hindsight-Experience-Replay-and-Hierarchical-Reinforcement-Learning


Hierarchical RL with Hindsight Experience Replay

Python 3.6+ | TensorFlow 1.x | License: MIT

Combining temporal abstraction with sample-efficient learning for robotic manipulation.

Note: Built on TensorFlow 1.x / OpenAI Baselines (2017-2018 era). See Setup for environment requirements.


Overview

This project extends the OpenAI Baselines HER implementation with a two-level hierarchical structure. The key insight is that HER can be applied at both levels of the hierarchy—relabeling not just final goals, but also the intermediate subgoals.

Core idea: A meta-controller sets subgoals every k steps, while a sub-controller takes primitive actions to achieve those subgoals. Both controllers benefit from hindsight relabeling.
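The control loop described above can be sketched in a few lines. This is a minimal illustration in a toy 2-D point world, not the code in src/hrl/: the function names, the hand-written policies, the displacement dynamics, and the values horizon=50 and k=10 are all assumptions made for the example.

```python
import numpy as np

def rollout(meta_policy, sub_policy, start, final_goal, horizon=50, k=10):
    """Collect one episode with a two-level controller in a toy point world."""
    state = np.asarray(start, dtype=float)
    subgoal = None
    trajectory = []
    for t in range(horizon):
        if t % k == 0:
            # Meta-controller picks a new subgoal at the start of each period.
            subgoal = meta_policy(state, final_goal)
        # Sub-controller only sees the current subgoal, never the final goal.
        action = sub_policy(state, subgoal)
        next_state = state + action  # toy dynamics: the action is a displacement
        trajectory.append((t, state.copy(), subgoal.copy(), action, next_state.copy()))
        state = next_state
    return trajectory

# Hypothetical hand-written policies standing in for the learned networks.
def meta_policy(state, final_goal):
    # Propose a subgoal halfway between the current state and the final goal.
    return state + 0.5 * (final_goal - state)

def sub_policy(state, subgoal, max_step=0.05):
    # Move toward the subgoal with a bounded step per dimension.
    return np.clip(subgoal - state, -max_step, max_step)

goal = np.array([1.0, 0.5])
traj = rollout(meta_policy, sub_policy, start=[0.0, 0.0], final_goal=goal)
n_subgoals = len({t // 10 for t, *_ in traj})  # one subgoal per period of k steps
```

With these settings the meta-controller makes 5 decisions per episode while the sub-controller makes 50, which is the temporal abstraction the hierarchy relies on.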


What I Modified

The HRL extension is implemented in src/hrl/:

| File | Purpose |
| --- | --- |
| herhrl.py | Sampling functions that apply HER at both hierarchy levels |
| meta_controller.py | High-level policy that generates subgoals |
| sub_policy.py | Low-level policy + intrinsic reward computation |

The key modification is in herhrl.py:_sample_herhrl_transitions():

  • Lines 58-62: Identify subgoal period boundaries
  • Lines 65-73: Relabel subgoals with achieved intermediate states (the HRL extension)
  • Lines 75-78: Compute intrinsic rewards for subgoal achievement
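The three steps listed above can be illustrated with a small NumPy sketch. It is not the contents of herhrl.py: the function name `relabel_subgoals`, the array layout, the choice of the period-end state as the relabeled subgoal, and the 0 / -1 sparse intrinsic reward with a distance threshold are assumptions made for the example.

```python
import numpy as np

def relabel_subgoals(achieved, k, threshold=0.05):
    """Hindsight-relabel the subgoals of one episode (illustrative only).

    achieved: array of shape (T + 1, d); achieved[t] is the state before step t
              and achieved[t + 1] is the state after it.
    k:        number of steps in each subgoal period.
    """
    T = achieved.shape[0] - 1
    t_idx = np.arange(T)
    # Step 1: index of the state reached at the end of each step's subgoal period.
    period_end = np.minimum((t_idx // k + 1) * k, T)
    # Step 2: pretend that state was the subgoal for the whole period.
    new_subgoals = achieved[period_end]
    # Step 3: sparse intrinsic reward, 0 if the next state is within
    # `threshold` of the relabeled subgoal, otherwise -1.
    dist = np.linalg.norm(achieved[1:] - new_subgoals, axis=-1)
    intrinsic_reward = -(dist > threshold).astype(np.float32)
    return new_subgoals, intrinsic_reward

# Toy episode: 6 steps along a line, two subgoal periods of 3 steps each.
achieved = np.arange(7, dtype=float)[:, None] * 0.1
new_subgoals, rewards = relabel_subgoals(achieved, k=3)
```

In this toy episode only the last step of each period receives reward 0, because that is the step that lands on the relabeled subgoal. Every period therefore contains at least one successful transition for the sub-controller, even if the original subgoal was never reached.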

The training pipeline uses baselines/baselines/herhrl/, which follows the OpenAI Baselines structure for DDPG+HER.


Results

| Environment | HER | HER+HRL | Outcome |
| --- | --- | --- | --- |
| FetchReach | ~100% | ~100% | ✓ Baseline |
| FetchPush | ~100% | ~100% | ✓ No degradation |
| FetchPickAndPlace | ~100% | ~100% | ✓ No degradation |
| FetchSlide | 60% | 70% | +17% relative |
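For clarity, the FetchSlide improvement is 10 percentage points in absolute terms and roughly 17% relative to the HER baseline. The arithmetic, using the two success rates from the table (variable names are just for illustration):

```python
her_success = 60      # FetchSlide success rate with HER, in percent
herhrl_success = 70   # FetchSlide success rate with HER+HRL, in percent

absolute_gain = herhrl_success - her_success        # in percentage points
relative_gain = 100 * absolute_gain / her_success   # in percent of the baseline

print(f"absolute: +{absolute_gain} points, relative: +{relative_gain:.1f}%")
```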

Why HRL only helps on FetchSlide: Hierarchical methods provide benefits proportional to task horizon length (Nachum et al., 2018). Short-horizon tasks like FetchReach and FetchPush are solvable within a single subgoal period—temporal abstraction provides no advantage. FetchSlide requires hitting a puck to a distant target with indirect contact, creating a longer effective horizon where subgoal decomposition becomes valuable.

This is consistent with the argument that HRL's sample-complexity advantage scales as O(H/k), where H is the horizon length and k is the subgoal interval. When H ≈ k (simple tasks), the ratio approaches 1 and HRL reduces to flat RL. When H >> k (FetchSlide), hierarchical decomposition provides meaningful gains.
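To make the H/k ratio concrete, the sketch below counts how many subgoals the meta-controller sets per episode for a few subgoal intervals. The 50-step horizon and the listed k values are assumptions for illustration, not the settings used in these experiments, and `meta_decisions` is a hypothetical helper.

```python
import math

def meta_decisions(horizon, k):
    """Number of subgoals the meta-controller sets in one episode."""
    return math.ceil(horizon / k)

H = 50  # assumed episode length in environment steps
for k in (50, 25, 10, 5):
    print(f"k={k:>2}: H/k={H / k:.0f}, meta-level decisions={meta_decisions(H, k)}")
```

When k equals the horizon, the meta-controller sets a single subgoal per episode and the hierarchy behaves like a flat goal-conditioned policy. Smaller k gives the meta-controller more decision points per episode.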

Compute constraints: These experiments were run on a 4-CPU machine for 500 training epochs, significantly less than the 100+ CPUs and longer training runs used in the original HER research. With only 1 CPU, neither method exceeded 20% success on FetchSlide. The 70% result with HER+HRL suggests that the hierarchical approach may be particularly useful in resource-constrained settings, although these runs alone do not establish that.

Training curves are in results/.

Demo videos: FetchPush | PickAndPlace | Slide


Project Structure

src/hrl/                    # HRL implementation (my code)
├── herhrl.py               # HER+HRL sampling
├── meta_controller.py      # Subgoal generation
└── sub_policy.py           # Low-level policy

baselines/baselines/herhrl/ # Modified OpenAI Baselines
├── ddpg.py                 # DDPG with subgoal conditioning
├── rollout.py              # Episode collection
└── experiment/             # Training scripts

results/                    # Experiment outputs
docs/                       # Academic reports

Usage

# Train on FetchPush
python -m baselines.herhrl.experiment.train \
    --env_name FetchPush-v0 \
    --n_epochs 200 \
    --num_cpu 4

# Visualize trained policy
python -m baselines.herhrl.experiment.play results/push200herhrl/policy_best.pkl

Setup

Requires TensorFlow 1.x, MuJoCo, and mpi4py:

pip install tensorflow==1.15 mujoco-py "gym[robotics]" mpi4py
cd baselines && pip install -e .

References

See docs/ for the full project report and proposal.


COMP 781: Robotics (Graduate) — UNC Chapel Hill
Completed as a sophomore undergraduate
