Combining temporal abstraction with sample-efficient learning for robotic manipulation.
Note: Built on TensorFlow 1.x / OpenAI Baselines (2017-2018 era). See Setup for environment requirements.
This project extends the OpenAI Baselines HER implementation with a two-level hierarchical structure. The key insight is that HER can be applied at both levels of the hierarchy—relabeling not just final goals, but also the intermediate subgoals.
Core idea: A meta-controller sets subgoals every k steps, while a sub-controller takes primitive actions to achieve those subgoals. Both controllers benefit from hindsight relabeling.
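The two-level loop can be sketched as follows. This is an illustrative sketch, not the project's actual API: the function names, the subgoal period `K`, and the transition tuples are assumptions chosen to show where each level's transitions come from.

```python
# Illustrative sketch of a two-level rollout: a meta-controller emits a
# subgoal every K steps, a sub-controller takes primitive actions toward
# it, and both levels store transitions for later hindsight relabeling.

K = 5  # subgoal period (hypothetical value)

def rollout(env_step, meta_policy, sub_policy, obs, final_goal, horizon):
    """Collect transitions at both hierarchy levels."""
    meta_buffer, sub_buffer = [], []
    for t in range(horizon):
        if t % K == 0:
            subgoal = meta_policy(obs, final_goal)   # high level: pick subgoal
            meta_start_obs = obs
        action = sub_policy(obs, subgoal)            # low level: primitive action
        next_obs = env_step(action)
        sub_buffer.append((obs, action, subgoal, next_obs))
        if (t + 1) % K == 0:
            # one meta-level transition per completed subgoal period
            meta_buffer.append((meta_start_obs, subgoal, final_goal, next_obs))
        obs = next_obs
    return meta_buffer, sub_buffer
```

Because both buffers store goal-conditioned transitions, the same hindsight relabeling trick applies to each: the sub-controller's subgoals and the meta-controller's subgoal choices can both be swapped for states that were actually reached.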
The HRL extension is implemented in src/hrl/:
| File | Purpose |
|---|---|
| herhrl.py | Sampling functions that apply HER at both hierarchy levels |
| meta_controller.py | High-level policy that generates subgoals |
| sub_policy.py | Low-level policy + intrinsic reward computation |
The key modification is in herhrl.py:_sample_herhrl_transitions():
- Lines 58-62: Identify subgoal period boundaries
- Lines 65-73: Relabel subgoals with achieved intermediate states (the HRL extension)
- Lines 75-78: Compute intrinsic rewards for subgoal achievement
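The two operations described above can be sketched in isolation. This is a hedged sketch of the ideas, not the repository's actual code: the function names, transition layout, and the distance threshold are illustrative assumptions.

```python
import numpy as np

EPS = 0.05  # subgoal success threshold (assumed value)

def relabel_subgoal(meta_transition, achieved_state):
    """HER at the meta level: pretend the state the sub-policy actually
    reached was the subgoal the meta-controller intended all along."""
    start_obs, _old_subgoal, final_goal, end_obs = meta_transition
    return (start_obs, achieved_state, final_goal, end_obs)

def intrinsic_reward(achieved, subgoal):
    """Sparse reward for the sub-policy: 0 if the subgoal is reached
    within EPS, -1 otherwise (mirroring the sparse task reward)."""
    dist = np.linalg.norm(np.asarray(achieved) - np.asarray(subgoal))
    return 0.0 if dist < EPS else -1.0
```

Relabeling at the meta level guarantees that some fraction of meta-transitions are "successful", which is the same density argument HER makes for final goals.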
The training pipeline uses baselines/baselines/herhrl/, which follows the OpenAI Baselines structure for DDPG+HER.
| Environment | HER | HER+HRL | Outcome |
|---|---|---|---|
| FetchReach | ~100% | ~100% | ✓ Baseline |
| FetchPush | ~100% | ~100% | ✓ No degradation |
| FetchPickAndPlace | ~100% | ~100% | ✓ No degradation |
| FetchSlide | 60% | 70% | +17% relative |
Why HRL only helps on FetchSlide: Hierarchical methods provide benefits proportional to task horizon length (Nachum et al., 2018). Short-horizon tasks like FetchReach and FetchPush are solvable within a single subgoal period—temporal abstraction provides no advantage. FetchSlide requires hitting a puck to a distant target with indirect contact, creating a longer effective horizon where subgoal decomposition becomes valuable.
This aligns with the theoretical result that HRL's sample complexity advantage scales with O(H/k), where H is horizon length and k is the subgoal interval. When H ≈ k (simple tasks), the ratio approaches 1 and HRL reduces to flat RL. When H >> k (FetchSlide), hierarchical decomposition provides meaningful gains.
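A quick arithmetic illustration of the H/k argument (the horizon values below are rough assumptions for exposition, not measurements from the environments):

```python
# Each meta-decision covers k primitive steps, so the meta-controller
# faces an effective horizon of H/k instead of H.

def hrl_advantage_ratio(H, k):
    """Meta-level decisions needed relative to the subgoal interval."""
    return H / k

# Short-horizon task: one subgoal period covers the whole task -> flat RL
assert hrl_advantage_ratio(H=5, k=5) == 1.0
# Long-horizon task: each level reasons over a much shorter horizon
assert hrl_advantage_ratio(H=50, k=5) == 10.0
```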
Compute constraints: These experiments were run on a 4-CPU machine with 500 training epochs—significantly less than the 100+ CPUs and longer training runs used in the original HER research. With only 1 CPU, neither method exceeded 20% success on FetchSlide. The 70% result with HER+HRL suggests the hierarchical approach may be particularly helpful in resource-constrained settings.
Training curves in results/.
Demo videos: FetchPush | PickAndPlace | Slide
src/hrl/ # HRL implementation (my code)
├── herhrl.py # HER+HRL sampling
├── meta_controller.py # Subgoal generation
└── sub_policy.py # Low-level policy
baselines/baselines/herhrl/ # Modified OpenAI Baselines
├── ddpg.py # DDPG with subgoal conditioning
├── rollout.py # Episode collection
└── experiment/ # Training scripts
results/ # Experiment outputs
docs/ # Academic reports
# Train on FetchPush
python -m baselines.herhrl.experiment.train \
--env_name FetchPush-v0 \
--n_epochs 200 \
--num_cpu 4
# Visualize trained policy
python -m baselines.herhrl.experiment.play results/push200herhrl/policy_best.pkl

Requires TensorFlow 1.x, MuJoCo, and mpi4py:
pip install tensorflow==1.15 mujoco-py gym[robotics] mpi4py
cd baselines && pip install -e .

- Andrychowicz et al., "Hindsight Experience Replay" (NeurIPS 2017)
- Nachum et al., "Data-Efficient Hierarchical RL" (NeurIPS 2018)
See docs/ for the full project report and proposal.
COMP 781: Robotics (Graduate) — UNC Chapel Hill
Completed as a sophomore undergraduate