This repository contains a complete system for:
- Real-time teleoperation of a bimanual ALOHA robot in the MuJoCo physics simulator using only a standard webcam and bare hands.
- An imitation learning pipeline for collecting expert demonstrations and training advanced policies such as the Action Chunking Transformer (ACT).
- Intuitive Hand-Tracking Control: Uses Google's MediaPipe to translate your hand movements into precise robot actions.
- High-Fidelity Simulation: Leverages the power of MuJoCo and the bigym framework for realistic physics and complex interaction tasks.
- Robust Client-Server Architecture: Decouples the vision processing (client) from the physics simulation (server) for maximum performance and stability.
- Reproducible & Cross-Platform: The entire simulation environment is containerized using Docker, ensuring it runs identically anywhere.
- AI-Ready: Built from the ground up to serve as a data collection tool for modern imitation learning algorithms.
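On the last point: the teleoperation loop already produces synchronized observation/action pairs at every simulation step, so recording a demonstration is mostly a matter of persisting them. A minimal sketch of what that could look like (the `save_episode` helper, file format, and array names below are illustrative assumptions, not the repository's actual recording format):

```python
import numpy as np

def save_episode(path, observations, actions):
    """Hypothetical helper: persist one teleoperated demo for ACT-style training.

    observations: per-step state vectors logged from the simulator
    actions: per-step joint targets produced by the IK solver
    """
    np.savez_compressed(
        path,
        observations=np.asarray(observations, dtype=np.float32),
        actions=np.asarray(actions, dtype=np.float32),
    )

# Called once per finished demo, e.g. save_episode("demos/ep_000.npz", obs_log, act_log)
```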
- Simulation: MuJoCo Physics Engine
- Environment: Bigym & Gymnasium
- Robot Model: ALOHA Bimanual Manipulator
- Hand Tracking: Google MediaPipe
- Inverse Kinematics (IK): mink
- Containerization: Docker on WSL2
The system operates on a client-server model to ensure real-time performance by separating concerns:
```
+--------------------------+         +--------------------------------+
|      HOST (Windows)      |         |        DOCKER CONTAINER        |
|                          |         |      (Ubuntu + CUDA libs)      |
|  +-------------------+   |         |                                |
|  |    Webcam Feed    |   |         |  +--------------------------+  |
|  +--------+----------+   |         |  |      MuJoCo Physics      |  |
|           |              |         |  |    (ALOHA Robot Sim)     |  |
|           v              |         |  +------------+-------------+  |
|  +-------------------+   |         |               ^                |
|  |   run_client.py   |   |         |               | Robot Commands |
|  |  (MediaPipe Hand  |   |         |               |                |
|  |     Tracking)     |   |         |  +------------+-------------+  |
|  +--------+----------+   |         |  |  control/hand_teleop.py  |  |
|           | Hand Coords  |         |  |  (IK Solver & Sim Loop)  |  |
|           +--------------+-------->+  +--------------------------+  |
|                  UDP     |         |                                |
|                          |         |                                |
+--------------------------+         +--------------------------------+
```
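The two processes only have to agree on a tiny wire format for the hand coordinates. As a minimal sketch, assuming a JSON payload and port 5005 (both are illustrative; the actual message layout and port are defined in run_client.py and control/hand_teleop.py):

```python
import json
import socket

PORT = 5005                # assumed port, not necessarily the repo's choice
SERVER_IP = "172.17.0.2"   # example container IP; see the client setup below

# Client side: one small datagram per tracked frame, fire-and-forget
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
msg = {"left": [0.10, 0.25, 0.30], "right": [0.40, 0.25, 0.30]}
sock.sendto(json.dumps(msg).encode(), (SERVER_IP, PORT))

# Server side: receive the latest hand coordinates inside the sim loop
srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
srv.bind(("0.0.0.0", PORT))
data, _ = srv.recvfrom(1024)
hand_coords = json.loads(data.decode())
```

UDP is a natural fit here: a stale hand pose is worthless, so it is better to drop a late packet and act on the next one than to pay TCP's retransmission and head-of-line-blocking latency.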
- Git
- Python 3.10+
- Docker Desktop with WSL2 backend enabled.
- An X server for Windows, such as VcXsrv, to view the simulation GUI from Docker. Remember to check "Disable access control" when launching VcXsrv.
This process builds the container with all necessary robotics libraries and applies required patches.
```
# Clone this repository
git clone https://github.com/Rahul-Lashkari/aloha-vision.git
cd aloha-vision

# Build the Docker image. This may take several minutes.
docker build -t aloha-server .
```

Next, create a local Python environment for the webcam client.
```
# From the project root directory
# Create and activate a virtual environment
python -m venv .venv
.\.venv\Scripts\activate

# Install required packages
pip install mediapipe opencv-python
```
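Before wiring everything together, you can sanity-check the client install with a few lines (assumes your webcam is device 0):

```python
import cv2
import mediapipe as mp

cap = cv2.VideoCapture(0)   # default webcam
ok, _ = cap.read()
cap.release()
print("webcam ok:", ok, "| mediapipe:", mp.__version__)
```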
The system requires two terminals running simultaneously.

Terminal 1: Launch the Server (Docker)
```
# Start the Docker container with display forwarding and volume mounting
docker run -it --rm -e "DISPLAY=host.docker.internal:0.0" -v "%cd%:/app" -v /app/.venv aloha-server

# Inside the container, activate the environment and run the server script
source .venv/bin/activate
python control/hand_teleop.py
```

A MuJoCo window showing the robot should appear on your desktop. The server is now listening.
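Conceptually, the server pairs a UDP receive with a differential-IK step. The sketch below shows the shape of that loop using mink's task-based API; the model path, site name, costs, and the `receive_hand_coords` helper are all assumptions for illustration, and the real logic lives in control/hand_teleop.py:

```python
import mujoco
import numpy as np
import mink

model = mujoco.MjModel.from_xml_path("aloha_scene.xml")  # assumed model path
configuration = mink.Configuration(model)

# One frame task per arm; "left_gripper" is an assumed site name in the model
left_task = mink.FrameTask(
    frame_name="left_gripper",
    frame_type="site",
    position_cost=1.0,
    orientation_cost=0.1,
)

dt = 0.02
while True:
    target = receive_hand_coords()  # hypothetical UDP receive (see sketch above)
    left_task.set_target(
        mink.SE3.from_rotation_and_translation(mink.SO3.identity(), np.asarray(target))
    )
    # Solve for joint velocities that move the gripper toward the target,
    # then integrate them into the configuration for this timestep.
    # "quadprog" assumes that qpsolvers backend is installed.
    velocity = mink.solve_ik(configuration, [left_task], dt, solver="quadprog")
    configuration.integrate_inplace(velocity, dt)
```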
Terminal 2: Launch the Client (Windows)
1. Find the Docker container's IP address. Open a new terminal and run:

   ```
   wsl -d Docker-Desktop ifconfig
   ```

   Look for the `inet` address under the `eth0` interface (e.g., `172.x.x.x`).

2. Set the environment variable for the server's IP:

   ```
   # For Command Prompt:
   set ALOHA_SERVER_IP=172.x.x.x

   # For PowerShell:
   $env:ALOHA_SERVER_IP="172.x.x.x"
   ```

   (Replace `172.x.x.x` with the actual IP you found.)

3. Run the client script from your activated local environment:

   ```
   # Make sure you're in the project root and your venv is active
   python run_client.py
   ```
A webcam window will open. Your hand movements will now control the robot in the MuJoCo simulation!
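Under the hood, the client loop is conceptually: grab a frame, run MediaPipe Hands, and ship the wrist landmark to the server. A minimal sketch, again assuming the illustrative port 5005 and a single-wrist JSON message (run_client.py tracks more than this):

```python
import json
import os
import socket

import cv2
import mediapipe as mp

server_ip = os.environ.get("ALOHA_SERVER_IP", "127.0.0.1")
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
hands = mp.solutions.hands.Hands(max_num_hands=2, min_detection_confidence=0.5)

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB; OpenCV captures BGR
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        wrist = results.multi_hand_landmarks[0].landmark[0]  # landmark 0 = wrist
        sock.sendto(json.dumps([wrist.x, wrist.y, wrist.z]).encode(),
                    (server_ip, 5005))
    cv2.imshow("hand tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
```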
- This project stands on the shoulders of giants. It is heavily inspired by the work of AlmondGod on the Nintendo-Aloha project.
- The simulation environment is built upon the excellent bigym framework.
- Inverse Kinematics are solved using the powerful mink library.
```bibtex
@article{chernyadev2024bigym,
  title={BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark},
  author={Chernyadev, Nikita and Backshall, Nicholas and Ma, Xiao and Lu, Yunfan and Seo, Younggyo and James, Stephen},
  journal={arXiv preprint arXiv:2407.07788},
  year={2024}
}
```