
Real-time teleoperation of a simulated bimanual ALOHA robot using webcam-based hand tracking and inverse kinematics.


Bimanual ALOHA Teleoperation via Hand Tracking (CV)

PRs Welcome · Python 3.10 · MuJoCo · MediaPipe · Docker · PyTorch

Real-time bimanual robot control using webcam hand tracking

This repository contains a complete system for:

  1. Real-time teleoperation of a bimanual ALOHA robot in the MuJoCo physics simulator using only a standard webcam and bare hands.
  2. An imitation learning pipeline ready for collecting expert demonstrations and training advanced policies like Action Chunking with Transformers (ACT).

Key Features

  • Intuitive Hand-Tracking Control: Uses Google's MediaPipe to translate your hand movements into precise robot actions.
  • High-Fidelity Simulation: Leverages the power of MuJoCo and the bigym framework for realistic physics and complex interaction tasks.
  • Robust Client-Server Architecture: Decouples the vision processing (client) from the physics simulation (server) for maximum performance and stability.
  • Reproducible & Cross-Platform: The entire simulation environment is containerized using Docker, ensuring it runs identically anywhere.
  • AI-Ready: Built from the ground up to serve as a data collection tool for modern imitation learning algorithms.

Tech Stack & Architecture

The system operates on a client-server model to ensure real-time performance by separating concerns:

+--------------------------+      +--------------------------------+
|     HOST (Windows)       |      |       DOCKER CONTAINER         |
|                          |      |      (Ubuntu + CUDA libs)      |
|  +-------------------+   |      |                                |
|  |   Webcam Feed     |   |      |  +--------------------------+  |
|  +--------+----------+   |      |  |      MuJoCo Physics      |  |
|           |              |      |  |     (ALOHA Robot Sim)    |  |
|           v              |      |  +------------+-------------+  |
|  +-------------------+   |      |               ^                |
|  | run_client.py     |   |      |               | Robot Commands |
|  | (MediaPipe Hand   |   |      |               |                |
|  |  Tracking)        |   |      |  +------------+-------------+  |
|  +--------+----------+   |      |  | control/hand_teleop.py   |  |
|           | Hand Coords  |      |  | (IK Solver & Sim Loop)   |  |
|           +--------------+----->+  +--------------------------+  |
|                          | UDP  |                                |
|                          |      |                                |
+--------------------------+      +--------------------------------+
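The "Hand Coords" arrow above is a plain UDP datagram stream from the Windows client into the container. A minimal sketch of that exchange (the JSON message layout and per-hand keys are assumptions here, not the actual wire format used by run_client.py):

```python
import json
import socket

# Hypothetical wire format: one JSON datagram per camera frame carrying
# normalized wrist coordinates for each tracked hand.

def send_hand_coords(sock, coords, addr):
    """Client side: serialize the hand coordinates and send one UDP datagram."""
    sock.sendto(json.dumps(coords).encode("utf-8"), addr)

def recv_hand_coords(sock, bufsize=4096):
    """Server side: block until the next datagram arrives and decode it."""
    data, _ = sock.recvfrom(bufsize)
    return json.loads(data.decode("utf-8"))

if __name__ == "__main__":
    # Loopback round trip; in the real system the client sends from the
    # Windows host into the Docker container.
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    frame = {"left": [0.40, 0.52, 0.10], "right": [0.61, 0.48, 0.12]}
    send_hand_coords(client, frame, server.getsockname())
    print(recv_hand_coords(server))
```

UDP is a natural fit here: a stale hand pose is useless, so it is better to drop a late datagram than to stall the control loop waiting for a retransmission.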

Installation & Setup

Prerequisites

  • Git
  • Python 3.10+
  • Docker Desktop with WSL2 backend enabled.
  • An X Server for Windows like VcXsrv to view the simulation GUI from Docker. Remember to check "Disable access control" when launching VcXsrv.

1. Server Setup (Docker)

This process builds the container with all necessary robotics libraries and applies required patches.

# Clone this repository
git clone https://github.com/Rahul-Lashkari/aloha-vision.git
cd aloha-vision

# Build the Docker image. This may take several minutes.
docker build -t aloha-server .

2. Client Setup (Windows)

This creates a local Python environment for the webcam client.

# From the project root directory
# Create and activate a virtual environment
python -m venv .venv
.\.venv\Scripts\activate

# Install required packages
pip install mediapipe opencv-python
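MediaPipe reports hand landmarks in normalized image coordinates (x and y in [0, 1], z as relative depth around 0), which must be mapped into the robot's workspace before they can drive the arms. A sketch of such a mapping; the workspace bounds below are illustrative placeholders, not the values used by run_client.py:

```python
def map_to_workspace(x, y, z,
                     x_range=(-0.3, 0.3),   # meters; illustrative bounds only
                     y_range=(0.1, 0.6),
                     z_range=(0.0, 0.4)):
    """Map a normalized MediaPipe landmark into a robot workspace box."""
    def clamp01(t):
        return max(0.0, min(1.0, t))

    def lerp(t, lo, hi):
        return lo + t * (hi - lo)

    return (
        lerp(clamp01(x), *x_range),        # image x -> lateral axis
        lerp(clamp01(1.0 - y), *y_range),  # image y grows downward, so flip
        lerp(clamp01(z + 0.5), *z_range),  # relative depth -> reach
    )
```

Clamping before interpolating keeps the commanded pose inside the workspace even when the hand drifts partly out of frame.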

How to Run

The system requires two terminals running simultaneously.

Terminal 1: Launch the Server (Docker)

# Start the Docker container with display forwarding and volume mounting
docker run -it --rm -e "DISPLAY=host.docker.internal:0.0" -v "%cd%:/app" -v /app/.venv aloha-server

# Inside the container, activate the environment and run the server script
source .venv/bin/activate
python control/hand_teleop.py

A MuJoCo window showing the robot should appear on your desktop. The server is now listening.
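Each loop iteration, control/hand_teleop.py turns the latest hand target into joint commands via inverse kinematics. The repository uses the mink library for this; as a self-contained illustration of the underlying idea, here is a classic damped-least-squares IK step on a toy two-link planar arm (not the actual ALOHA kinematics):

```python
import numpy as np

def fk(q, l1=0.3, l2=0.25):
    """Forward kinematics of a planar 2-link arm: joint angles -> end effector."""
    x = l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1])
    y = l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(q, l1=0.3, l2=0.25):
    """Partial derivatives of the end-effector position w.r.t. joint angles."""
    s1, s12 = np.sin(q[0]), np.sin(q[0] + q[1])
    c1, c12 = np.cos(q[0]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

def ik_step(q, target, damping=1e-2, step=0.5):
    """One damped-least-squares update pulling the end effector toward target."""
    err = target - fk(q)
    J = jacobian(q)
    dq = J.T @ np.linalg.solve(J @ J.T + damping * np.eye(2), err)
    return q + step * dq

if __name__ == "__main__":
    q = np.array([0.3, 0.3])
    target = np.array([0.35, 0.25])  # reachable point inside the workspace
    for _ in range(300):
        q = ik_step(q, target)
    print(fk(q))  # should now be very close to target
```

The damping term keeps the update bounded near kinematic singularities, where an undamped pseudoinverse would demand huge joint velocities.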

Terminal 2: Launch the Client (Windows)

  1. Find the Docker container's IP address. Open a new terminal and run:

    wsl -d Docker-Desktop ifconfig

    Look for the inet address under the eth0 interface (e.g., 172.x.x.x).

  2. Set the environment variable for the server's IP.

    # For Command Prompt:
    set ALOHA_SERVER_IP=172.x.x.x
    
    # For PowerShell:
    $env:ALOHA_SERVER_IP="172.x.x.x"

    (Replace 172.x.x.x with the actual IP you found).

  3. Run the client script from your activated local environment:

    # Make sure you're in the project root and your venv is active
    python run_client.py

A webcam window will open. Your hand movements will now control the robot in the MuJoCo simulation!
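On startup, the client resolves the server address from the ALOHA_SERVER_IP variable set in step 2. A sketch of how this might look (the default port and loopback fallback are assumptions; check run_client.py for the real values):

```python
import os

DEFAULT_PORT = 5555  # assumed; see run_client.py for the actual port

def resolve_server_addr():
    """Read the teleop server IP from ALOHA_SERVER_IP, falling back to
    loopback so the client can still be smoke-tested locally."""
    ip = os.environ.get("ALOHA_SERVER_IP", "127.0.0.1")
    return (ip, DEFAULT_PORT)
```

Reading the address from the environment keeps the container's IP, which changes between Docker sessions, out of the source code.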

Acknowledgments

  • This project stands on the shoulders of giants. It is heavily inspired by the work of AlmondGod on the Nintendo-Aloha project.
  • The simulation environment is built upon the excellent bigym framework.
  • Inverse kinematics is solved using the powerful mink library.

Bigym Citation

@article{chernyadev2024bigym,
  title={BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark},
  author={Chernyadev, Nikita and Backshall, Nicholas and Ma, Xiao and Lu, Yunfan and Seo, Younggyo and James, Stephen},
  journal={arXiv preprint arXiv:2407.07788},
  year={2024}
}
