OpenVLA is an open-source Vision-Language-Action (VLA) model with 7 billion parameters. Designed to give robots human-like perception and decision-making, it integrates visual inputs and natural language instructions to perform diverse manipulation tasks. Trained on nearly a million episodes from the Open X-Embodiment dataset, OpenVLA is a strong baseline for generalist robotic control. Its architecture combines SigLIP and DINOv2 vision encoders with a Llama 2 7B language backbone, and it can be fine-tuned efficiently on consumer-grade GPUs, making advanced robotics more accessible. Project Page
PickAgent: OpenVLA-powered Pick and Place Agent (Simulation)
🚀 PickAgent is an AI-driven pick-and-place system powered by OpenVLA. The simulation demonstrates how a vision-language-action model combines language understanding with visual perception for precise object manipulation.
| Prompt | Demo |
|---|---|
| Pick up the salad dressing and place it in the basket. | video1.mp4 |
| Pick up the tomato sauce and place it in the basket. | video2.mp4 |
| Pick up the cream cheese and place it in the basket. | video3.mp4 |
| Pick up the alphabet soup and place it in the basket. | video4.mp4 |
```bash
# Create and activate conda environment
conda create -n openvla python=3.11 -y
conda activate openvla

# Install PyTorch. Below is a sample command to do this, but you should check the following link
# to find installation instructions that are specific to your compute platform:
# https://pytorch.org/get-started/locally/
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia -y  # UPDATE ME!

# Clone and install the openvla repo
git clone https://github.com/openvla/openvla.git
cd openvla
pip install -e .

# Install Flash Attention 2 for training (https://github.com/Dao-AILab/flash-attention)
# =>> If you run into difficulty, try `pip cache remove flash_attn` first
pip install packaging ninja
ninja --version; echo $?  # Verify Ninja --> should return exit code "0"
pip install "flash-attn==2.5.5" --no-build-isolation
```
Additionally, install other required packages for simulation:
```bash
# Clone and install LIBERO
git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
cd LIBERO
pip install -e .

# Install the LIBERO-specific requirements from within the openvla repo
cd ../openvla  # adjust this path if LIBERO was not cloned alongside openvla
pip install -r experiments/robot/libero/libero_requirements.txt
```
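A quick way to confirm the LIBERO install is to list the registered task suites. The snippet below assumes LIBERO's `benchmark` registry API and is only an illustrative check:

```python
# List the LIBERO task suites (assumes LIBERO's benchmark registry API).
from libero.libero import benchmark

suites = benchmark.get_benchmark_dict()
# Expect suite names such as "libero_spatial", "libero_object", "libero_goal", and "libero_10".
print(list(suites.keys()))
```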
Run the Gradio app for inference:
```bash
python3 gradio_demo.py
```
Here’s a summary of the Gradio interface’s inputs and outputs; a hypothetical wiring sketch follows the lists below.

Inputs:
- Task: Selects the task type for the simulation.
- Task ID: Specifies the task instance ID.
- Prompt: Input for natural language instructions to control the robot.
- Preview Button: Updates the environment preview based on the selected task.
- Run Simulation Button: Runs the simulation with the given prompt.
Outputs:
- Preview Image: Shows the environment's first frame.
- Simulation Video: Shows the simulation result video.
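To make the mapping concrete, here is a hypothetical sketch of how such an interface could be wired up with Gradio. The function names (`preview_environment`, `run_simulation`) and component choices are assumptions for illustration and may differ from the actual `gradio_demo.py`:

```python
# Hypothetical wiring of the described interface (illustrative; not the actual gradio_demo.py).
import gradio as gr

def preview_environment(task: str, task_id: int):
    """Return the first frame of the selected task's environment (stubbed here)."""
    ...

def run_simulation(task: str, task_id: int, prompt: str):
    """Roll out the OpenVLA policy for the prompt and return a rendered video path (stubbed here)."""
    ...

with gr.Blocks() as demo:
    task = gr.Dropdown(
        ["libero_spatial", "libero_object", "libero_goal", "libero_10"], label="Task"
    )
    task_id = gr.Number(value=0, precision=0, label="Task ID")
    prompt = gr.Textbox(label="Prompt")
    preview_btn = gr.Button("Preview")
    run_btn = gr.Button("Run Simulation")
    preview_img = gr.Image(label="Preview Image")
    sim_video = gr.Video(label="Simulation Video")

    preview_btn.click(preview_environment, inputs=[task, task_id], outputs=preview_img)
    run_btn.click(run_simulation, inputs=[task, task_id, prompt], outputs=sim_video)

demo.launch()
```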
Run the Python script for inference:
```bash
python3 inference.py --prompt="pick up the salad dressing and place it in the basket" --task="libero_object" --task_id=2 --image_resize=1024 --output_video="outputs/videos"
```
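Under the hood, inference amounts to loading an OpenVLA checkpoint and querying it for an action at every simulation step. The sketch below follows the OpenVLA HuggingFace quickstart; the checkpoint ID, the `unnorm_key`, and the placeholder camera frame are assumptions, so treat it as a rough illustration rather than the repo's actual `inference.py`:

```python
# Rough illustration of per-step action prediction with OpenVLA (not the repo's inference.py).
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

checkpoint = "openvla/openvla-7b"  # a LIBERO-finetuned checkpoint (see the table below) could be swapped in
processor = AutoProcessor.from_pretrained(checkpoint, trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to("cuda:0")

# One frame from the simulator's camera (placeholder image here).
image = Image.new("RGB", (256, 256))
instruction = "pick up the salad dressing and place it in the basket"
prompt = f"In: What action should the robot take to {instruction}?\nOut:"

inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
# predict_action returns a 7-DoF end-effector action; unnorm_key selects the dataset
# statistics used to un-normalize it (the value here is an assumption).
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
print(action)
```

In the simulation loop, the predicted action would be passed to the environment's step function, and the new camera frame fed back into the model until the episode ends.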
| Model | Download |
|---|---|
| General OpenVLA | 🤗 HuggingFace |
| OpenVLA - Finetuned LIBERO Spatial | 🤗 HuggingFace |
| OpenVLA - Finetuned LIBERO Object | 🤗 HuggingFace |
| OpenVLA - Finetuned LIBERO Goal | 🤗 HuggingFace |
| OpenVLA - Finetuned LIBERO 10 | 🤗 HuggingFace |
The models above are borrowed from OpenVLA.