
PickAgent: OpenVLA-powered Pick and Place Agent (Simulation)

OpenVLA is an open-source Vision-Language-Action (VLA) model with 7 billion parameters. Designed to give robots human-like perception and decision-making, it integrates visual inputs and natural language instructions to perform diverse manipulation tasks. Trained on nearly a million episodes from the Open X-Embodiment dataset, OpenVLA sets a new standard for generalist robotic control. Its architecture combines SigLIP, DINOv2, and Llama 2 7B, and it can be fine-tuned efficiently on consumer-grade GPUs, making advanced robotics more accessible than ever. Project Page
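
As a quick illustration of how the model is consumed, the snippet below is a minimal sketch of loading the base checkpoint with HuggingFace transformers, following the usage shown on the OpenVLA model card; the image path and instruction here are placeholders.

from transformers import AutoModelForVision2Seq, AutoProcessor
from PIL import Image
import torch

# Load the base 7B checkpoint (trust_remote_code pulls in OpenVLA's modeling code).
processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b", torch_dtype=torch.bfloat16, trust_remote_code=True
).to("cuda:0")

image = Image.open("camera_frame.png")  # placeholder camera observation
prompt = "In: What action should the robot take to pick up the salad dressing?\nOut:"

# predict_action returns a 7-DoF end-effector action; unnorm_key selects the
# dataset statistics used to un-normalize it.
inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)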

🎬 1. Demo

🌐 Gradio App Screenshot


Watch on YouTube!

PickAgent: OpenVLA-powered Pick and Place Agent (Simulation)

Gradio Demo

🚀 PickAgent is an AI-driven pick-and-place system powered by OpenVLA, showcasing a vision-language-action model in action. This simulation demonstrates how language understanding and visual perception work together for precise object manipulation.


🎥 Video Results

Prompt: Pick up the salad dressing and place it in the basket

video1.mp4

Prompt: Pick up the tomato sauce and place it in the basket.

video2.mp4

Prompt: Pick up the cream cheese and place it in the basket.

video3.mp4

Prompt: Pick up the alphabet soup and place it in the basket.

video4.mp4

🔧 2. Installation

# Create and activate conda environment
conda create -n openvla python=3.11 -y
conda activate openvla

# Install PyTorch. Below is a sample command to do this, but you should check the following link
# to find installation instructions that are specific to your compute platform:
# https://pytorch.org/get-started/locally/
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia -y  # UPDATE ME!

# Clone and install the openvla repo
git clone https://github.com/openvla/openvla.git
cd openvla
pip install -e .

# Install Flash Attention 2 for training (https://github.com/Dao-AILab/flash-attention)
#   =>> If you run into difficulty, try `pip cache remove flash_attn` first
pip install packaging ninja
ninja --version; echo $?  # Verify Ninja --> should return exit code "0"
pip install "flash-attn==2.5.5" --no-build-isolation

Additionally, install other required packages for simulation:

git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
cd LIBERO
pip install -e .

cd ..  # return to the openvla repo root
pip install -r experiments/robot/libero/libero_requirements.txt
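
To confirm the simulator is usable before launching the demo, the following sketch (based on LIBERO's benchmark API; the suite name and task ID mirror the CLI example below) looks up a task and prints its language instruction:

from libero.libero import benchmark

# Look up the "libero_object" suite used throughout this README.
benchmark_dict = benchmark.get_benchmark_dict()
task_suite = benchmark_dict["libero_object"]()

task = task_suite.get_task(2)  # task_id=2, as in the CLI example below
print(task.language)           # the natural-language instruction for this task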

🚀 3. Inference

Gradio App

Run the Gradio app for inference:

   python3 gradio_demo.py

Here is a summary of the Gradio inputs and outputs (a sketch of how these widgets might be wired together follows the lists below).

Inputs:

  • Task: Selects the task suite for the simulation.
  • Task ID: Specifies the task instance ID.
  • Prompt: Accepts the natural-language instruction that controls the robot.
  • Preview Button: Updates the environment preview for the selected task.
  • Run Simulation Button: Runs the simulation with the given prompt.

Outputs:

  • Preview Image: Shows the environment's first frame.
  • Simulation Video: Shows the simulation result video.
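
The actual wiring lives in gradio_demo.py; the sketch below is a hypothetical reconstruction of how these widgets might be connected. The handler names render_first_frame and run_simulation are placeholders, not the repo's real functions.

import gradio as gr

def render_first_frame(task, task_id):
    """Placeholder handler: reset the LIBERO env and return its first frame."""
    ...

def run_simulation(task, task_id, prompt):
    """Placeholder handler: roll out OpenVLA and return the path to a video."""
    ...

with gr.Blocks(title="PickAgent") as demo:
    task = gr.Dropdown(
        ["libero_spatial", "libero_object", "libero_goal", "libero_10"], label="Task"
    )
    task_id = gr.Number(value=0, precision=0, label="Task ID")
    prompt = gr.Textbox(label="Prompt")
    preview_btn = gr.Button("Preview")
    run_btn = gr.Button("Run Simulation")
    preview_img = gr.Image(label="Preview Image")
    sim_video = gr.Video(label="Simulation Video")

    # Preview renders the first frame; Run executes the full rollout.
    preview_btn.click(render_first_frame, inputs=[task, task_id], outputs=preview_img)
    run_btn.click(run_simulation, inputs=[task, task_id, prompt], outputs=sim_video)

demo.launch()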

Command Line Interface

Run the Python script for inference:

   python3 inference.py --prompt="pick up the salad dressing and place it in the basket" --task="libero_object" --task_id=2 --image_resize=1024 --output_video="outputs/videos"
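
The flags suggest an argument parser roughly like the following (a sketch with plausible defaults, not the repo's actual inference.py):

import argparse

parser = argparse.ArgumentParser(description="OpenVLA pick-and-place inference")
parser.add_argument("--prompt", type=str, required=True,
                    help="natural-language instruction for the robot")
parser.add_argument("--task", type=str, default="libero_object",
                    help="LIBERO task suite, e.g. libero_object or libero_goal")
parser.add_argument("--task_id", type=int, default=0,
                    help="task instance ID within the suite")
parser.add_argument("--image_resize", type=int, default=1024,
                    help="side length the rendered frames are resized to")
parser.add_argument("--output_video", type=str, default="outputs/videos",
                    help="directory where the simulation video is written")
args = parser.parse_args()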

📦 4. OpenVLA Models

| Model                               | Download       |
| ----------------------------------- | -------------- |
| General OpenVLA                     | 🤗 HuggingFace |
| OpenVLA - Finetuned LIBERO Spatial  | 🤗 HuggingFace |
| OpenVLA - Finetuned LIBERO Object   | 🤗 HuggingFace |
| OpenVLA - Finetuned LIBERO Goal     | 🤗 HuggingFace |
| OpenVLA - Finetuned LIBERO 10       | 🤗 HuggingFace |

🙏 5. Acknowledgement

The models are borrowed from OpenVLA.
