
PickAgent: OpenVLA-powered Pick and Place Agent (Simulation)

OpenVLA is an open-source Vision-Language-Action (VLA) model with 7 billion parameters. Designed to give robots human-like perception and decision-making, it integrates visual inputs and natural language instructions to perform diverse manipulation tasks. Trained on nearly a million episodes from the Open X-Embodiment dataset, OpenVLA sets a new standard for generalist robotic control. Its architecture combines SigLIP, DINOv2, and Llama 2 7B, and it can be fine-tuned efficiently on consumer-grade GPUs, making advanced robotics more accessible than ever. Project Page
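
As a quick illustration of how the model is consumed, the snippet below is a minimal sketch of loading the base checkpoint with HuggingFace transformers, following the usage shown on the OpenVLA model card; the image path and instruction here are placeholders.

from transformers import AutoModelForVision2Seq, AutoProcessor
from PIL import Image
import torch

# Load the base 7B checkpoint (trust_remote_code pulls in OpenVLA's modeling code).
processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b", torch_dtype=torch.bfloat16, trust_remote_code=True
).to("cuda:0")

image = Image.open("camera_frame.png")  # placeholder camera observation
prompt = "In: What action should the robot take to pick up the salad dressing?\nOut:"

# predict_action returns a 7-DoF end-effector action; unnorm_key selects the
# dataset statistics used to un-normalize it.
inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)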

🎬 1. Demo

🌐 Gradio App Screenshot


Watch on YouTube!

PickAgent: OpenVLA-powered Pick and Place Agent (Simulation)

Gradio Demo

🚀 PickAgent is an AI-driven pick-and-place system powered by OpenVLA, showcasing a vision-language-action model in action. This simulation demonstrates how language understanding and visual perception work together for precise object manipulation.


🎥 Video Results

Prompt: Pick up the salad dressing and place it in the basket

video1.mp4

Prompt: Pick up the tomato sauce and place it in the basket.

video2.mp4

Prompt: Pick up the cream cheese and place it in the basket.

video3.mp4

Prompt: Pick up the alphabet soup and place it in the basket.

video4.mp4

🔧 2. Installation

# Create and activate conda environment
conda create -n openvla python=3.11 -y
conda activate openvla

# Install PyTorch. Below is a sample command to do this, but you should check the following link
# to find installation instructions that are specific to your compute platform:
# https://pytorch.org/get-started/locally/
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia -y  # UPDATE ME!

# Clone and install the openvla repo
git clone https://github.com/openvla/openvla.git
cd openvla
pip install -e .

# Install Flash Attention 2 for training (https://github.com/Dao-AILab/flash-attention)
#   =>> If you run into difficulty, try `pip cache remove flash_attn` first
pip install packaging ninja
ninja --version; echo $?  # Verify Ninja --> should return exit code "0"
pip install "flash-attn==2.5.5" --no-build-isolation

Additionally, install other required packages for simulation:

git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
cd LIBERO
pip install -e .

cd ..  # return to the openvla repo root
pip install -r experiments/robot/libero/libero_requirements.txt
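
To confirm the simulator is usable before launching the demo, the following sketch (based on LIBERO's benchmark API; the suite name and task ID mirror the CLI example below) looks up a task and prints its language instruction:

from libero.libero import benchmark

# Look up the "libero_object" suite used throughout this README.
benchmark_dict = benchmark.get_benchmark_dict()
task_suite = benchmark_dict["libero_object"]()

task = task_suite.get_task(2)  # task_id=2, as in the CLI example below
print(task.language)           # the natural-language instruction for this task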

🚀 3. Inference

Gradio App

Run the Gradio app for inference:

   python3 gradio_demo.py

Here is a summary of the Gradio inputs and outputs (a sketch of how these widgets might be wired together follows the lists below).

Inputs:

  • Task: Selects the task suite for the simulation.
  • Task ID: Specifies the task instance ID.
  • Prompt: Accepts the natural-language instruction that controls the robot.
  • Preview Button: Updates the environment preview for the selected task.
  • Run Simulation Button: Runs the simulation with the given prompt.

Outputs:

  • Preview Image: Shows the environment's first frame.
  • Simulation Video: Shows the simulation result video.
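
The actual wiring lives in gradio_demo.py; the sketch below is a hypothetical reconstruction of how these widgets might be connected. The handler names render_first_frame and run_simulation are placeholders, not the repo's real functions.

import gradio as gr

def render_first_frame(task, task_id):
    """Placeholder handler: reset the LIBERO env and return its first frame."""
    ...

def run_simulation(task, task_id, prompt):
    """Placeholder handler: roll out OpenVLA and return the path to a video."""
    ...

with gr.Blocks(title="PickAgent") as demo:
    task = gr.Dropdown(
        ["libero_spatial", "libero_object", "libero_goal", "libero_10"], label="Task"
    )
    task_id = gr.Number(value=0, precision=0, label="Task ID")
    prompt = gr.Textbox(label="Prompt")
    preview_btn = gr.Button("Preview")
    run_btn = gr.Button("Run Simulation")
    preview_img = gr.Image(label="Preview Image")
    sim_video = gr.Video(label="Simulation Video")

    # Preview renders the first frame; Run executes the full rollout.
    preview_btn.click(render_first_frame, inputs=[task, task_id], outputs=preview_img)
    run_btn.click(run_simulation, inputs=[task, task_id, prompt], outputs=sim_video)

demo.launch()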

Command Line Interface

Run the Python script for inference:

   python3 inference.py --prompt="pick up the salad dressing and place it in the basket" --task="libero_object" --task_id=2 --image_resize=1024 --output_video="outputs/videos"
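
The flags suggest an argument parser roughly like the following (a sketch with plausible defaults, not the repo's actual inference.py):

import argparse

parser = argparse.ArgumentParser(description="OpenVLA pick-and-place inference")
parser.add_argument("--prompt", type=str, required=True,
                    help="natural-language instruction for the robot")
parser.add_argument("--task", type=str, default="libero_object",
                    help="LIBERO task suite, e.g. libero_object or libero_goal")
parser.add_argument("--task_id", type=int, default=0,
                    help="task instance ID within the suite")
parser.add_argument("--image_resize", type=int, default=1024,
                    help="side length the rendered frames are resized to")
parser.add_argument("--output_video", type=str, default="outputs/videos",
                    help="directory where the simulation video is written")
args = parser.parse_args()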

📦 4. OpenVLA Models

| Model                               | Download       |
| ----------------------------------- | -------------- |
| General OpenVLA                     | 🤗 HuggingFace |
| OpenVLA - Finetuned LIBERO Spatial  | 🤗 HuggingFace |
| OpenVLA - Finetuned LIBERO Object   | 🤗 HuggingFace |
| OpenVLA - Finetuned LIBERO Goal     | 🤗 HuggingFace |
| OpenVLA - Finetuned LIBERO 10       | 🤗 HuggingFace |

🙏 5. Acknowledgement

The models are borrowed from OpenVLA.
