mturan33/isaac-g1-vla

isaac-g1-vla

Vision-Language-Action (VLA) policy fine-tuning for the Unitree G1 humanoid in NVIDIA Isaac Lab.

End-to-end VLA pipeline: data collection in Isaac Sim → LeRobot dataset format → SmolVLA fine-tuning → evaluation back in Isaac Sim.

[Figure: per-task bar chart]

Pipeline

  Isaac Lab (G1 + UR5e)            SmolVLA (450 M)             Isaac Lab eval
  - rollouts → episodes      ┌─────────────────────────┐       ┌─────────────────┐
  - cameras + actions        │ HuggingFaceTB/SmolVLA   │       │ HTTP server +   │
  └──────────┬─────────────► │ Vision-Language-Action  │ ───►  │ closed-loop     │
             │               │ + LeRobot trainer       │       │ evaluation      │
             v               └─────────────────────────┘       └─────────────────┘
  data/lerobot_dataset

Components

| Folder | Purpose |
|---|---|
| data/ | Episode collection + LeRobot conversion (convert_to_lerobot.py) |
| train/ | SmolVLA fine-tune launcher |
| models/ | smolvla_wrapper.py: inference wrapper around LeRobot SmolVLA |
| eval/ | Isaac Sim evaluation: eval_smolvla.py, eval_smolvla_http.py, smolvla_server.py |
| envs/ | UR5e pick-and-place env (Isaac Lab Direct workflow) |
| vla_common/ | Shared utilities (camera config, action chunking, ...) |
| configs/ | Experiment configs (finetune_smolvla.yaml, ...) |
| checkpoints/ | Optional pre-trained checkpoints (see "Pre-trained Checkpoints" below) |

Quick Start

Prerequisites

| Component | Version |
|---|---|
| OS | Windows 11 |
| GPU | NVIDIA RTX (Blackwell: driver 591.74) |
| Python | 3.11 |
| Isaac Sim | 5.1.0 |
| Isaac Lab | 0.48.0 (release/2.3.0) |
| LeRobot | latest (with SmolVLA extras) |

conda create -n vla_train python=3.11 -y
conda activate vla_train
pip install lerobot[smolvla]

1. Collect demos in Isaac Sim

.\isaaclab.bat -p source\isaaclab_tasks\isaaclab_tasks\direct\g1_vla\scripts\collect_demos.py --num_episodes 200 --task pick_place

Episodes are saved under data/raw_episodes/; each one contains the robot state, the actions, and the camera frames.

2. Convert to LeRobot dataset

python source\isaaclab_tasks\isaaclab_tasks\direct\g1_vla\data\convert_to_lerobot.py ^
  --input_dir data/raw_episodes ^
  --output_dir data/lerobot_dataset ^
  --fps 5
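
To make the `--fps 5` flag concrete: at 5 frames per second, consecutive frames are 0.2 s apart. The sketch below derives per-frame timestamps from the frame index, assuming the converter timestamps frames as `frame_index / fps` (the usual LeRobot convention). `frame_timestamps` is a hypothetical helper for illustration, not a function from convert_to_lerobot.py.

```python
def frame_timestamps(num_frames: int, fps: int = 5) -> list[float]:
    """Return the timestamp in seconds of each frame in one episode."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    # Assumption: timestamps are evenly spaced and start at 0.
    return [frame_index / fps for frame_index in range(num_frames)]


# A 3-second episode recorded at 5 fps has 15 frames.
stamps = frame_timestamps(15, fps=5)
print(stamps[:4])
```

If the raw episodes were recorded at a different rate, the `--fps` value presumably needs to match it so that timestamps line up with the simulated time.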

3. Fine-tune SmolVLA

python -m lerobot.scripts.lerobot_train ^
  --config_path source/isaaclab_tasks/isaaclab_tasks/direct/g1_vla/configs/experiments/finetune_smolvla.yaml ^
  --steps 20000

Output checkpoints land in experiments/smolvla_finetune_*/checkpoints/.
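
Because the run directory name varies (the `*` above), it can help to list the available `pretrained_model` directories before starting an evaluation. A minimal sketch, assuming each run keeps a `checkpoints/last/pretrained_model` directory as in the evaluation commands in step 4; `find_pretrained_models` is a hypothetical helper and not part of this repository.

```python
from pathlib import Path


def find_pretrained_models(experiments_dir: str = "experiments") -> list[Path]:
    """List the last-checkpoint pretrained_model directory of every run."""
    pattern = "smolvla_finetune_*/checkpoints/last/pretrained_model"
    return sorted(p for p in Path(experiments_dir).glob(pattern) if p.is_dir())


if __name__ == "__main__":
    for model_dir in find_pretrained_models():
        print(model_dir)
```

Any printed path can be passed to the `--checkpoint` argument of the evaluation scripts.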

4. Evaluate in Isaac Sim (HTTP server + headless)

REM Start the inference server (in a separate terminal)
python source\isaaclab_tasks\isaaclab_tasks\direct\g1_vla\eval\smolvla_server.py ^
  --checkpoint experiments/smolvla_finetune_3000ep_seed456/checkpoints/last/pretrained_model ^
  --host 127.0.0.1 --port 8765 --device cuda:0

REM Run the evaluator (in the env_isaaclab environment)
.\isaaclab.bat -p source\isaaclab_tasks\isaaclab_tasks\direct\g1_vla\eval\eval_smolvla_http.py ^
  --enable_cameras --num_envs 1 --num_episodes 20 --headless ^
  --server_url http://127.0.0.1:8765 --task "pick up the red cube" --seeds 42

Alternatively, run the evaluation in a single process, without the HTTP server:

.\isaaclab.bat -p source\isaaclab_tasks\isaaclab_tasks\direct\g1_vla\eval\eval_smolvla.py ^
  --enable_cameras --num_envs 1 --num_episodes 20 --headless ^
  --checkpoint experiments/smolvla_finetune_3000ep_seed456/checkpoints/last/pretrained_model
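
The HTTP variant splits evaluation into a policy process and a simulator process that exchange observations and actions over the network. The sketch below illustrates that request/response loop with the Python standard library only. The wire format of smolvla_server.py is not documented here, so the `/act` route, the JSON fields `task`, `state`, and `actions`, and the placeholder policy are all assumptions made for illustration. Camera images are left out for brevity.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

CHUNK_SIZE = 10  # mirrors action_chunk_size in the default config


class PolicyHandler(BaseHTTPRequestHandler):
    """Stand-in for the policy server: answers POST /act with an action chunk."""

    def do_POST(self):
        if self.path != "/act":  # hypothetical route
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        obs = json.loads(self.rfile.read(length))
        # Placeholder policy: hold the current state for CHUNK_SIZE steps.
        body = json.dumps({"actions": [obs["state"]] * CHUNK_SIZE}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass


def request_actions(server_url: str, task: str, state: list[float]) -> list[list[float]]:
    """Evaluator side: send one observation, receive one chunk of actions."""
    payload = json.dumps({"task": task, "state": state}).encode()
    req = urllib.request.Request(
        f"{server_url}/act", data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())["actions"]


server = HTTPServer(("127.0.0.1", 0), PolicyHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}"
actions = request_actions(url, "pick up the red cube", [0.0] * 6)
print(len(actions), "actions received")
server.shutdown()
server.server_close()
```

One practical benefit of this split is that the policy and the simulator can run in separate Python environments, here vla_train and env_isaaclab.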

Default Config (configs/experiments/finetune_smolvla.yaml)

policy:
  type: smolvla
  pretrained_path: "HuggingFaceTB/SmolVLA-base"
  action_chunk_size: 10
  num_cameras: 1
  image_size: [224, 224]
  freeze_vision_encoder: true

training:
  num_steps: 20000
  batch_size: 32
  lr: 1.0e-4
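
With `action_chunk_size: 10`, each inference call yields ten actions rather than one. A common way to consume such chunks in a control loop is to execute them from a queue and query the policy again only when the queue is empty. The sketch below shows that pattern under stated assumptions: `ActionChunkQueue` and `dummy_policy` are hypothetical names, and this is not the implementation in vla_common/.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Action = List[float]


class ActionChunkQueue:
    """Serve one action per control step, re-querying the policy when empty."""

    def __init__(self, predict_chunk: Callable[[Sequence[float]], List[Action]]):
        self._predict_chunk = predict_chunk
        self._queue: Deque[Action] = deque()
        self.num_inferences = 0

    def next_action(self, observation: Sequence[float]) -> Action:
        if not self._queue:
            # Queue exhausted: run the policy once and buffer the whole chunk.
            self._queue.extend(self._predict_chunk(observation))
            self.num_inferences += 1
        return self._queue.popleft()


def dummy_policy(observation: Sequence[float], chunk_size: int = 10) -> List[Action]:
    """Placeholder for the model: repeat the observation chunk_size times."""
    return [list(observation) for _ in range(chunk_size)]


runner = ActionChunkQueue(dummy_policy)
for step in range(25):
    runner.next_action([0.0, 0.0, 0.0])
print(runner.num_inferences)  # 3 inference calls cover 25 control steps
```

Larger chunks mean fewer inference calls per episode, at the cost of reacting to new observations less often.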

Hardware (validated)

  • GPU: NVIDIA RTX 5070 Ti Laptop, 12 GB VRAM
  • CPU: Intel i9-13900HX (24 C / 32 T)
  • RAM: 64 GB DDR5-5200 dual-channel
  • Memory budget during fine-tune: ~6–8 GB VRAM (BF16), ~10–17 GB RAM
  • Throughput on this hardware: ~1.25 step/s, 20 K steps in ~9 hours
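
The throughput figures can be cross-checked with simple arithmetic. At a steady 1.25 step/s, 20 K steps take about 4.4 hours of pure stepping, whereas a ~9 hour run corresponds to an average of about 0.62 step/s. The README does not say what accounts for the difference (data loading, checkpointing, or the rate being a peak value are possibilities), so both numbers are best read as rough guides. The helper names below are hypothetical.

```python
def training_hours(num_steps: int, steps_per_second: float) -> float:
    """Wall-clock hours spent on optimizer steps alone."""
    return num_steps / steps_per_second / 3600


def effective_steps_per_second(num_steps: int, hours: float) -> float:
    """Average throughput implied by a measured wall-clock duration."""
    return num_steps / (hours * 3600)


print(f"{training_hours(20_000, 1.25):.2f} h at 1.25 step/s")
print(f"{effective_steps_per_second(20_000, 9):.2f} step/s over 9 h")
```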

Pre-trained Checkpoints

checkpoints/ is reserved for shared SmolVLA pretrained_model/ directories. Fine-tuned models are typically too large to commit directly, so the preferred hosting options are:

  1. Hugging Face Hub model card (free, GPU-friendly download)
  2. Google Drive shared link (with gdown instructions)

When a stable checkpoint is published, this README will be updated with download instructions and direct usage commands.


License

MIT (see LICENSE).
