isaac-g1-vla

Vision-Language-Action (VLA) policy fine-tuning for the Unitree G1 humanoid in NVIDIA Isaac Lab.

End-to-end VLA pipeline: data collection in Isaac Sim → LeRobot dataset format → SmolVLA fine-tuning → evaluation back in Isaac Sim.

Pipeline

  Isaac Lab (G1 + UR5e)            SmolVLA (450 M)             Isaac Lab eval
  - rollouts → episodes      ┌─────────────────────────┐       ┌─────────────────┐
  - cameras + actions        │ HuggingFaceTB/SmolVLA   │       │ HTTP server +   │
  └──────────┬─────────────► │ Vision-Language-Action  │ ───►  │ closed-loop     │
             │               │ + LeRobot trainer       │       │ evaluation      │
             v               └─────────────────────────┘       └─────────────────┘
  data/lerobot_dataset

Components

| Folder | Purpose |
|---|---|
| `data/` | Episode collection + LeRobot conversion (`convert_to_lerobot.py`) |
| `train/` | SmolVLA fine-tune launcher |
| `models/` | `smolvla_wrapper.py`: inference wrapper around LeRobot SmolVLA |
| `eval/` | Isaac Sim evaluation: `eval_smolvla.py`, `eval_smolvla_http.py`, `smolvla_server.py` |
| `envs/` | UR5e pick-and-place env (Isaac Lab Direct workflow) |
| `vla_common/` | Shared utilities (camera config, action chunking, ...) |
| `configs/` | Experiment configs (`finetune_smolvla.yaml`, ...) |
| `checkpoints/` | Optional pre-trained checkpoints (see "Pre-trained Checkpoints" below) |

Quick Start

Prerequisites

| Component | Version |
|---|---|
| OS | Windows 11 |
| GPU | NVIDIA RTX (Blackwell: driver 591.74) |
| Python | 3.11 |
| Isaac Sim | 5.1.0 |
| Isaac Lab | 0.48.0 (release/2.3.0) |
| LeRobot | latest (with SmolVLA extras) |

conda create -n vla_train python=3.11 -y
conda activate vla_train
pip install lerobot[smolvla]
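
Before moving on, it can help to confirm that the active environment matches the prerequisites table. The following is a minimal sketch using only the standard library; the function name `check_environment` is hypothetical and not part of this repository, and it only checks the Python version and whether `lerobot` is importable, not Isaac Sim or driver versions.

```python
import importlib.util
import sys


def check_environment(version_info=sys.version_info, required=(3, 11),
                      packages=("lerobot",)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    found = tuple(version_info[:2])
    if found != tuple(required):
        problems.append(
            f"Python {required[0]}.{required[1]} expected, found {found[0]}.{found[1]}"
        )
    for name in packages:
        # find_spec returns None when a top-level package cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"package not importable: {name}")
    return problems


for problem in check_environment():
    print("WARNING:", problem)
```

Run it inside the `vla_train` environment; no output means both checks passed.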

1. Collect demos in Isaac Sim

.\isaaclab.bat -p source\isaaclab_tasks\isaaclab_tasks\direct\g1_vla\scripts\collect_demos.py --num_episodes 200 --task pick_place

Episodes are saved under `data/raw_episodes/` (state, action, and camera frames).

2. Convert to LeRobot dataset

python source\isaaclab_tasks\isaaclab_tasks\direct\g1_vla\data\convert_to_lerobot.py ^
  --input_dir data/raw_episodes ^
  --output_dir data/lerobot_dataset ^
  --fps 5
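
To make the role of `--fps 5` concrete, here is a small sketch of how per-frame index columns could be derived from episode lengths. The column names (`episode_index`, `frame_index`, `index`, `timestamp`) follow the LeRobot dataset convention as I understand it; whether `convert_to_lerobot.py` builds them itself or delegates to LeRobot is an assumption, and `frame_rows` is a hypothetical helper.

```python
def frame_rows(episode_lengths, fps=5):
    """Build one metadata row per frame, in dataset order.

    episode_lengths: number of frames in each episode.
    fps: recording rate; frame k of an episode gets timestamp k / fps seconds.
    """
    rows = []
    global_index = 0
    for episode_index, length in enumerate(episode_lengths):
        for frame_index in range(length):
            rows.append({
                "episode_index": episode_index,
                "frame_index": frame_index,   # restarts at 0 for every episode
                "index": global_index,        # runs across the whole dataset
                "timestamp": frame_index / fps,
            })
            global_index += 1
    return rows


rows = frame_rows([3, 2], fps=5)
print(rows[1]["timestamp"])  # 0.2
print(rows[3])               # first frame of the second episode
```

At 5 fps, consecutive frames are 0.2 s apart, so the value passed here should match the rate at which frames were actually recorded.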

3. Fine-tune SmolVLA

python -m lerobot.scripts.lerobot_train ^
  --config_path source/isaaclab_tasks/isaaclab_tasks/direct/g1_vla/configs/experiments/finetune_smolvla.yaml ^
  --steps 20000

Output checkpoints land in `experiments/smolvla_finetune_*/checkpoints/`.
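
Since run directories carry a suffix, it can be convenient to locate the exported model directories programmatically. The sketch below assumes the `checkpoints/last/pretrained_model` layout used in the evaluation commands of this README; `find_pretrained_models` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def find_pretrained_models(root="experiments"):
    """Return all <run>/checkpoints/last/pretrained_model dirs, newest first."""
    pattern = "smolvla_finetune_*/checkpoints/last/pretrained_model"
    candidates = [p for p in Path(root).glob(pattern) if p.is_dir()]
    # Sort by modification time so the most recently written run comes first.
    return sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)


for path in find_pretrained_models():
    print(path)
```

The first printed path can be passed to `--checkpoint` in the evaluation step.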

4. Evaluate in Isaac Sim (HTTP server + headless)

REM Start the inference server (separate terminal)
python source\isaaclab_tasks\isaaclab_tasks\direct\g1_vla\eval\smolvla_server.py ^
  --checkpoint experiments/smolvla_finetune_3000ep_seed456/checkpoints/last/pretrained_model ^
  --host 127.0.0.1 --port 8765 --device cuda:0

REM Run the evaluator (in the env_isaaclab environment)
.\isaaclab.bat -p source\isaaclab_tasks\isaaclab_tasks\direct\g1_vla\eval\eval_smolvla_http.py ^
  --enable_cameras --num_envs 1 --num_episodes 20 --headless ^
  --server_url http://127.0.0.1:8765 --task "pick up the red cube" --seeds 42

Or single-process evaluation:

.\isaaclab.bat -p source\isaaclab_tasks\isaaclab_tasks\direct\g1_vla\eval\eval_smolvla.py ^
  --enable_cameras --num_envs 1 --num_episodes 20 --headless ^
  --checkpoint experiments/smolvla_finetune_3000ep_seed456/checkpoints/last/pretrained_model
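
In the two-process setup, the evaluator can only do useful work once the inference server is accepting connections. The following is a minimal sketch of a readiness check against the host and port used above; it only tests that the TCP port is open and makes no assumption about the server's HTTP routes. `wait_for_server` is a hypothetical helper, not part of the repository.

```python
import socket
import time


def wait_for_server(host="127.0.0.1", port=8765, timeout_s=60.0, poll_s=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=poll_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_s)
```

A launcher script could call `wait_for_server()` after starting `smolvla_server.py` and start the evaluator only when it returns `True`. Loading the checkpoint may take a while, so a generous timeout is reasonable.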

Default Config (`configs/experiments/finetune_smolvla.yaml`)

policy:
  type: smolvla
  pretrained_path: "HuggingFaceTB/SmolVLA-base"
  action_chunk_size: 10
  num_cameras: 1
  image_size: [224, 224]
  freeze_vision_encoder: true

training:
  num_steps: 20000
  batch_size: 32
  lr: 1.0e-4
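
With `action_chunk_size: 10`, the policy predicts ten actions per forward pass. One common way to consume such chunks at control time is to queue them and query the model again only when the queue is empty. The sketch below illustrates that pattern with a placeholder policy; it is not the implementation in `vla_common/` or `smolvla_wrapper.py`, and `ChunkedController` and `dummy_policy` are hypothetical names.

```python
from collections import deque


class ChunkedController:
    """Executes one action per step, re-querying the policy when the chunk runs out."""

    def __init__(self, predict_chunk, chunk_size=10):
        self.predict_chunk = predict_chunk  # callable: observation -> list of actions
        self.chunk_size = chunk_size
        self.queue = deque()
        self.num_queries = 0

    def reset(self):
        # Call at episode boundaries so stale actions are not replayed.
        self.queue.clear()

    def step(self, observation):
        if not self.queue:
            chunk = self.predict_chunk(observation)[: self.chunk_size]
            self.queue.extend(chunk)
            self.num_queries += 1
        return self.queue.popleft()


def dummy_policy(observation):
    # Stand-in for the real model: returns 10 placeholder actions.
    return [(observation, i) for i in range(10)]


controller = ChunkedController(dummy_policy, chunk_size=10)
actions = [controller.step(t) for t in range(25)]
print(controller.num_queries)  # 3
```

If the control loop runs at the dataset rate of 5 fps (an assumption), one chunk of 10 actions covers 2 seconds of motion between model queries.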

Hardware (validated)

GPU: NVIDIA RTX 5070 Ti Laptop, 12 GB VRAM
CPU: Intel i9-13900HX (24 C / 32 T)
RAM: 64 GB DDR5-5200 dual-channel
Memory budget during fine-tune: ~6–8 GB VRAM (BF16), ~10–17 GB RAM
Throughput on this hardware: ~1.25 step/s, 20 K steps in ~9 hours
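
As a rough cross-check of these figures, step count and throughput convert to wall-clock time by simple division. The sketch below only does that arithmetic. A sustained 1.25 step/s would finish 20 K steps in about 4.4 hours, so the quoted ~9 hours presumably includes time outside the optimizer loop (data loading, checkpointing, logging) or reflects a lower average rate; that reading is an assumption, not something stated here.

```python
def training_hours(num_steps: int, steps_per_second: float) -> float:
    """Wall-clock hours needed for num_steps at a constant throughput."""
    return num_steps / steps_per_second / 3600.0


def average_steps_per_second(num_steps: int, hours: float) -> float:
    """Average throughput implied by finishing num_steps in the given hours."""
    return num_steps / (hours * 3600.0)


print(round(training_hours(20_000, 1.25), 2))         # 4.44
print(round(average_steps_per_second(20_000, 9), 2))  # 0.62
```

Measuring your own step rate over the first few hundred steps and plugging it into `training_hours` gives a more reliable estimate for other hardware.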

Pre-trained Checkpoints

`checkpoints/` is reserved for shared SmolVLA `pretrained_model/` directories. Fine-tuned models are typically too large to commit directly, so the preferred hosting options are:

- Hugging Face Hub model card (free, GPU-friendly download)
- Google Drive shared link (with `gdown` instructions)

When a stable checkpoint is published, this README will be updated with download instructions and direct usage commands.

License

MIT (see LICENSE).
