This repository provides tools to train, run, and evaluate (test) robots that play board games. The examples use:
- LeRobot SO101 - Robot Embodiment
- Cosmos-Reason2 - Visual Reasoning
- Isaac GR00T - Control Policy
- Isaac Lab - Physics Simulation
- Artefacts - Automated Testing
Contents:

- Install
- Inference with Cosmos and Groot
- Dataset Collection with LeRobot
- Nebius (Remote GPU)
- Dataset Massaging with Cosmos
- Training with Groot
- Testing with Artefacts and Cosmos
*(Demo video: connect-four-demo.mp4)*
## Install

This repo uses git submodules (which themselves use Git LFS), so clone recursively:
```bash
git clone --recurse-submodules https://github.com/art-e-fact/connect_four-demo.git
cd connect_four-demo
git submodule foreach --recursive 'git lfs pull'
```

This repository is organized as a collection of independent Python packages (e.g., `cosmos-reason-node`, `gr00t-node`, `simulation`, `so100-driver`, `cosmos-visual-tester`).
Instead of a monolithic workspace, we maintain strict isolation between modules. This simplifies dependency management and lets you easily cherry-pick parts for your own projects. Because of this, we use `uv` with the `--directory` flag to run commands within the context of a specific package.
Note: All commands in this README assume you are running them from the repository root using the pattern:

```bash
uv run --directory <package_name> <command> ...
```
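For example, to check that a package's CLI entry point resolves without running a full workflow (this assumes the entry points support the standard `--help` flag):

```bash
# uv resolves the simulation package's environment, then prints the CLI usage
uv run --directory simulation teleop-agent --help
```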
### Test installation
Verify that everything was installed correctly by running the teleop agent in simulation (uv will automatically pull in Isaac Sim):
```bash
uv run --directory simulation teleop-agent \
--task LeIsaac-SO101-ConnectFour-Ball-v0 \
--teleop_device=keyboard \
--enable_cameras
```

## Inference with Cosmos and Groot

You will need a HuggingFace account to pull the relevant models.
Run the strategy server

- Change `--port` if needed

```bash
uv run --directory cosmos-reason-node strategy-server \
--host 0.0.0.0 \
--port 5556
```

(You can add the `--device cuda:0` flag if your graphics card is capable of running Cosmos-Reason, GR00T, and Isaac Sim all on the GPU.)
Run the policy server

- Change `--model-path` and `--port` if needed

```bash
uv run --directory gr00t-node server \
--embodiment-tag NEW_EMBODIMENT \
--model-path tomo202/groot_n1_6_so101-isaac-connect-four-ball_checkpoint \
--device cuda:0 \
--host 0.0.0.0 \
--port 5555
```

Run the simulation client
```bash
uv run --directory simulation inference \
--task LeIsaac-SO101-ConnectFour-Ball-v0 \
--policy_host=localhost \
--policy_port=5555 \
--strategy_host=localhost \
--strategy_port=5556
```

When the robot receives a new strategy from cosmos-reason, a new ball (alternating colours) will be placed in front of the robot. If the ball is unreachable for whatever reason (e.g. it goes out of bounds or isn't placed correctly), press "P" to spawn a new ball of the same colour.
Run the SO101 client
- Change `--policy_port`, `--robot.port`, `--robot.id`, `--lang_instruction` and `--robot.cameras` if needed

```bash
uv run --directory so100-driver so100 \
--robot.type=so101_follower \
--robot.port=/dev/ttyACM0 \
--robot.id=white_follower \
--robot.cameras="{ wrist: {type: opencv, index_or_path: /dev/video0, width: 640, height: 480, fps: 30}, front: {type: opencv, index_or_path: /dev/video2, width: 640, height: 480, fps: 30}}" \
--policy_host=localhost \
--policy_port=5555 \
--strategy_host=localhost \
--strategy_port=5556 \
--lang_instruction="Play a game of connect four against me, and try to win!"
```

## Dataset Collection with LeRobot

Assumes LeRobot SO101 is set up and configured.
- Change flags as required.

```bash
conda activate lerobot
lerobot-record \
--robot.type=so101_follower \
--robot.port=/dev/ttyACM0 \
--robot.id=follower_arm \
--robot.cameras='{ wrist: {type: opencv, index_or_path: 4, width: 640, height: 480, fps: 30, fourcc: "MJPG"}, front: {type: opencv, index_or_path: 6, width: 640, height: 480, fps: 30, fourcc: "MJPG"}}' \
--teleop.type=so101_leader \
--teleop.port=/dev/ttyACM1 \
--teleop.id=leader_arm \
--display_data=true \
--dataset.repo_id=<my-repo> \
--dataset.num_episodes=20 \
--dataset.single_task="play connect 4"
```

To record a dataset in simulation instead:

```bash
uv run --directory simulation teleop-agent \
--task=LeIsaac-SO101-ConnectFour-Ball-v0 \
--teleop_device=so101leader \
--port=/dev/ttyACM0 \
--num_envs=1 \
--device=cuda \
--enable_cameras \
--record \
--use_lerobot_recorder \
--lerobot_dataset_repo_id=hf-username/dataset-name
```

## Nebius (Remote GPU)

Both Dataset Massaging with Cosmos and Training with Groot can run on Nebius GPU VMs. One-time setup:

```bash
# Install Nebius CLI & authenticate
curl -sSL https://storage.eu-north1.nebius.cloud/cli/install.sh | bash
nebius profile create --profile <unique-name-here> \
--endpoint api.nebius.cloud \
--federation-endpoint auth.nebius.com \
--parent-id <project-id-from-web-console>
# Browser will open for auth
# jq (for JSON parsing)
sudo apt install jq
# SSH key (if you don't have one)
ssh-keygen -t ed25519
# HuggingFace token (used by both scripts)
export HF_TOKEN="hf_..."
```

Jobs run detached: you can Ctrl+C or close your terminal and they will continue running on the VM.

```bash
# Check Status
./scripts/nebius-cosmos.sh --check
./scripts/nebius-train.sh --check
# Cleanup (tear down VM + disk after a job finishes):
./scripts/nebius-cosmos.sh --cleanup
./scripts/nebius-train.sh --cleanup
```

Note: VMs stay running after jobs complete. Always run `--cleanup` when done to avoid charges.
VM state is saved in `~/.nebius-cosmos/` and `~/.nebius-train/`, so cleanup works even after restarting your machine.
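For example, to confirm that saved state exists before cleaning up (the file layout inside these directories is an implementation detail of the helper scripts):

```bash
# Lists the persisted VM metadata the --cleanup commands rely on
ls ~/.nebius-cosmos/ ~/.nebius-train/
```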
## Dataset Massaging with Cosmos

Use Cosmos-Reason to automatically identify and extract individual demonstrations from a raw teleop recording.

```bash
# 1. Generate annotations
uv run --directory cosmos-dataset-editor cosmos-generate <hf_dataset> --output-toml my-project.toml
# 2. (Optional) Review and fix the annotations in a TUI
uv run --directory cosmos-dataset-editor cosmos-edit my-project.toml
# 3. Create the new dataset and push to HuggingFace
uv run --directory cosmos-dataset-editor cosmos-recut my-project.toml --push-to-hub
```

- Use `--new-dataset-id <owner>/<name>` on either `cosmos-generate` or `cosmos-recut` to override the default (`<source>-recut`).
- Use `--model nvidia/Cosmos-Reason2-8B` on the `cosmos-generate` step for better accuracy (requires more VRAM).
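For example, combining both overrides on the generate step (the dataset and owner names are placeholders):

```bash
# Annotate with the larger 8B model and name the output dataset explicitly
uv run --directory cosmos-dataset-editor cosmos-generate <hf_dataset> \
--output-toml my-project.toml \
--model nvidia/Cosmos-Reason2-8B \
--new-dataset-id <owner>/<name>
```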
Requires the Nebius setup above. Runs `cosmos-generate` + `cosmos-recut` with the 8B model, then pushes to HuggingFace.

```bash
./scripts/nebius-cosmos.sh \
--docker-image tomolnorman/cosmos-recut:latest \
--dataset-id <huggingface-dataset-to-process>
```

(You can also build `Dockerfile.cosmos` and push it to your own hub.)
Optional flags:

| Flag | Default | Description |
|---|---|---|
| `--new-dataset-id` | `<dataset-id>-recut` | Output dataset ID |
| `--camera-key` | auto-detect | Camera key to process |
| `--platform` | `gpu-h100-sxm` | Nebius GPU platform |
| `--preset` | `1gpu-16vcpu-200gb` | VM preset |
| `--disk-size` | `100` | Boot disk size (GiB) |
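For example, overriding the output dataset ID and disk size (the values here are illustrative):

```bash
./scripts/nebius-cosmos.sh \
--docker-image tomolnorman/cosmos-recut:latest \
--dataset-id <huggingface-dataset-to-process> \
--new-dataset-id <owner>/<name>-recut \
--disk-size 200
```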
## Training with Groot

In addition to a HuggingFace account and token, you will need an API key from Weights & Biases.

You will need the NVIDIA Container Toolkit to pass the GPU through to Docker.
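To sanity-check GPU passthrough before building, you can run `nvidia-smi` in a throwaway container (the toolkit mounts the driver utilities into it):

```bash
# Should print your GPU table; if it errors, the toolkit is not configured
docker run --rm --gpus all ubuntu nvidia-smi
```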
- Build the image

```bash
docker build -t groot-training .  # rename as you wish
```

- Run with arguments:

```bash
docker run --gpus=all --shm-size=16g \
-e LEROBOT_DS_ID="..." \
-e MODEL_ID="..." \
-e HF_TOKEN=hf_... \
-e WANDB_API_KEY=... \
-e GLOBAL_BATCH_SIZE=1 \
groot-training
```

In particular, adjust `--shm-size` (shared memory; Docker defaults to 64 MB) and `GLOBAL_BATCH_SIZE` (according to how much VRAM you have).
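For example, on a card with more VRAM you might raise both values (these numbers are illustrative; tune them to your hardware):

```bash
# Larger shared memory for the dataloader, larger global batch for the GPU
docker run --gpus=all --shm-size=32g \
-e LEROBOT_DS_ID="..." \
-e MODEL_ID="..." \
-e HF_TOKEN=hf_... \
-e WANDB_API_KEY=... \
-e GLOBAL_BATCH_SIZE=8 \
groot-training
```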
Requires the Nebius setup above. Launches a fine-tuning run on an H100 VM; it trains and uploads to HuggingFace.

```bash
export WANDB_API_KEY="..."
./scripts/nebius-train.sh \
--docker-image tomolnorman/groot-finetune:latest \
--dataset-id <huggingface-dataset-to-pull-from> \
--model-id <huggingface-model-to-push-to>
```

(You can also build `Dockerfile` and push it to your own hub.)
Optional flags:

| Flag | Default | Description |
|---|---|---|
| `--max-steps` | `10000` | Training steps |
| `--learning-rate` | `1e-4` | Learning rate |
| `--batch-size` | `64` | Global batch size |
| `--save-steps` | `2500` | Checkpoint interval |
| `--platform` | `gpu-h100-sxm` | Nebius GPU platform |
| `--preset` | `1gpu-16vcpu-200gb` | VM preset |
| `--disk-size` | `250` | Boot disk size (GiB) |
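For example, a shorter run with a smaller global batch (the values are illustrative):

```bash
export WANDB_API_KEY="..."
./scripts/nebius-train.sh \
--docker-image tomolnorman/groot-finetune:latest \
--dataset-id <huggingface-dataset-to-pull-from> \
--model-id <huggingface-model-to-push-to> \
--max-steps 5000 \
--batch-size 32
```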
## Testing with Artefacts and Cosmos

In `cosmos-visual-tester/tests/test_connect_four.py` you will find an example of how to run post-simulation test evaluations using Cosmos-Reason. This keeps the pytest file simple, using natural-language assertions. When run, the test starts a simulation using Cosmos-Reason (strategy) and GR00T (motor control) for a few steps and records a video. The video is then analyzed by Cosmos-Reason against the assertion statements made in the test file.
Tests run headless by default. See the test README for environment variables that can be configured.

```bash
uv run --directory cosmos-visual-tester pytest -v
```

Although the tests can be run locally, Artefacts can orchestrate the run and automatically upload results, logs, and the recorded video to the Artefacts Dashboard, making it easy to run and parameterize your tests, as well as view, store, and share test results across your team.
The `artefacts-cli` (installed via pip) and an `artefacts.yaml` file are required (the yaml is already in this repository).
- Create an account at app.artefacts.com
- Create an organization and a project
- Rename the project in the `artefacts.yaml` file to your `<org_name>/<project_name>`
- Install the CLI (we suggest using a virtual environment or `pipx`):

  ```bash
  pip install artefacts-cli
  artefacts config add <org_name>/<project_name>
  ```

- You will be redirected to the dashboard (browser) to create an API key; paste it into your terminal
- Select `N` when prompted about whether to create a new `artefacts.yaml` (already in this repo)
```bash
artefacts run test-cosmos
```

Results, logs, and a video will be uploaded to the dashboard.
See Docs for more information on using Artefacts.