This repository provides tools to train, run, and evaluate (test) robots that play board games. The examples use:
- LeRobot SO101 - Robot Embodiment
- Cosmos-Reason2 - Visual Reasoning
- Isaac GR00T - Control Policy
- Isaac Lab - Physics Simulation
- Artefacts - Automated Testing
Contents:

- Install
- Inference with Cosmos and Groot
- Dataset Collection with LeRobot
- Nebius (Remote GPU)
- Dataset Massaging with Cosmos
- Training with Groot
- Testing with Artefacts and Cosmos
*(Demo video: connect-four-demo.mp4)*
## Install

This repo uses git submodules (which themselves use Git LFS), so clone recursively:
```bash
git clone --recurse-submodules https://github.com/art-e-fact/connect_four-demo.git
cd connect_four-demo
git submodule foreach --recursive 'git lfs pull'
```

This repository is organized as a collection of independent Python packages (e.g., `cosmos-reason-node`, `gr00t-node`, `simulation`, `so100-driver`, `cosmos-visual-tester`).
Instead of a monolithic workspace, we maintain strict isolation between modules. This simplifies dependency management and lets you easily cherry-pick parts for your own projects. Because of this, we use `uv` with the `--directory` flag to run commands within the context of a specific package.
Note: All commands in this README assume you are running them from the repository root using the pattern:

```bash
uv run --directory <package_name> <command> ...
```
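For example, to check that a package's CLI entry point resolves without running a full workflow (this assumes the entry points support the standard `--help` flag):

```bash
# uv resolves the simulation package's environment, then prints the CLI usage
uv run --directory simulation teleop-agent --help
```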
### Test installation
Verify that everything was installed correctly by running the teleop agent in simulation (uv will automatically pull in Isaac Sim):
```bash
uv run --directory simulation teleop-agent \
--task LeIsaac-SO101-ConnectFour-Ball-v0 \
--teleop_device=keyboard \
--enable_cameras
```

## Inference with Cosmos and Groot

You will need a HuggingFace account to pull the relevant models.
Run the strategy server

- Change `--port` if needed

```bash
uv run --directory cosmos-reason-node strategy-server \
--host 0.0.0.0 \
--port 5556
```

(You can add the `--device cuda:0` flag if your graphics card is capable of running Cosmos-Reason, GR00T, and Isaac Sim all on the GPU.)
Run the policy server

- Change `--model-path` and `--port` if needed

```bash
uv run --directory gr00t-node server \
--embodiment-tag NEW_EMBODIMENT \
--model-path tomo202/groot_n1_6_so101-isaac-connect-four-ball_checkpoint \
--device cuda:0 \
--host 0.0.0.0 \
--port 5555
```

Run the simulation client
```bash
uv run --directory simulation inference \
--task LeIsaac-SO101-ConnectFour-Ball-v0 \
--policy_host=localhost \
--policy_port=5555 \
--strategy_host=localhost \
--strategy_port=5556
```

When the robot receives a new strategy from cosmos-reason, a new ball (alternating colours) will be placed in front of the robot. If the ball is unreachable for whatever reason (e.g. it goes out of bounds or isn't placed correctly), press "P" to spawn a new ball of the same colour.
Run the SO101 client
- Change `--policy_port`, `--robot.port`, `--robot.id`, `--lang_instruction` and `--robot.cameras` if needed

```bash
uv run --directory so100-driver so100 \
--robot.type=so101_follower \
--robot.port=/dev/ttyACM0 \
--robot.id=white_follower \
--robot.cameras="{ wrist: {type: opencv, index_or_path: /dev/video0, width: 640, height: 480, fps: 30}, front: {type: opencv, index_or_path: /dev/video2, width: 640, height: 480, fps: 30}}" \
--policy_host=localhost \
--policy_port=5555 \
--strategy_host=localhost \
--strategy_port=5556 \
--lang_instruction="Play a game of connect four against me, and try to win!"
```

## Dataset Collection with LeRobot

Assumes LeRobot SO101 is set up and configured.
- Change flags as required.

```bash
conda activate lerobot
lerobot-record \
--robot.type=so101_follower \
--robot.port=/dev/ttyACM0 \
--robot.id=follower_arm \
--robot.cameras='{ wrist: {type: opencv, index_or_path: 4, width: 640, height: 480, fps: 30, fourcc: "MJPG"}, front: {type: opencv, index_or_path: 6, width: 640, height: 480, fps: 30, fourcc: "MJPG"}}' \
--teleop.type=so101_leader \
--teleop.port=/dev/ttyACM1 \
--teleop.id=leader_arm \
--display_data=true \
--dataset.repo_id=<my-repo> \
--dataset.num_episodes=20 \
--dataset.single_task="play connect 4"
```

To record a dataset in simulation instead:

```bash
uv run --directory simulation teleop-agent \
--task=LeIsaac-SO101-ConnectFour-Ball-v0 \
--teleop_device=so101leader \
--port=/dev/ttyACM0 \
--num_envs=1 \
--device=cuda \
--enable_cameras \
--record \
--use_lerobot_recorder \
--lerobot_dataset_repo_id=hf-username/dataset-name
```

## Nebius (Remote GPU)

Both Dataset Massaging with Cosmos and Training with Groot can run on Nebius GPU VMs. One-time setup:

```bash
# Install Nebius CLI & authenticate
curl -sSL https://storage.eu-north1.nebius.cloud/cli/install.sh | bash
nebius profile create --profile <unique-name-here> \
--endpoint api.nebius.cloud \
--federation-endpoint auth.nebius.com \
--parent-id <project-id-from-web-console>
# Browser will open for auth
# jq (for JSON parsing)
sudo apt install jq
# SSH key (if you don't have one)
ssh-keygen -t ed25519
# HuggingFace token (used by both scripts)
export HF_TOKEN="hf_..."
```

Jobs run detached: you can Ctrl+C or close your terminal and they will continue running on the VM.

```bash
# Check Status
./scripts/nebius-cosmos.sh --check
./scripts/nebius-train.sh --check
# Cleanup (tear down VM + disk after a job finishes):
./scripts/nebius-cosmos.sh --cleanup
./scripts/nebius-train.sh --cleanup
```

Note: VMs stay running after jobs complete. Always run `--cleanup` when done to avoid charges.
VM state is saved in `~/.nebius-cosmos/` and `~/.nebius-train/`, so cleanup works even after restarting your machine.
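For example, to confirm that saved state exists before cleaning up (the file layout inside these directories is an implementation detail of the helper scripts):

```bash
# Lists the persisted VM metadata the --cleanup commands rely on
ls ~/.nebius-cosmos/ ~/.nebius-train/
```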
## Dataset Massaging with Cosmos

Use Cosmos-Reason to automatically identify and extract individual demonstrations from a raw teleop recording.

```bash
# 1. Generate annotations
uv run --directory cosmos-dataset-editor cosmos-generate <hf_dataset> --output-toml my-project.toml
# 2. (Optional) Review and fix the annotations in a TUI
uv run --directory cosmos-dataset-editor cosmos-edit my-project.toml
# 3. Create the new dataset and push to HuggingFace
uv run --directory cosmos-dataset-editor cosmos-recut my-project.toml --push-to-hub
```

- Use `--new-dataset-id <owner>/<name>` on either `cosmos-generate` or `cosmos-recut` to override the default (`<source>-recut`).
- Use `--model nvidia/Cosmos-Reason2-8B` on the `cosmos-generate` step for better accuracy (requires more VRAM).
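For example, combining both overrides on the generate step (the dataset and owner names are placeholders):

```bash
# Annotate with the larger 8B model and name the output dataset explicitly
uv run --directory cosmos-dataset-editor cosmos-generate <hf_dataset> \
--output-toml my-project.toml \
--model nvidia/Cosmos-Reason2-8B \
--new-dataset-id <owner>/<name>
```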
Requires the Nebius setup above. Runs `cosmos-generate` + `cosmos-recut` with the 8B model, then pushes to HuggingFace.

```bash
./scripts/nebius-cosmos.sh \
--docker-image tomolnorman/cosmos-recut:latest \
--dataset-id <huggingface-dataset-to-process>
```

(You can also build `Dockerfile.cosmos` and push it to your own hub.)
Optional flags:

| Flag | Default | Description |
|---|---|---|
| `--new-dataset-id` | `<dataset-id>-recut` | Output dataset ID |
| `--camera-key` | auto-detect | Camera key to process |
| `--platform` | `gpu-h100-sxm` | Nebius GPU platform |
| `--preset` | `1gpu-16vcpu-200gb` | VM preset |
| `--disk-size` | `100` | Boot disk size (GiB) |
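For example, overriding the output dataset ID and disk size (the values here are illustrative):

```bash
./scripts/nebius-cosmos.sh \
--docker-image tomolnorman/cosmos-recut:latest \
--dataset-id <huggingface-dataset-to-process> \
--new-dataset-id <owner>/<name>-recut \
--disk-size 200
```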
## Training with Groot

In addition to a HuggingFace account and token, you will need an API key from Weights & Biases.

You will need the NVIDIA Container Toolkit to pass the GPU through to Docker.
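To sanity-check GPU passthrough before building, you can run `nvidia-smi` in a throwaway container (the toolkit mounts the driver utilities into it):

```bash
# Should print your GPU table; if it errors, the toolkit is not configured
docker run --rm --gpus all ubuntu nvidia-smi
```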
- Build the image

```bash
docker build -t groot-training .  # rename as you wish
```

- Run with arguments:

```bash
docker run --gpus=all --shm-size=16g \
-e LEROBOT_DS_ID="..." \
-e MODEL_ID="..." \
-e HF_TOKEN=hf_... \
-e WANDB_API_KEY=... \
-e GLOBAL_BATCH_SIZE=1 \
groot-training
```

In particular, adjust `--shm-size` (shared memory; Docker defaults to 64 MB) and `GLOBAL_BATCH_SIZE` (according to how much VRAM you have).
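For example, on a card with more VRAM you might raise both values (these numbers are illustrative; tune them to your hardware):

```bash
# Larger shared memory for the dataloader, larger global batch for the GPU
docker run --gpus=all --shm-size=32g \
-e LEROBOT_DS_ID="..." \
-e MODEL_ID="..." \
-e HF_TOKEN=hf_... \
-e WANDB_API_KEY=... \
-e GLOBAL_BATCH_SIZE=8 \
groot-training
```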
Requires the Nebius setup above. Launches a fine-tuning run on an H100 VM; it trains and uploads to HuggingFace.

```bash
export WANDB_API_KEY="..."
./scripts/nebius-train.sh \
--docker-image tomolnorman/groot-finetune:latest \
--dataset-id <huggingface-dataset-to-pull-from> \
--model-id <huggingface-model-to-push-to>
```

(You can also build `Dockerfile` and push it to your own hub.)
Optional flags:

| Flag | Default | Description |
|---|---|---|
| `--max-steps` | `10000` | Training steps |
| `--learning-rate` | `1e-4` | Learning rate |
| `--batch-size` | `64` | Global batch size |
| `--save-steps` | `2500` | Checkpoint interval |
| `--platform` | `gpu-h100-sxm` | Nebius GPU platform |
| `--preset` | `1gpu-16vcpu-200gb` | VM preset |
| `--disk-size` | `250` | Boot disk size (GiB) |
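For example, a shorter run with a smaller global batch (the values are illustrative):

```bash
export WANDB_API_KEY="..."
./scripts/nebius-train.sh \
--docker-image tomolnorman/groot-finetune:latest \
--dataset-id <huggingface-dataset-to-pull-from> \
--model-id <huggingface-model-to-push-to> \
--max-steps 5000 \
--batch-size 32
```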
## Testing with Artefacts and Cosmos

In `cosmos-visual-tester/tests/test_connect_four.py` you will find an example of how to run post-simulation test evaluations using Cosmos-Reason. This keeps the pytest file simple, using natural-language assertions. When run, the test starts a simulation using Cosmos-Reason (strategy) and GR00T (motor control) for a few steps and records a video. The video is then analyzed by Cosmos-Reason against the assertion statements made in the test file.
Tests run headless by default. See the test README for environment variables that can be configured.

```bash
uv run --directory cosmos-visual-tester pytest -v
```

Although the tests can be run locally, Artefacts can orchestrate the run and automatically upload results, logs, and the recorded video to the Artefacts Dashboard, making it easy to run and parameterize your tests, as well as view, store, and share test results across your team.
The `artefacts-cli` (installed via pip) and an `artefacts.yaml` file are required (the yaml is already in this repository).
- Create an account at app.artefacts.com
- Create an organization and a project
- Rename the project in the `artefacts.yaml` file to your `<org_name>/<project_name>`
- Install the CLI (we suggest using a virtual environment or `pipx`):

  ```bash
  pip install artefacts-cli
  artefacts config add <org_name>/<project_name>
  ```

- You will be redirected to the dashboard (browser) to create an API key; paste it into your terminal
- Select `N` when prompted about whether to create a new `artefacts.yaml` (already in this repo)
```bash
artefacts run test-cosmos
```

Results, logs, and a video will be uploaded to the dashboard.
See Docs for more information on using Artefacts.