VLM Multi-Turn Math (Geometry3K with Tool Calling)

Author: Siqi Zhu

This example demonstrates training a vision-language model to solve geometry problems with multi-turn tool calling.

Prerequisites

  1. Complete the Installation steps
  2. Get the IP address of each machine (hostname -I); it fills in the <server_endpoint> and <client_endpoint> placeholders in Step 4
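If hostname -I is not available on a machine, the address lookup can be approximated from Python. The sketch below is only an illustration: the function name primary_ip is hypothetical, and unlike hostname -I it returns a single IPv4 address (the one the OS would route outbound traffic through) rather than every configured address.

```python
import socket


def primary_ip() -> str:
    """Return the local IPv4 address used for outbound traffic, or 127.0.0.1."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only selects a route.
        # 192.0.2.1 is a reserved documentation address (TEST-NET-1).
        sock.connect(("192.0.2.1", 80))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (for example an offline machine): fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()


if __name__ == "__main__":
    print(primary_ip())
```

On a host with several network interfaces, check that the printed address is the one the other machine can actually reach.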

Step 1: Start the Scheduler (Server Side)

bash opentinker/scripts/launch_scheduler.sh --scheduler-port <scheduler_port>

Step 2: Start the Geo3K Tool Environment (Client Side)

python opentinker/environment/geo3k/geo3k_tool_server.py --port <env_port>
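Before moving on, it can help to confirm that both services from Steps 1 and 2 are accepting connections. The following is a minimal sketch that assumes both the scheduler and the tool environment listen on plain TCP at the ports given above; port_is_open is a hypothetical helper, not part of OpenTinker.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

For example, port_is_open("<server_endpoint>", <scheduler_port>) run from the client machine should return True once the scheduler is up, with the placeholders replaced by real values. A successful connection only shows that something is listening, not that it is the expected service.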

Step 3: Generate Training Data

python opentinker/data_preprocess/geo3k_multiturn_w_interaction.py \
    --local_save_dir=data/geo3k_multiturn_w_tool
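Step 4 reads train.parquet and test.parquet from this directory, so a quick existence check after preprocessing can catch a wrong save path early. This is a sketch under that assumption only; missing_splits is a hypothetical helper and it does not validate the contents of the files.

```python
from pathlib import Path


def missing_splits(save_dir: str,
                   names=("train.parquet", "test.parquet")) -> list[str]:
    """Return the expected split files that are absent from save_dir."""
    root = Path(save_dir)
    return [name for name in names if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_splits("data/geo3k_multiturn_w_tool")
    print("missing:", missing or "none")
```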

Step 4: Run Training

python opentinker/client/geo3k_tool_rl.py \
    tokenizer_path=Qwen/Qwen2-VL-2B-Instruct \
    batch_size=16 \
    val_batch_size=64 \
    data_path=data/geo3k_multiturn_w_tool/train.parquet \
    val_data_path=data/geo3k_multiturn_w_tool/test.parquet \
    num_epochs=5 \
    save_freq=1000 \
    test_freq=5 \
    scheduler_url=http://<server_endpoint>:<scheduler_port> \
    interaction.config.env_port=<env_port> \
    interaction.config.env_host=<client_endpoint>
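The three endpoint-dependent overrides are the ones most easily mistyped, since <scheduler_port> and <env_port> must match the values used in Steps 1 and 2. As an illustration, they can be assembled from the four placeholder values in one place. The helper name training_overrides and the example addresses and ports are hypothetical, not project defaults.

```python
def training_overrides(server_endpoint: str, scheduler_port: int,
                       client_endpoint: str, env_port: int) -> list[str]:
    """Build the endpoint-dependent key=value overrides used in Step 4."""
    return [
        f"scheduler_url=http://{server_endpoint}:{scheduler_port}",
        f"interaction.config.env_port={env_port}",
        f"interaction.config.env_host={client_endpoint}",
    ]


if __name__ == "__main__":
    # Example values are placeholders chosen for illustration only.
    args = training_overrides("10.0.0.1", 8780, "10.0.0.2", 8088)
    print(" \\\n    ".join(["python opentinker/client/geo3k_tool_rl.py", *args]))
```

The printed lines can be appended to the remaining overrides from the command above (tokenizer_path, batch_size, and so on), which do not depend on the machine addresses.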

Performance

See the wandb run for training metrics and results.