Tinker allows you to flexibly customize your training environment. We will first introduce a few simple training scripts to help you get started, and then cover a broad range of different use cases.
Tinker Cookbook comes with useful abstractions so you can flexibly customize your experiments. Here are some minimal launch scripts:
rl_basic.py: a template script to configure reinforcement learning.sl_basic.py: a template script to configure supervised learning.
To explain what goes under-the-hood, we also provide minimal, self-contained scripts that directly use the TinkerAPI to train LLMs.
rl_loop.py: a minimal reinforcement learning training loop.sl_loop.py: a minimal supervised learning training loop.
Building on Tinker and Tinker Cookbook, we can easily customize a wide range of training environments for LLMs. We provide the following examples:
- Chat supervised learning: supervised fine-tuning on conversational datasets like Tulu3.
- Math reasoning: improve LLM reasoning capability by rewarding it for answering math questions correctly.
- Preference learning: showcase a three-stage RLHF pipeline: 1) supervised fine-tuning, 2) learning a reward model, 3) RL against the reward model.
- Tool use: train LLMs to better use retrieval tools to answer questions more accurately.
- Prompt distillation: internalize long and complex instructions into LLMs.
- Multi-Agent: optimize LLMs to play against another LLM or themselves.
- Model distillation: use on-policy distillation or SFT to distill intelligence from a teacher model.
These examples are located in each subfolder, and their README.md file will walk you through the key implementation details, the commands to run them, and the expected performance.
Our examples support the following CLI arguments to log the results.
wandb_project: When provided, logs will be sent to your Weights & Biases project. Without this argument, training scripts save logs locally only.log_path: Controls where training artifacts are saved.
- Default behavior: If not specified, each run generates a unique name and saves to
/tmp/tinker-examples - Output files:
{log_path}/metrics.jsonlsaves training metrics.{log_path}/checkpoints.jsonlrecords all the checkpoints saved during training. You can share these checkpoints for model release, offline evaluation, etc.
- Resuming: When using an existing
log_path, you can either overwrite the previous run or resume training. This is particularly useful for recovering from runtime interruptions.