docs: Rename Open-RL to OpenRL across all markdown files (#104)

droot · web-flow · commit 684fc1e3ed17 · 2026-05-21T16:37:36.000-04:00
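
The rename in this commit touches only the display name `Open-RL`. Lowercase identifiers that appear in the diff, such as `open-rl-gateway-service`, `open-rl-autoresearch-ui`, and the `gke-labs/open-rl` repository path, are left unchanged. The sketch below shows one way such a case-sensitive rename and a leftover check could be written. It is not the tooling used for this commit, and the helper names `rename_display_name` and `find_leftovers` are hypothetical.

```python
import re

# Case-sensitive match on the old display name only (assumption: the rename
# should never touch lowercase identifiers like `open-rl-gateway-service`).
OLD_NAME = re.compile(r"Open-RL")


def rename_display_name(text: str) -> str:
    """Replace every occurrence of the old display name with the new one."""
    return OLD_NAME.sub("OpenRL", text)


def find_leftovers(text: str) -> list[int]:
    """Return 1-based line numbers that still contain the old display name."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if OLD_NAME.search(line)
    ]


# Sample text built from lines that appear in this diff.
sample = (
    "# Open-RL Examples\n"
    "# or reuse an existing backend at http://open-rl-gateway-service:8000.\n"
    "See gke-labs/open-rl#83 for more details.\n"
)
renamed = rename_display_name(sample)
```

Running `find_leftovers` over each Markdown file after the rename gives a quick way to confirm that no heading or sentence was missed.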
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
-# Open-RL: self-hosted API for your RL Infrastructure
+# OpenRL: self-hosted API for your RL Infrastructure
 
-Open-RL implements [Tinker](https://tinker-docs.thinkingmachines.ai/) compatible API for fine-tuning language models that you can run on your own infrastructure (machine or a kubernetes cluster). You can use the Tinker SDK to orchestrate RL training loops by writing imperative Python code directly from your local machine.
+OpenRL implements a [Tinker](https://tinker-docs.thinkingmachines.ai/)-compatible API for fine-tuning language models that you can run on your own infrastructure (a single machine or a Kubernetes cluster). You can use the Tinker SDK to orchestrate RL training loops by writing imperative Python code directly from your local machine.
 
 # Why Tinker
 
diff --git a/docs/architecture.md b/docs/architecture.md
@@ -1,12 +1,12 @@
-# Open-RL
+# OpenRL
 
-Open-RL implements post-training APIs to fine-tune language models on
+OpenRL implements post-training APIs to fine-tune language models on
 self-hosted infrastructure. These APIs cover common post-training techniques
 such as supervised fine-tuning, reinforcement learning, and related workflows.
 
-Conceptually, Open-RL decouples the researcher-facing training loop from the
+Conceptually, OpenRL decouples the researcher-facing training loop from the
 infrastructure that runs it. Researchers own datasets, environments, rewards,
-losses, and optimization logic; Open-RL owns the serving, scheduling, sampling,
+losses, and optimization logic; OpenRL owns the serving, scheduling, sampling,
 and storage needed to run that loop. This separation lets training methods and
 backend capacity evolve independently.
 
@@ -137,7 +137,7 @@ The sampler produces rollouts and token logprobs for the training loop. On one
 machine this can run through the same model state as training; in a cluster it
 can be a separate inference service that loads adapter snapshots.
 
-Keeping sampling as a separate concept lets Open-RL use the same API contract
+Keeping sampling as a separate concept lets OpenRL use the same API contract
 for single-machine iteration and cluster-backed inference. The client only sees
 sample results, not the backend routing choice.
 
diff --git a/docs/blog/from-mac-to-gke.md b/docs/blog/from-mac-to-gke.md
@@ -1,8 +1,8 @@
-# From Your Mac to GKE: Fine-Tuning Gemma with Open-RL
+# From Your Mac to GKE: Fine-Tuning Gemma with OpenRL
 
 RL fine-tuning is one of the most powerful ways to specialize language models — but the infrastructure behind it has traditionally been a nightmare. You're either wrestling with GPU allocation, rewriting training scripts for different backends, or managing job lifecycles by hand.
 
-[Open-RL](https://github.com/google/open-rl) is a self-hosted, open-source API that makes this simple. Write your training loop once using the Tinker SDK, run it on your Mac to iterate fast, then point it at a GKE cluster when you're ready to scale. Same code, any backend.
+[OpenRL](https://github.com/google/open-rl) is a self-hosted, open-source API that makes this simple. Write your training loop once using the Tinker SDK, run it on your Mac to iterate fast, then point it at a GKE cluster when you're ready to scale. Same code, any backend.
 
 Let's walk through it.
 
@@ -106,4 +106,4 @@ Check out the [Architecture Deep Dive](../architecture.md) for a detailed explan
 - **[GKE Deployment Guide](../deployment.md)** — Set up the distributed backend on Kubernetes
 - **[Architecture Deep Dive](../architecture.md)** — How the Gateway, Queue, and Clock Cycle Engine work together
 
-Open-RL is Apache 2.0 licensed. Contributions welcome.
+OpenRL is Apache 2.0 licensed. Contributions welcome.
diff --git a/docs/configuration.md b/docs/configuration.md
@@ -1,6 +1,6 @@
 # Configuration
 
-Open-RL is configured with environment variables. The examples below use plain
+OpenRL is configured with environment variables. The examples below use plain
 shell commands so they work even if `make` is not installed. The root
 `Makefile` wraps the same commands for convenience.
 
@@ -81,7 +81,7 @@ make server BASE_MODEL=google/gemma-4-e2b SAMPLING_BACKEND=vllm
 | Env var | Default | What it does |
 | --- | --- | --- |
 | `TINKER_BASE_URL` | `http://127.0.0.1:9003` | Base URL used by example clients and scripts. |
-| `TINKER_API_KEY` | `tml-dummy-key` | Passed through to the Tinker SDK. Local Open-RL does not enforce auth. |
+| `TINKER_API_KEY` | `tml-dummy-key` | Passed through to the Tinker SDK. Local OpenRL does not enforce auth. |
 | `HF_TOKEN` | unset | Required for gated Hugging Face models. `uv run hf auth login` is the easiest setup path. |
 | `ENABLE_GCP_TRACE` | `0` | `1` exports OpenTelemetry traces to Google Cloud Trace. |
 | `ENABLE_CONSOLE_TRACE` | `0` | `1` prints trace spans to stdout for debugging. |
diff --git a/docs/rl_concepts.md b/docs/rl_concepts.md
@@ -1,5 +1,5 @@
-# Reinforcement Learning Concepts in Open-RL
+# Reinforcement Learning Concepts in OpenRL
 
-This document provides an overview of the core Reinforcement Learning (RL) concepts used in the Open-RL project.
+This document provides an overview of the core Reinforcement Learning (RL) concepts used in the OpenRL project.
 
 <!-- TODO: Add RL concepts -->
diff --git a/docs/setup/gke-setup.md b/docs/setup/gke-setup.md
@@ -1,6 +1,6 @@
 # GKE Setup Guide
 
-This guide describes how to create a minimal GKE Standard cluster to run Open-RL workloads. It sets up the Open-RL gateway, one vLLM worker, one trainer worker, Redis, and a shared Filestore PVC.
+This guide describes how to create a minimal GKE Standard cluster to run OpenRL workloads. It sets up the OpenRL gateway, one vLLM worker, one trainer worker, Redis, and a shared Filestore PVC.
 
 This guide is based on the [Text-to-SQL recipe](../../examples/text-to-sql/README.md) requirements.
 
@@ -93,7 +93,7 @@ Connect `kubectl`:
 gcloud container clusters get-credentials "${CLUSTER}" --location="${REGION}"
 ```
 
-## 3. Deploy Open-RL
+## 3. Deploy OpenRL
 
 Deploy the manifests using the Kustomize overlay. You should apply **only one** of the following, depending on your needs:
 
@@ -146,7 +146,7 @@ curl http://127.0.0.1:9003/api/v1/healthz
 curl http://127.0.0.1:9003/api/v1/get_server_capabilities
 ```
 
-The Open-RL server is now available at `http://127.0.0.1:9003`.
+The OpenRL server is now available at `http://127.0.0.1:9003`.
 
 ## 5. Clean Up
 
diff --git a/docs/setup/local-setup.md b/docs/setup/local-setup.md
@@ -1,6 +1,6 @@
 # Local Setup Guide
 
-This guide describes how to set up a local environment (or a single VM) to run Open-RL workloads.
+This guide describes how to set up a local environment (or a single VM) to run OpenRL workloads.
 
 ## Prerequisites
 
@@ -85,9 +85,9 @@ export VLLM_ARCHITECTURE_OVERRIDE=Gemma4ForCausalLM
 make vllm
 ```
 
-### 3. Start the Open-RL Server
+### 3. Start the OpenRL Server
 
-In a **second terminal session**, start the Open-RL gateway and trainer on GPU 1:
+In a **second terminal session**, start the OpenRL gateway and trainer on GPU 1:
 
 ```bash
 export CUDA_VISIBLE_DEVICES=1
@@ -96,4 +96,4 @@ export SAMPLING_BACKEND=vllm
 make server
 ```
 
-The Open-RL server is now available at `http://127.0.0.1:9003`.
+The OpenRL server is now available at `http://127.0.0.1:9003`.
diff --git a/docs/tinker-client-compatibility.md b/docs/tinker-client-compatibility.md
@@ -4,7 +4,7 @@ Generated from `tinker==0.18.1` by
 `tests/tinker_client_compat.py`.
 
 The test discovers public Tinker client methods with `dir()` and `inspect`,
-starts the real Open-RL FastAPI gateway in single-process mode with a tiny
+starts the real OpenRL FastAPI gateway in single-process mode with a tiny
 local model fixture, lets the SDK fetch server bootstrap config, calls each
 discovered method
 with small fixture arguments, and records whether the call succeeds before
diff --git a/examples/README.md b/examples/README.md
@@ -1,6 +1,6 @@
-# Open-RL Examples
+# OpenRL Examples
 
-This directory contains examples, demos, and helper scripts for using the Open-RL framework. These are not part of the core library but serve as recipes for training and evaluation.
+This directory contains examples, demos, and helper scripts for using the OpenRL framework. These are not part of the core library but serve as recipes for training and evaluation.
 
 ## Prerequisites
 
@@ -22,9 +22,11 @@ This directory contains examples, demos, and helper scripts for using the Open-R
 
 ### Reinforcement Learning (RL)
 * **[Text-to-SQL RL](rl/text-to-sql):** Runs the Gemma 4 SFT+RL recipe with SQL execution rewards and curve plotting.
-* **[Autoresearch Demo](autoresearch):** Runs code-RL researchers against the same Open-RL gateway using cookbook DeepCoder rewards, Sandbox Fusion, and optional Agent Sandbox CRDs.
+
+### Autoresearch
+* **[Autoresearch Demo](autoresearch):** Runs code-RL researchers against the same OpenRL gateway using cookbook DeepCoder rewards, Sandbox Fusion, and optional Agent Sandbox CRDs.
 
 ### Tinker Cookbook
-* **[Tinker Cookbook Recipes](tinker-cookbook):** Examples showing how to run [Tinker Cookbook](https://github.com/thinking-machines-lab/tinker-cookbook) recipes with Open-RL.
+* **[Tinker Cookbook Recipes](tinker-cookbook):** Examples showing how to run [Tinker Cookbook](https://github.com/thinking-machines-lab/tinker-cookbook) recipes with OpenRL.
 
 ---
diff --git a/examples/autoresearch/README.md b/examples/autoresearch/README.md
@@ -1,11 +1,11 @@
-# Open-RL Autoresearch Demo
+# OpenRL Autoresearch Demo
 
 This adapts [Karpathy's autoresearch](https://github.com/karpathy/autoresearch)
-to Open-RL: an agent repeatedly edits one allowed target, runs a bounded
+to OpenRL: an agent repeatedly edits one allowed target, runs a bounded
 measured attempt, keeps commits that improve the configured metric, and resets
 the rest. The same recipe contract works locally or in Kubernetes; in a cluster,
 each run can live in its own pod and act as a researcher while sharing the same
-storage and Open-RL backend.
+storage and OpenRL backend.
 
 ## Minimal Recipe Shape
 
@@ -64,10 +64,10 @@ recipe-specific settings.
 
 ## Architecture
 
-Autoresearch runs as a small Kubernetes add-on around the shared Open-RL
+Autoresearch runs as a small Kubernetes add-on around the shared OpenRL
 infrastructure. A recipe overlay starts the UI plus one researcher Sandbox per
 researcher. Each Sandbox runs Gemini CLI, edits the recipe, launches attempts,
-and calls the shared Open-RL/Tinker services.
+and calls the shared OpenRL/Tinker services.
 
 ![Autoresearch architecture](arch.png)
 
@@ -86,11 +86,11 @@ Choose one recipe overlay:
 # Fast text-SQL, no model server.
 kubectl apply -k examples/autoresearch/recipes/text_sql
 
-# Math-RL add-on. First deploy Open-RL with docs/setup/gke-setup.md,
+# Math-RL add-on. First deploy OpenRL with docs/setup/gke-setup.md,
 # or reuse an existing backend at http://open-rl-gateway-service:8000.
 kubectl apply -k examples/autoresearch/recipes/math_rl
 
-# Convenience one-shot Math-RL stack: Open-RL backend + autoresearch add-on.
+# Convenience one-shot Math-RL stack: OpenRL backend + autoresearch add-on.
 kubectl apply -k examples/autoresearch/recipes/math_rl/gke
 ```
 
@@ -110,7 +110,7 @@ http://localhost:8080/experiments.html
 ```
 
 Use the normal [GKE setup guide](../../docs/setup/gke-setup.md) for cluster,
-GPU, storage, and the Open-RL backend. These overlays add researcher sandboxes and
+GPU, storage, and the OpenRL backend. These overlays add researcher sandboxes and
 the UI on top of that shared backend.
 
 Researcher pods wait for comma-separated `READY_URLS` before the agent starts, so
diff --git a/examples/autoresearch/recipes/math_rl/README.md b/examples/autoresearch/recipes/math_rl/README.md
@@ -1,10 +1,10 @@
 # Math-RL Autoresearch Recipe
 
-This recipe is the Open-RL/Tinker analogue of
+This recipe is the OpenRL/Tinker analogue of
 [vivekvkashyap/autoresearch-rl](https://github.com/vivekvkashyap/autoresearch-rl).
 This recipe uses the same minimal recipe contract as text-SQL, with a different
 TOML command. The agent edits one file, `config.toml`; `autoresearch.toml`
-declares this recipe's fixed Open-RL/Tinker command.
+declares this recipe's fixed OpenRL/Tinker command.
 
 ```toml
 command = "python -m recipes.math_rl.train config=recipes/math_rl/config.toml run_dir={run_dir} run_name={run_name} base_url=$TINKER_BASE_URL attempt_timeout_minutes={attempt_timeout_minutes}"
@@ -17,7 +17,7 @@ command declared in TOML as long as it writes the configured metric to
 `metrics.jsonl`.
 
 Unlike the original prime-rl setup, this recipe does not allocate two GPUs per
-researcher. Researcher pods call a shared Open-RL gateway via `TINKER_BASE_URL`;
+researcher. Researcher pods call a shared OpenRL gateway via `TINKER_BASE_URL`;
 the cluster-side model/trainer stack owns GPU placement. The composed GKE stack
 sets the shared `BASE_MODEL` to `Qwen/Qwen2.5-0.5B-Instruct`, matching the
 `autoresearch-rl` base model.
@@ -35,7 +35,7 @@ flow.
 
 ## Local Attempt Run
 
-From `examples`, with an Open-RL gateway reachable on a port:
+From `examples`, with an OpenRL gateway reachable on a port:
 
 ```bash
 export TINKER_BASE_URL=http://127.0.0.1:9003
@@ -67,7 +67,7 @@ uv run python -m run_attempt \
 ## Kubernetes Run
 
 Use the normal [GKE setup guide](../../../../docs/setup/gke-setup.md) to deploy
-Open-RL, or reuse an existing backend. Then add the autoresearch researchers and
+OpenRL, or reuse an existing backend. Then add the autoresearch researchers and
 UI:
 
 ```bash
@@ -76,7 +76,7 @@ kubectl port-forward svc/open-rl-autoresearch-ui 8080:8080
 ```
 
 For a single-command demo, use the convenience overlay that composes the normal
-Open-RL backend with the autoresearch add-on:
+OpenRL backend with the autoresearch add-on:
 
 ```bash
 kubectl apply -k examples/autoresearch/recipes/math_rl/gke
diff --git a/examples/autoresearch/recipes/math_rl/program.md b/examples/autoresearch/recipes/math_rl/program.md
@@ -1,14 +1,14 @@
-# Open-RL Math-RL Autoresearch Program
+# OpenRL Math-RL Autoresearch Program
 
 You are an autonomous researcher running inside an isolated sandbox. Your job is
-to improve an Open-RL/Tinker RL run by editing only
+to improve an OpenRL/Tinker RL run by editing only
 `recipes/math_rl/config.toml`.
 
 This mirrors the `vivekvkashyap/autoresearch-rl` loop: the human owns these
 instructions, and the agent iterates on one training configuration until its
 agent timeout expires.
 
-You do not manage GPUs directly. Run against the shared Open-RL gateway exposed
+You do not manage GPUs directly. Run against the shared OpenRL gateway exposed
 by `TINKER_BASE_URL`; the model/trainer service handles GPU placement.
 
 ## Setup
diff --git a/examples/autoresearch/recipes/text_sql/README.md b/examples/autoresearch/recipes/text_sql/README.md
@@ -10,7 +10,7 @@ editable = ["recipes/text_sql/train.py"]
 metric = "accuracy"
 ```
 
-This recipe samples the configured base model through the shared Open-RL gateway
+This recipe samples the configured base model through the shared OpenRL gateway
 for both the unmodified default-config attempt and later agent-edited attempts. `prepare.py` owns the fixed
 dataset and scoring helpers; `train.py` is the editable runnable attempt.
 
diff --git a/examples/autoresearch/recipes/text_sql/program.md b/examples/autoresearch/recipes/text_sql/program.md
@@ -1,4 +1,4 @@
-# Open-RL Text-to-SQL Autoresearch Program
+# OpenRL Text-to-SQL Autoresearch Program
 
 You are an autonomous researcher running inside an isolated sandbox. Your job is
 to improve held-out text-to-SQL execution accuracy by editing
diff --git a/examples/sft/pig-latin/README.md b/examples/sft/pig-latin/README.md
@@ -14,13 +14,13 @@ This script demonstrates fine-tuning a model to translate English into Pig Latin
 ## Running the Training Server
 
 ### Option 1: Qwen (Default)
-Start the local single-process Open-RL server for Qwen (`BASE_MODEL` defaults to `Qwen/Qwen3-0.6B`):
+Start the local single-process OpenRL server for Qwen (`BASE_MODEL` defaults to `Qwen/Qwen3-0.6B`):
 ```bash
 make server
 ```
 
 ### Option 2: Gemma
-Start the local single-process Open-RL server for Gemma (set `BASE_MODEL`):
+Start the local single-process OpenRL server for Gemma (set `BASE_MODEL`):
 ```bash
 make server BASE_MODEL=google/gemma-3-1b-it
 ```
diff --git a/examples/sft/text-to-sql/README.md b/examples/sft/text-to-sql/README.md
@@ -1,7 +1,7 @@
 # Gemma 3 Text-to-SQL SFT
 
 This directory contains the Gemma 3 supervised fine-tuning example for
-Text-to-SQL tasks using Open-RL. The Gemma 4 SFT+RL recipe lives under
+Text-to-SQL tasks using OpenRL. The Gemma 4 SFT+RL recipe lives under
 [`../../rl/text-to-sql`](../../rl/text-to-sql).
 
 ## Prerequisites
@@ -15,7 +15,7 @@ Text-to-SQL tasks using Open-RL. The Gemma 4 SFT+RL recipe lives under
 
 ## Running the Training Server
 
-Start the local single-process Open-RL server:
+Start the local single-process OpenRL server:
 ```bash
 make server BASE_MODEL=google/gemma-3-1b-pt
 ```
diff --git a/examples/text-to-sql/README.md b/examples/text-to-sql/README.md
@@ -7,10 +7,10 @@ This recipe provides a complete guide to fine-tuning a base LLM model to generat
 - **Base Model**: [google/gemma-4-E2B](https://huggingface.co/google/gemma-4-E2B)
 - **Dataset**: [philschmid/gretel-synthetic-text-to-sql](https://huggingface.co/datasets/philschmid/gretel-synthetic-text-to-sql)
 
-The **goal** is to demonstrate how to use the Open-RL infrastructure to run training locally on a single machine with multiple GPUs. This provides a baseline and understanding before scaling to a distributed Kubernetes (K8s) cluster in later guides.
+The **goal** is to demonstrate how to use the OpenRL infrastructure to run training locally on a single machine with multiple GPUs. This provides a baseline and understanding before scaling to a distributed Kubernetes (K8s) cluster in later guides.
 
 **What the core script does**: The core training script [texttosql_sft_grpo.py](texttosql_sft_grpo.py) orchestrates the training loop. It performs the following actions:
-*   Calls our Open-RL server (gateway) to request samples from vLLM.
+*   Calls our OpenRL server (gateway) to request samples from vLLM.
 *   Executes the generated SQL queries in a local SQLite database to compute rewards.
 *   Sends these rewards back to the server to update the LoRA adapter weights via the trainer.
 
@@ -51,7 +51,7 @@ sequenceDiagram
 
 ## Setup
 
-Before running the training, you need to set up the environment and deploy Open-RL. You can choose to run it locally on a VM with multiple GPUs or on a GKE cluster.
+Before running the training, you need to set up the environment and deploy OpenRL. You can choose to run it locally on a VM with multiple GPUs or on a GKE cluster.
 
 *   For **Local Setup** (recommended for baseline), follow the [Local Setup Guide](../../docs/setup/local-setup.md).
 *   For **GKE Setup** (recommended for scaling), follow the [GKE Setup Guide](../../docs/setup/gke-setup.md).
@@ -68,7 +68,7 @@ Open a **third terminal session** to run the training script.
 You can copy and paste these into your training terminal before proceeding.
 
 ```bash
-# Open-RL Gateway URL
+# OpenRL Gateway URL
 export TINKER_BASE_URL=http://127.0.0.1:9003
 
 # Dummy API key for local gateway
diff --git a/examples/tinker-cookbook/README.md b/examples/tinker-cookbook/README.md
@@ -1,8 +1,8 @@
 # Tinker Cookbook Recipes
 
-Since Open-RL implements Tinker-compatible APIs, you can use
+Since OpenRL implements Tinker-compatible APIs, you can use
 [`tinker-cookbook`](https://github.com/thinking-machines-lab/tinker-cookbook)
-recipes with Open-RL endpoints.
+recipes with OpenRL endpoints.
 
 ## Setup
 
@@ -17,7 +17,7 @@ If you want to try other recipes, you may need to install other extras or depend
 
 ## Start the Server
 
-From the repository root, start one vLLM sampler and one Open-RL gateway on
+From the repository root, start one vLLM sampler and one OpenRL gateway on
 separate GPUs. These examples are written for two L4 GPUs or better.
 
 ```bash
@@ -42,7 +42,7 @@ use GPU/vLLM.
 
 ## Checkpointing Limitation
 
-Open-RL does not yet implement full Tinker-compatible durable checkpoint management. For recipes that expose periodic checkpoint saves, set `save_every=0`; See [gke-labs/open-rl#83](https://github.com/gke-labs/open-rl/issues/83) for more details.
+OpenRL does not yet implement full Tinker-compatible durable checkpoint management. For recipes that expose periodic checkpoint saves, set `save_every=0`. See [gke-labs/open-rl#83](https://github.com/gke-labs/open-rl/issues/83) for more details.
 
 ## Supervised Learning Loop