Merged
Commits
24 commits
b8a55f3
[docs] Add Multi-LoRA Megatron Tinker design doc (v1)
erictang000 May 4, 2026
90c3ed4
[multi-lora] Add AdapterStore for per-worker LoRA slot bookkeeping
erictang000 May 4, 2026
7183d55
[multi-lora] Wire AdapterStore into MegatronPolicyWorkerBase
erictang000 May 4, 2026
abdadb1
[multi-lora] Add ensure_active_adapter + model_id threading to dispatch
erictang000 May 4, 2026
e4f2333
[multi-lora] Allow multiple LoRA policy adapters in SkyRLTrainBackend
erictang000 May 4, 2026
f762261
[multi-lora] Add GPU-gated multi-LoRA integration test for Megatron
erictang000 May 4, 2026
03d623a
[multi-lora] Add two-client smoke runbook
erictang000 May 4, 2026
057a627
[multi-lora] Fix _lora_signature_from to not read non-existent target…
erictang000 May 4, 2026
2fcea45
x
erictang000 May 4, 2026
003d3ee
[multi-lora] Swap grad buffers along with params + optimizer state
erictang000 May 4, 2026
e5309f4
Merge remote-tracking branch 'origin/main' into multi_lora
erictang000 May 7, 2026
2a3a236
[multi-lora] Remove internal-development docs from PR
erictang000 May 7, 2026
76dc375
[multi-lora] Fix integration test: backend name, healthcheck, Tinker …
erictang000 May 7, 2026
43f7d65
[multi-lora] Restore v1 sampling guards + add SEQ-vs-ALT min repro test
erictang000 May 7, 2026
aca96d0
[multi-lora] AdapterStore: snapshot/restore optimizer.param_groups[g]…
erictang000 May 7, 2026
ddb87c8
[multi-lora] Tighten SEQ-vs-ALT test to bit-exact + add Qwen3-0.6B va…
erictang000 May 8, 2026
24ca9c7
[multi-lora] Drop Qwen3-0.6B variant; tiny-model bit-exact is sufficient
erictang000 May 8, 2026
fe008ea
[multi-lora] Drop SEQ-vs-ALT comment references from in-tree code
erictang000 May 8, 2026
9b374fa
x
erictang000 May 8, 2026
824a840
[multi-lora][ci] Move test to tests/tinker/skyrl_train + add GPU CI
erictang000 May 8, 2026
c2d27e5
[multi-lora][ci] Rename CI to tinker-skyrl-train-backend-gpu
erictang000 May 8, 2026
e1f3c31
[ci] Remove accidentally-tracked .claude/scheduled_tasks.lock
erictang000 May 8, 2026
28775d6
x
erictang000 May 8, 2026
6a45214
[multi-lora] Code review cleanup
erictang000 May 8, 2026
72 changes: 72 additions & 0 deletions .github/workflows/tinker_skyrl_train_backend_gpu.yaml
@@ -0,0 +1,72 @@
name: Tinker-SkyRL-Train-Backend-GPU

on:
  push:
    branches:
      - main
    paths:
      - 'ci/anyscale_tinker_skyrl_train_backend_gpu.yaml'
      - 'ci/gpu_ci_run_tinker_skyrl_train_backend.sh'
      - 'skyrl/backends/skyrl_train/workers/megatron/**'
      - 'skyrl/backends/skyrl_train/workers/worker_dispatch.py'
      - 'skyrl/backends/skyrl_train_backend.py'
      - 'skyrl/tinker/**'
      - 'tests/tinker/skyrl_train/**'
      - 'pyproject.toml'
      - '!docs/**'
      - '!examples/**'
      - '.github/workflows/tinker_skyrl_train_backend_gpu.yaml'
  pull_request_target:
    types: [labeled]
  workflow_dispatch:

permissions:
  checks: write  # for status checks to appear
  contents: read

jobs:
  tinker_skyrl_train_backend_gpu_tests:
    if: >
      github.event_name == 'push' ||
      github.event_name == 'workflow_dispatch' ||
      (
        github.event_name == 'pull_request_target' &&
        !github.event.pull_request.draft &&
        contains(github.event.pull_request.labels.*.name, 'run_tinker_skyrl_train_backend_gpu_ci') &&
        (
          github.event.pull_request.author_association == 'MEMBER' ||
          github.event.pull_request.author_association == 'OWNER' ||
          github.event.pull_request.author_association == 'COLLABORATOR'
        )
      )
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash
        working-directory: .

    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha || github.ref }}
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'
          cache: 'pip'
      - name: Install the latest version of uv
        uses: astral-sh/setup-uv@v6
        with:
          activate-environment: true
      - name: Install basic dependencies
        run: uv pip install anyscale==0.24.79 typer==0.9.0
      # Run tests
      - name: GPU tests
        env:
          ANYSCALE_CLI_TOKEN: ${{ secrets.ANYSCALE_CLI_TOKEN }}
          ANYSCALE_HOST: https://console.anyscale.com
        run: |
          anyscale job submit -f ci/anyscale_tinker_skyrl_train_backend_gpu.yaml --timeout 5000
          anyscale job wait --cloud sky-anyscale-aws-us-east-1 --name tinker-skyrl-train-backend-gpu --timeout 5000
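Since the `pull_request_target` path is gated on the `run_tinker_skyrl_train_backend_gpu_ci` label (and on the author being a member, owner, or collaborator), someone with write access has to apply that label — or dispatch the workflow directly — to run it against a PR. A sketch of both trigger paths using the GitHub CLI; `<PR_NUMBER>` is a placeholder, and the commands are echoed rather than executed here since `gh` requires repository authentication:

```shell
# Label-based trigger for the label-gated GPU CI (sketch; <PR_NUMBER> is a placeholder).
LABEL=run_tinker_skyrl_train_backend_gpu_ci
echo "gh pr edit <PR_NUMBER> --add-label $LABEL"
# Manual fallback via workflow_dispatch:
echo "gh workflow run tinker_skyrl_train_backend_gpu.yaml --ref main"
```

Note the label gate pairs with the author-association check above it: labeling is the explicit opt-in that lets `pull_request_target` run with secrets safely.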
8 changes: 8 additions & 0 deletions ci/anyscale_tinker_skyrl_train_backend_gpu.yaml
@@ -0,0 +1,8 @@
name: tinker-skyrl-train-backend-gpu
entrypoint: bash ci/gpu_ci_run_tinker_skyrl_train_backend.sh
image_uri: novaskyai/skyrl-train-ray-2.51.1-py3.12-cu12.8-megatron
cloud: sky-anyscale-aws-us-east-1
ray_version: "2.51.1"
compute_config: l4_ci
working_dir: .
max_retries: 0
11 changes: 11 additions & 0 deletions ci/gpu_ci_run_tinker_skyrl_train_backend.sh
@@ -0,0 +1,11 @@
#!/usr/bin/env bash
set -xeuo pipefail

export CI=true

# End-to-end multi-LoRA tests: spin up a real Tinker API server backed by
# SkyRL-Train Megatron and exercise per-adapter swap, signature gating,
# v1 single-tenant sample guard, per-adapter Adam step isolation, and
# delete-then-train continuity.
uv run --directory . --isolated --extra tinker --extra megatron --with pytest --with pytest-timeout \
pytest -s --timeout=600 tests/tinker/skyrl_train/test_multi_lora_megatron.py
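The "per-adapter Adam step isolation" this suite exercises reduces to one invariant: optimizer updates applied while one LoRA adapter is active must not leak into any other adapter's weights or optimizer state. A toy sketch of that invariant — all names here (`AdapterStore`, `AdapterSlot`, `ensure_active`) are illustrative, not the PR's actual API, and plain SGD stands in for Adam:

```python
# Toy model of per-adapter slot bookkeeping with isolated optimizer steps.
# Illustrative only: the real worker swaps Megatron params, grad buffers,
# and optimizer.param_groups; this sketch keeps plain dicts.
from dataclasses import dataclass, field


@dataclass
class AdapterSlot:
    """One LoRA adapter's parameters plus its private optimizer state."""
    params: dict
    opt_state: dict = field(default_factory=dict)


class AdapterStore:
    def __init__(self):
        self._slots: dict[str, AdapterSlot] = {}
        self._active: str | None = None

    def ensure_active(self, model_id: str, init_params: dict) -> None:
        """Make model_id the active adapter, creating its slot on first use.

        Inactive slots are left untouched, so their params and optimizer
        state survive swaps verbatim.
        """
        if self._active == model_id:
            return
        if model_id not in self._slots:
            self._slots[model_id] = AdapterSlot(params=dict(init_params))
        self._active = model_id

    def step(self, grads: dict, lr: float = 0.1) -> None:
        """Apply one SGD step to the *active* adapter only."""
        slot = self._slots[self._active]
        for name, grad in grads.items():
            slot.params[name] -= lr * grad

    def params(self, model_id: str) -> dict:
        return self._slots[model_id].params


if __name__ == "__main__":
    store = AdapterStore()
    store.ensure_active("lora_a", {"w": 1.0})
    store.step({"w": 1.0})                      # lora_a: w = 1.0 - 0.1*1.0
    store.ensure_active("lora_b", {"w": 1.0})
    store.step({"w": 2.0})                      # lora_b: w = 1.0 - 0.1*2.0
    # Isolation: lora_b's step did not touch lora_a's weights.
    print(store.params("lora_a")["w"], store.params("lora_b")["w"])
```

The delete-then-train continuity case is the same idea one step further: dropping a slot must leave every surviving slot's params and optimizer state bit-identical.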