
[tinker][megatron] Multi-LoRA Megatron + Tinker API RL Training#1621

Merged
erictang000 merged 40 commits into NovaSky-AI:main from erictang000:multi_lora_rl on May 9, 2026

Conversation

@erictang000 (Collaborator) commented May 4, 2026

Summary

Multi-tenant LoRA training and per-adapter sampling for the SkyRL-Train Megatron + FSDP backends, exposed via the Tinker API. Multiple rl_loop clients can run concurrently against a single Tinker server, each owning its own LoRA adapter; the backend swaps the live adapter into Megatron's optimizer/buffer state on demand and registers per-tenant adapters on vLLM by model_id.

What's in the diff

Six files, ~238 LOC net.

Backend / dispatch (skyrl/backends/skyrl_train_backend.py, workers/worker_dispatch.py):

  • Drop the v1 single-tenant gates in sample() and save_sampler_checkpoint(). The batched-sample path now validates that every model_id in a mixed batch is a known policy, and resolves the inference-engine model name per request (model_id for LoRA tenants, resolve_policy_model_name(cfg) as the FFT / single-tenant fallback). The engine's find_batchable_sample legitimately mixes adapters in a single sample call now that multi-LoRA is live.
  • WorkerDispatch.save_weights_for_sampler takes an optional model_id. Before broadcasting, it calls ensure_active_adapter("policy", model_id) to swap the requested adapter into the optimizer/model on every worker — without this we'd export some other tenant's LoRA weights to vLLM. model_id is forwarded into _broadcast_to_inference_engines so the worker registers the adapter on vLLM under that name (see the sketch after this list).
  • model_id=None everywhere preserves the legacy single-tenant code path.
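
A minimal sketch of that dispatch-side flow. Only save_weights_for_sampler, ensure_active_adapter, and _broadcast_to_inference_engines are named in the diff; the method body and signature details here are assumptions:

```python
from typing import Optional

class WorkerDispatch:
    def save_weights_for_sampler(self, model_id: Optional[str] = None) -> None:
        # Swap the requested tenant's adapter into the optimizer/model on
        # every worker first -- exporting without this would ship another
        # tenant's LoRA weights to vLLM. No-op when model_id is None.
        self.ensure_active_adapter("policy", model_id)
        # Forward model_id so the worker registers the adapter on vLLM
        # under the tenant's name.
        self._broadcast_to_inference_engines(model_id=model_id)
```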

Per-tenant LoRA save+load (workers/megatron/megatron_worker.py, workers/fsdp/fsdp_worker.py):

  • Both _save_lora_adapters_and_sync and broadcast_to_inference_engines take an optional lora_name / model_id. With model_id set, the adapter is written into a per-tenant subdir of lora_sync_path and registered on vLLM under that name (via RemoteInferenceClient.load_lora_adapter); see the sketch after this list. With model_id=None, the legacy SKYRL_LORA_ADAPTER_NAME shared-path behavior is unchanged.
  • The Megatron path additionally requires merge_lora=False so vLLM serves each tenant's adapter by name rather than baking it into the base weights.
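
Shape of the per-tenant save-and-register path, as a sketch; SKYRL_LORA_ADAPTER_NAME's value and the exact signatures of _save_lora_adapters_and_sync / load_lora_adapter are assumptions:

```python
import os
from typing import Optional

SKYRL_LORA_ADAPTER_NAME = "lora"  # assumed value of the legacy singleton name

def sync_adapter(worker, model_id: Optional[str] = None) -> None:
    if model_id is not None:
        # Per-tenant: write under a model_id-keyed subdir so concurrent
        # saves don't collide; register on vLLM under the Tinker model_id.
        name, path = model_id, os.path.join(worker.lora_sync_path, model_id)
    else:
        # Legacy single-tenant: shared path, singleton adapter name.
        name, path = SKYRL_LORA_ADAPTER_NAME, worker.lora_sync_path
    worker._save_lora_adapters_and_sync(lora_name=name)
    worker.inference_client.load_lora_adapter(name, path, load_inplace=True)
```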

Config knobs (skyrl/train/config/ppo_base_config.yaml):

  • New trainer.policy.model.lora.max_loras (default 1) — maps to vLLM's max_loras, the number of adapters that can be active in a single GPU batch.
  • New trainer.policy.model.lora.max_cpu_loras (default null → vLLM defaults to max_loras) — the total LoRA capacity in vLLM's CPU LRU cache.

Tests (tests/tinker/skyrl_train/test_multi_lora_megatron.py):

  • Existing train-only tests (test_two_adapters_train_independently, test_per_adapter_step_isolation, test_rank_mismatch_rejected, test_delete_then_train_remaining) continue to pass under the per-tenant config (merge_lora=False + max_loras=4).
  • Removed the train-only test_sample_with_two_adapters_errors — the gate it covered is gone.
  • Added test_per_adapter_sample_isolation: two pristine adapters following an identical training trajectory must produce bit-identical greedy samples through vLLM at every step. Catches: vLLM serving the wrong adapter for a model_id, load_lora_adapter("B", ...) clobbering adapter A's slot, optimizer-state regressions that surface as different greedy tokens.
  • Added test_two_adapters_sample_independently: A trains, B trains in between (registering its own LoRA on vLLM), A trains one more step. Asserts (1) A's tokens advance after the resumed step (A's optimizer state survived B's intervention end-to-end through sampling) and (2) A's final tokens differ from B's (B's adapter sync did not clobber A's slot).
  • New _sample_greedy(tc, name, tok, prompt) helper: save_weights_and_get_sampling_client(name=...) then a deterministic temperature=0 sample.
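
Rough shape of the helper; only save_weights_and_get_sampling_client(name=...) and the temperature=0 determinism come from the bullet above — the encode call and max_tokens are assumed:

```python
def _sample_greedy(tc, name, tok, prompt):
    # Export the named adapter's current weights, get a sampling client
    # bound to it, and draw one deterministic greedy sample.
    sampling_client = tc.save_weights_and_get_sampling_client(name=name)
    prompt_ids = tok.encode(prompt)
    return sampling_client.sample(prompt_ids, max_tokens=32, temperature=0.0)
```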

Operator contract

To run multi-tenant LoRA on Megatron, set:

trainer:
  policy:
    megatron_config:
      lora_config:
        merge_lora: false
    model:
      lora:
        max_loras: <max concurrent adapters>
        max_cpu_loras: <total adapter capacity>

max_cpu_loras should be ≥ the number of distinct adapters you expect to serve concurrently; otherwise vLLM evicts an adapter mid-run and the next sample request 404s. There is no auto-rehydrate yet — operators must size for the workload. The FFT and single-tenant LoRA paths are unchanged; existing configs need no modification.

Test plan

  • pytest tests/tinker/skyrl_train/test_multi_lora_megatron.py (6 tests, GPU-gated, requires 2 GPUs for Megatron policy + vLLM)
  • All existing single-tenant test_api.py paths
  • End-to-end smoke: two rl_loop clients with distinct run_names sharing one Tinker server; verify both converge on their respective tasks without contamination

hao-aaron and others added 15 commits April 28, 2026 01:21
x
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Made-with: Cursor

# Conflicts:
#	tests/backends/skyrl_train/gpu/utils.py
Adds the design write-up for multi-tenant LoRA training on the Megatron
backend exposed via the Tinker API. v1 is training-only; sampling and
adapter-only checkpoint export are deferred. Implementation follows on
the multi_lora branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New module holding per-adapter pinned-CPU snapshots of the LoRA bucket
params + DistributedOptimizer fp32-main + Adam state on each Megatron
PolicyWorker. swap_to() walks mc.buffers + expert_parallel_buffers and
shard_fp32_from_float16_groups, doing tensor.copy_() in both directions
under torch.no_grad with dp_group barriers + cuda stream syncs.

Also includes a sanity check that every trainable param under DDP
buffers is a LoRA adapter param (named "...adapter..."), so a future
regression that unfreezes a non-LoRA param fails loudly at registration
rather than silently corrupting state.

Wiring into PolicyWorker / WorkerDispatch / SkyRLTrainBackend follows
in subsequent commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
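
A condensed sketch of the swap this commit describes. Buffer names follow the commit text, while live_tensors() / cpu_tensors() are hypothetical accessors standing in for the walk over mc.buffers, expert_parallel_buffers, and shard_fp32_from_float16_groups:

```python
import torch
import torch.distributed as dist

def swap_to(store, outgoing_slot, incoming_slot):
    # Copy live GPU LoRA state out to the outgoing tenant's pinned-CPU slot,
    # then the incoming tenant's snapshot back in, under no_grad, bracketed
    # by dp_group barriers and a CUDA stream sync so no rank races the copies.
    with torch.no_grad():
        dist.barrier(group=store.dp_group)
        for live, cpu in zip(store.live_tensors(), outgoing_slot.cpu_tensors()):
            cpu.copy_(live, non_blocking=True)   # GPU -> pinned CPU
        for live, cpu in zip(store.live_tensors(), incoming_slot.cpu_tensors()):
            live.copy_(cpu, non_blocking=True)   # pinned CPU -> GPU
        torch.cuda.current_stream().synchronize()
        dist.barrier(group=store.dp_group)
```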
Adds an `adapter_store: AdapterStore | None` attribute on the policy
worker (allocated only when LoRA is active so the FFT path is unchanged)
plus five Ray-callable methods:

- prime_optimizer_state — calls Megatron's
  DistributedOptimizer._init_optimizer_states_with_dummy_values() so
  exp_avg/exp_avg_sq exist before we snapshot the pristine slot.
- register_pristine_adapter — derives a LoraSignature from the worker's
  own lora config + parallel state, snapshots live state into pristine.
- register_adapter(model_id) — allocates a fresh slot; first call uses
  live as the slot, subsequent calls seed from pristine.
- delete_adapter(model_id) — drops a slot.
- swap_to_adapter(model_id) — local tensor.copy_() between live and slot
  storages plus dp_group barriers.

Plus an adapter_store_state() diagnostic for tests. Orchestration from
the controller follows in subsequent commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
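
The slot lifecycle those five methods implement, compressed into a sketch; adopt_live and seed_from_pristine are hypothetical stand-ins for the behavior described above:

```python
def register_adapter(self, model_id: str) -> None:
    # First adapter adopts the live GPU state as its slot; every later
    # adapter is seeded from the pristine snapshot captured right after
    # prime_optimizer_state + register_pristine_adapter.
    if not self.adapter_store.slots:
        self.adapter_store.adopt_live(model_id)
    else:
        self.adapter_store.seed_from_pristine(model_id)
```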
WorkerDispatch now exposes:
  - ensure_active_adapter(role, model_id): fans swap_to_adapter to all
    actors of `role`. No-op when model_id is None or the workers don't
    own an AdapterStore (FFT path).
  - prime_adapter_store(role, model_id): one-shot bootstrap for the very
    first create_model — primes optimizer state, registers pristine slot,
    registers the first adapter in one Ray-fanout sequence.
  - register_adapter / delete_adapter: per-call slot maintenance.

forward / forward_backward / forward_backward_from_staged / optim_step /
set_lr / save_checkpoint / load_checkpoint take an optional model_id and
call ensure_active_adapter after _ensure_on_gpu. Default None preserves
single-tenant behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
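
The pattern applied to each per-model entry point, sketched; only _ensure_on_gpu and ensure_active_adapter are named above, and _fanout is a hypothetical stand-in for the Ray fan-out:

```python
from typing import Optional

def optim_step(self, model_id: Optional[str] = None, **kwargs):
    # Same shape as forward / forward_backward / set_lr / save_checkpoint /
    # load_checkpoint: ensure GPU residency, swap the tenant's adapter in
    # (no-op for None or the FFT path), then fan out to the workers.
    self._ensure_on_gpu()
    self.ensure_active_adapter("policy", model_id)
    return self._fanout("policy", "optim_step", **kwargs)
```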
create_model now allows additional 'policy' models when LoRA is active
and the first policy model has been built. Subsequent calls validate
(rank, alpha, target_modules) match the first adapter's signature, then
register a new slot via WorkerDispatch.register_adapter. FFT (rank=0)
keeps the original single-tenant gate.

_build_policy takes the first model_id and, when LoRA is active, fires
the AdapterStore bootstrap (prime_optimizer_state +
register_pristine_adapter + register_adapter) on every worker before
the colocate_all offload while model + optimizer are still GPU-resident.

delete_model: when more than one model is registered and the role is a
LoRA policy, just drop the slot via dispatch.delete_adapter and pop the
controller-side maps. Last-adapter delete still does the full
ray.shutdown teardown so the runtime can be rebuilt cleanly.

Plumbed model_id through forward / forward_backward / optim_step /
set_lr / save_checkpoint / load_checkpoint dispatch calls so the active
adapter is swapped in on every per-model entry point.

sample() and save_sampler_checkpoint() refuse with a clear error when
more than one LoRA adapter is registered (v1 inference path is single-
tenant; per-adapter sampling is deferred).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end test that starts a Tinker API server with the SkyRL-Train
Megatron backend and exercises:

  - two LoRA adapters training independently without weight contamination,
  - rank-mismatch on a second create_model raises a clear error,
  - sample()/save_sampler_checkpoint with two adapters raises (v1 scope),
  - delete_model on one adapter leaves the runtime alive and the other
    adapter still trainable.

Auto-skips when no CUDA device is visible. Server lifecycle uses the
same wait_for_condition pattern as test_api.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Manual smoke test (the gate before merging multi_lora): launch a Tinker
API server with the SkyRL-Train Megatron backend, run two
tinker-cookbook sl_loop clients in parallel against it with distinct
model_ids, and verify

  - the policy model is built once (no second `init policy model done`),
  - the second client triggers `Registered additional LoRA adapter`,
  - both clients converge on their respective NLLs without weight
    contamination,
  - GPU memory stays bounded as the second client connects,
  - rank-mismatch / two-adapter sample / single-adapter-delete behave per
    the v1 contract.

Plus troubleshooting notes for the common failure modes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…_modules

Tinker's public LoraConfig (skyrl/tinker/types.py:66) exposes only
rank + alpha + seed + train_{attn,mlp,unembed}; it has no
target_modules attribute. The Megatron path reads target_modules from
the server-side cfg.trainer.policy.model.lora.target_modules, which is
fixed at startup, so multi-adapter signature equality reduces to
(rank, alpha). The worker-side AdapterStore still verifies parallel
state equality via its own LoraSignature.

Fixes the AttributeError on the first create_model in the smoke test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes a cross-tenant grad-corruption race surfaced in review:

  Tick N: batched fwd_bwd = [A.fb, B.fb]
    - sub-batch A: swap_to("A"), zero_grad_buffer, accumulate A's grads
    - sub-batch B: swap_to("B")  <-- only params + opt state swapped
                   zero_grad_buffer  <-- A's grads CLOBBERED here
                   accumulate B's grads
  Tick N+1: A.optim_step
    - swap_to("A") restores A's params + opt state
    - optimizer.step() reads grad_data, which holds B's grads -> B's
      gradient is applied to A's weights, A's actual gradient is lost

The fix is to snapshot/restore `mc.buffers[i].grad_data` (and
`expert_parallel_buffers`) alongside `param_data`. AdapterSlot now
carries a parallel cpu_grad_data list; _allocate_empty_slot,
_snapshot, _restore, and _copy_slot all maintain it. The fp32 grad
accumulator inside DistributedOptimizer.step() is short-lived (created
and consumed within one call) so it doesn't need slot storage.

Memory cost: ~+1x per slot for the grad mirror (bf16, same size as
param buffer). For a 7B base + rank-32 LoRA on a single DP shard this
is on the order of tens of MB, dwarfed by the existing fp32 main +
Adam moments.

Updates the design doc to reflect the four storages per LoRA param and
adds a "Why grads must travel with the slot" section walking through
the race the review caught.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
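
The essence of the fix, sketched. Attribute names (param_data, grad_data, mc.buffers, expert_parallel_buffers, cpu_grad_data) follow the commit text; the loop structure is an assumption:

```python
import torch

def snapshot_slot(store, slot):
    # Grads travel with the slot: mirror grad_data alongside param_data for
    # every DDP buffer, so a later optim_step for this tenant reads its own
    # accumulated gradient rather than whichever tenant ran last.
    with torch.no_grad():
        buffers = list(store.mc.buffers) + list(store.mc.expert_parallel_buffers)
        for i, buf in enumerate(buffers):
            slot.cpu_param_data[i].copy_(buf.param_data, non_blocking=True)
            slot.cpu_grad_data[i].copy_(buf.grad_data, non_blocking=True)  # the fix
```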
Per-tenant adapter routing for the merge_lora=False Megatron + vLLM
path (and FSDP for parity). The Tinker model_id IS the vLLM adapter
name end-to-end.

Worker side (megatron_worker, fsdp_worker):
  - broadcast_to_inference_engines accepts model_id (Optional[str]).
  - When LoRA is active, save the adapter into a per-tenant subdir
    os.path.join(lora_sync_path, model_id) so concurrent saves don't
    collide, and call load_lora_adapter(model_id, path, load_inplace=True)
    on vLLM. model_id=None preserves the legacy single-tenant
    SKYRL_LORA_ADAPTER_NAME path.
  - _save_lora_adapters_and_sync takes a lora_name parameter (default
    SKYRL_LORA_ADAPTER_NAME) instead of hardcoding the singleton.

Dispatch side (worker_dispatch):
  - save_weights_for_sampler(model_id=None) calls
    ensure_active_adapter(policy, model_id) before broadcasting so the
    correct adapter is live, and forwards model_id to
    broadcast_to_inference_engines.

Backend side (skyrl_train_backend):
  - save_sampler_checkpoint passes model_id (when LoRA is active).
  - sample() per-request `model` field is now the request's model_id
    when it's a registered LoRA adapter, falling back to
    resolve_policy_model_name(cfg) for FFT / single-tenant.
  - Drop the v1 'raise if >1 adapter' guards on sample / save_sampler_
    checkpoint — multi-tenant sampling is the goal of this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…apter

Both _load_on_server and _unload_on_server called await resp.json()
without try/except, so a non-JSON error body (e.g. a plain-text 5xx
from a proxy in front of vLLM) would raise a generic JSON-parse error
and lose the original status. Mirror the robust pattern from _post:
try resp.json(content_type=None), fall back to resp.text() on parse
failure, then raise_for_status with whichever body we got.

Addresses the gemini-code-assist review note on PR NovaSky-AI#1579 (see
NovaSky-AI#1579).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
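
The pattern as a minimal aiohttp sketch — resp.json(content_type=None), resp.text(), and raise_for_status() are real aiohttp APIs; the helper name and placement are illustrative:

```python
import aiohttp

async def _body_then_raise(resp: aiohttp.ClientResponse):
    # Tolerate non-JSON error bodies (e.g. a plain-text 5xx from a proxy in
    # front of vLLM) without masking the original HTTP status.
    try:
        body = await resp.json(content_type=None)
    except Exception:
        body = await resp.text()
    resp.raise_for_status()  # raises with the real status if non-2xx
    return body
```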
Updates docs/content/docs/tinker/multi_lora_design.mdx scope from
"training only" to "training + per-adapter sampling", documents the
Tinker model_id == vLLM adapter name contract, the per-tenant
lora_sync_path layout, the merge_lora=False requirement on Megatron,
and the operator's max_cpu_loras sizing contract. Adds a "PR NovaSky-AI#1579
foundation" section pointing at the upstream PR.

Adds tests/tinker/test_multi_lora_rl_two_clients.md as the manual gate:
two rl_loop clients training and sampling on independent adapters
against one server, plus contamination check, negative checks, and
troubleshooting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gemini-code-assist Bot left a comment

Code Review

This pull request implements multi-tenant LoRA training and sampling for the Megatron backend, enabling multiple adapters to share a single base model. It introduces an AdapterStore to manage swapping LoRA weights and optimizer states (including gradients and Adam moments) between GPU and pinned CPU memory. The RemoteInferenceClient and various training components have been updated to require explicit model identification for routing generation requests to specific adapters. Additionally, configuration for concurrent LoRA capacity in vLLM is now supported. Feedback suggests using more specific exception types for API-level validation errors in the inference client.

Comment on lines +125 to +130
if not model:
    raise ValueError(
        f"RemoteInferenceClient.{method_name}: request body must include a "
        f"non-empty 'model' field identifying the target base model or "
        f"LoRA adapter."
    )
@gemini-code-assist Bot (severity: medium):

The _require_model function raises a ValueError if the model is missing. While this is good for validation, consider whether a more specific exception type or a custom error class would better distinguish API-level validation errors from internal logic errors.

erictang000 and others added 11 commits May 4, 2026 23:45
# Conflicts:
#	skyrl/backends/skyrl_train/inference_servers/remote_inference_client.py
#	skyrl/backends/skyrl_train/workers/worker_dispatch.py
#	skyrl/backends/skyrl_train_backend.py
#	skyrl/train/entrypoints/main_base.py
#	tests/backends/skyrl_train/gpu/utils.py
The Tinker engine batches sample requests across model_ids in
find_batchable_sample, then dispatches one prepared_batch to
backend.sample(). Our previous "exactly one model_id per batch" guard
short-circuited multi-tenant RL — when both rl_loop clients had
sample() requests pending in the same engine tick, the batched call
hit the guard and returned 400 to both.

Replaces the unique-model check with a per-request validation: every
model_id must be a known policy, but multiple distinct policy
model_ids in one batch are fine. Routing per request is already
handled by _sample_with_remote_client via the per-request `model`
field on the data plane.

Co-Authored-By: Eric Tang <erictang000@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
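
The replacement check, sketched under assumed request/field names; only the rule itself — every model_id must be a known policy, with multiple distinct policy model_ids per batch allowed — comes from the commit:

```python
def validate_sample_batch(requests, known_policy_ids):
    # Mixed batches with multiple distinct policy model_ids are fine; an
    # unknown model_id is the only rejection. Per-request routing is handled
    # downstream via the per-request `model` field on the data plane.
    for req in requests:
        if req.model_id not in known_policy_ids:
            raise ValueError(f"sample(): unknown policy model_id {req.model_id!r}")
```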
The merged unload-then-load workaround from main wasn't sufficient for
re-syncing an adapter — vLLM still returned 400 "adapter already
loaded" on the second sync of the same name (e.g. when a tenant calls
save_sampler_checkpoint a second time, or when the unload step
returns 200 but the cached LoRARequest hasn't been evicted yet).

vLLM's own error message instructs the caller to set load_inplace=True
in that case, which is what PR NovaSky-AI#1579 originally did. Restore that
behavior: thread the load_inplace parameter (default True, exposed on
the public API) into the /v1/load_lora_adapter payload, drop the
separate _unload_on_server pre-step. The standalone unload_lora_adapter
method still exists for callers that explicitly want eviction.

Fixes the rl_loop runtime error:
  ClientResponseError: 400, message="The lora adapter '<id>' has
  already been loaded. If you want to load the adapter in place, set
  'load_inplace' to True."

Co-Authored-By: Eric Tang <erictang000@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
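
Roughly what the restored call looks like. The /v1/load_lora_adapter endpoint and the lora_name / lora_path keys are vLLM's documented dynamic-LoRA API; the load_inplace key follows vLLM's own error message quoted above, and the client wrapper itself is hypothetical:

```python
import aiohttp

async def load_lora_adapter(session: aiohttp.ClientSession, base_url: str,
                            name: str, path: str, load_inplace: bool = True):
    # One POST, no separate unload pre-step: load_inplace=True lets vLLM
    # refresh an adapter already registered under the same name.
    payload = {"lora_name": name, "lora_path": path, "load_inplace": load_inplace}
    async with session.post(f"{base_url}/v1/load_lora_adapter", json=payload) as resp:
        resp.raise_for_status()
```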
Adds docs/content/docs/tinker/async_sample_routing.mdx describing the
plan to route SkyRL-Train sample requests through the existing
EXTERNAL fan-out path (api.py:1039-1064) instead of the engine's
synchronous loop. The engine already excludes EXTERNAL futures from
its scheduler; we just need to point a new
BackendForwardingInferenceClient at the engine-managed vLLM.

Covers: synchronization invariants (I1-I4) that already hold via the
SDK + checkpoint validation + vLLM pause/resume, files to add/modify
(EngineStateDB row, BackendForwardingInferenceClient,
SkyRLTrainBackend._publish_engine_state, api.py lifespan wiring),
trade-offs vs. dual-loop-in-engine and full async refactors, failure
modes, testing plan, and explicit non-goals (training-side parallelism,
auto-recovery from vLLM eviction).

Co-Authored-By: Eric Tang <erictang000@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures /tmp/rl_loop_{a..i}.log and /tmp/sl_loop_{a..d}.log from the
multi-LoRA RL + SFT smoke runs into tests/tinker/smoke_logs/. Each run
contributes code.diff (the working-tree diff at launch), config.json,
logs.log (full stdout/stderr), and metrics.jsonl.

Force-added because *.log is in the project .gitignore. ~1.2 MB total;
useful as reference output for the runbooks at
tests/tinker/test_multi_lora_{rl_two_clients,smoke_two_clients}.md.

Co-Authored-By: Eric Tang <erictang000@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#	skyrl/backends/skyrl_train/inference_servers/remote_inference_client.py
#	skyrl/backends/skyrl_train/inference_servers/setup.py
#	skyrl/backends/skyrl_train/workers/fsdp/fsdp_worker.py
#	skyrl/backends/skyrl_train/workers/megatron/megatron_worker.py
#	skyrl/backends/skyrl_train_backend.py
#	skyrl/train/generators/skyrl_gym_generator.py
#	skyrl/train/generators/skyrl_vlm_generator.py
#	tests/backends/skyrl_train/gpu/gpu_ci/inference_servers/test_multi_lora_serving.py
#	tests/backends/skyrl_train/inference_servers/test_remote_inference_client.py
…version

PR NovaSky-AI#1579's test files evolved between when our branch cherry-picked
them and when the PR was merged to main (model field is now optional
via _resolve_model). Reset them to main's version to keep the
multi_lora_rl diff strictly to RL functionality.
The 13 run dirs (~1.2 MB) added in 57a474a are still recoverable from
git history; remove them from the working tree so the PR diff stays
focused on multi-LoRA RL code.
Move the design docs and smoke-test runbooks to a separate
skyrl-tinker-dev repo (internal development workspace), and drop the
.python-version pin so it doesn't leak into contributors' local repos.

Files removed:
  - docs/content/docs/tinker/multi_lora_design.mdx
  - docs/content/docs/tinker/async_sample_routing.mdx
  - tests/tinker/test_multi_lora_rl_two_clients.md
  - tests/tinker/test_multi_lora_smoke_two_clients.md
  - .python-version

The integration test (tests/tinker/test_multi_lora_megatron.py) stays
in this repo since it's a real pytest test, not a runbook.
erictang000 added a commit to erictang000/SkyRL that referenced this pull request May 7, 2026
Two fixes:

1. Restore the v1 single-tenant sampling guards in skyrl_train_backend.py
   that the merge from origin/main accidentally dropped:
     - sample() returns ErrorResponse when LoRA is active and >1 adapter
       is registered.
     - save_sampler_checkpoint raises ValueError under the same condition.
   Multi-tenant inference is the RL follow-up (NovaSky-AI#1621); SFT v1 must
   refuse it explicitly rather than silently corrupting state.
   test_sample_with_two_adapters_errors had been passing in earlier runs
   only by accident — restore the actual guarantee.

2. Add test_seq_vs_alt_per_adapter_step_isolation: min repro of the
   SEQ-vs-ALT divergence flagged in ~/skyrl-seq-vs-alt-repro (against
   Qwen3-4B + PPO on a real pod). Two fresh adapters, ALT-style
   sequence, identical data, asserts pre-update losses match within
   1e-2 at every step. With AdapterStore snapshotting state['step'] per
   slot, this passes on the tiny model — step 0 is bit-exact, step 1
   diverges by 1.7e-4 (three orders of magnitude below the user's
   Qwen3-4B observation). If a future change leaks a global step
   counter across adapters, this test will fail loudly; the assertion
   message points at the SEQ-vs-ALT diagnosis.

Local: 5/5 pass in ~2m on 1x B200.
erictang000 added a commit that referenced this pull request May 8, 2026
Adds multi-tenant LoRA training to the SkyRL-Train Megatron Tinker
backend. Multiple Tinker clients can `create_model` against the same
server, each with its own LoRA adapter sharing the base model.

## Architecture
- New `AdapterStore` (per `PolicyWorker`) holds a per-adapter pinned-CPU
snapshot of the LoRA bucket params, `DistributedOptimizer` fp32-main,
Adam state (`exp_avg`, `exp_avg_sq`), and grad buffer. A swap is
`tensor.copy_()` between live GPU storage and a slot, bracketed by
`dp_group` barriers + cuda stream syncs.
- `WorkerDispatch.ensure_active_adapter(role, model_id)` fans the swap
RPC to all policy actors. Every per-model dispatch entry
(`forward_backward`, `optim_step`, `set_lr`, `save_checkpoint`,
`load_checkpoint`) takes `model_id` and swaps before executing.

## API surface
- `create_model` allows additional policy `model_id`s when LoRA is
active. First call builds the policy + bootstraps the AdapterStore
(`prime_optimizer_state` → `register_pristine_adapter` →
`register_adapter`); subsequent calls validate `(rank, alpha)` against
the pristine signature and register a new slot.
- `delete_model` only does `ray.shutdown()` on the *last* adapter;
otherwise it's just a slot drop.
- `sample()` and `save_sampler_checkpoint` raise when more than one
adapter is registered — multi-tenant *inference* lands in the RL
follow-up (#1621).

## Files
- New: `skyrl/backends/skyrl_train/workers/megatron/adapter_store.py`,
`tests/tinker/test_multi_lora_megatron.py` (GPU-gated integration test).
- Modified: `megatron_worker.py`, `worker_dispatch.py`,
`skyrl_train_backend.py`.

## Verification
- New integration test: two adapters trained in alternation, weights
checked for cross-contamination, rank-mismatch rejection,
`delete`-then-train continuity.
- Manual smoke (two `sl_loop` clients) — runbook lives in
[`erictang000/skyrl-tinker-dev`](https://github.com/erictang000/skyrl-tinker-dev).

Design doc: same repo, `design/multi_lora_design.mdx`.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
erictang000 and others added 5 commits May 8, 2026 06:41
# Conflicts:
#	skyrl/backends/skyrl_train/workers/megatron/adapter_store.py
#	skyrl/backends/skyrl_train/workers/megatron/megatron_worker.py
#	skyrl/backends/skyrl_train/workers/worker_dispatch.py
#	skyrl/backends/skyrl_train_backend.py
Two new GPU-gated tests that exercise the RL-side weight-sync path
(save_weights_for_sampler -> RemoteInferenceClient.load_lora_adapter
under model_id -> sample(model=model_id) on vLLM):

- test_per_adapter_sample_isolation: analog of
  test_per_adapter_step_isolation. Two pristine adapters following an
  identical training trajectory must produce bit-exact greedy samples
  at every step. Catches per-tenant routing failures (vLLM serves the
  wrong adapter), load_lora_adapter slot collisions, and per-param-
  group `step` snapshot regressions.

- test_two_adapters_sample_independently: analog of
  test_two_adapters_train_independently. After A trains, B trains, A
  trains again — A's continued sample must differ from A's earlier
  sample (A continues to learn) and from B's sample (B's sync did not
  clobber A's adapter on vLLM).

Required server config changes (folded into BACKEND_CONFIG, harmless
for the existing train-only tests):
  - merge_lora=False so vLLM serves per-tenant LoRA adapters by name
    instead of pushing merged weights as a base-model update;
  - max_loras=4 / max_cpu_loras=4 so vLLM's CPU LRU holds all
    adapters the suite registers concurrently.

Local: 6/6 pass on 2x B200 (~5 min wall clock for the full suite).
- Replace "RL contract" / "(see RL_BACKEND_CONFIG)" with phrasing that
  doesn't reference an external concept or non-existent constant.
- Add the two new sampling tests to the module-level Coverage list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@erictang000 (Collaborator, Author) commented:
/gemini review

@gemini-code-assist Bot left a comment

Code Review

This pull request implements multi-tenant LoRA support for FSDP and Megatron backends, enabling concurrent training and sampling of multiple adapters. Changes include per-adapter synchronization paths, updated worker dispatch logic, and support for multi-adapter batches in the sampling backend. Feedback highlights a potential path traversal vulnerability when constructing sync paths from model_id and notes that the adapter name is missing in the weight update logic for non-remote inference clients.

Outdated comment threads:
  • skyrl/backends/skyrl_train/workers/fsdp/fsdp_worker.py (two threads)
  • skyrl/backends/skyrl_train/workers/megatron/megatron_worker.py (two threads)
erictang000 and others added 6 commits May 8, 2026 18:58
Two reviewer concerns:

1. Path traversal in os.path.join(base_sync_path, model_id). model_id is
   server-generated (api.py validates against ID_PATTERN), so this is
   defense in depth, but route through os.path.basename in both Megatron
   and FSDP workers so a misformed id can't escape lora_sync_path.
   Also add _cleanup_lora_sync_subdir on per-adapter delete_model so the
   per-tenant subdirs don't accumulate as adapters churn.

2. Legacy update_named_weights path didn't carry the adapter name —
   vllm_engine generated a numeric name from time.time_ns(), making the
   adapter inaccessible by tenant model_id. Add lora_name to
   LoraLoadRequest, plumb through both BaseVLLMInferenceEngine
   variants (sync + async _load_lora_from_disk), and pass lora_name from
   both worker files in the legacy branch. Empty string preserves the
   pre-existing single-tenant behavior.

All 6 existing multi-LoRA tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
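
The defense-in-depth routing, sketched as a free function; the follow-up commit below factors it as _resolve_lora_sync_target on MegatronPolicyWorkerBase:

```python
import os

def _resolve_lora_sync_target(base_sync_path: str, model_id: str) -> str:
    # model_id is server-generated and validated against ID_PATTERN upstream,
    # but basename() guarantees a malformed id (e.g. "../other-tenant") can
    # never escape the lora_sync_path root.
    return os.path.join(base_sync_path, os.path.basename(model_id))
```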
Followup to the previous review fix. Cleanup belongs where the path is
constructed, not duplicated in the controller — the backend was mirroring
the worker's path-resolution logic, which would silently break if the
worker's path scheme changed.

Factor _resolve_lora_sync_target(model_id) on MegatronPolicyWorkerBase
and have both broadcast_to_inference_engines (the writer) and
delete_adapter (the cleanup) use it. Single ownership of the path
contract; rank-0 rmtree on delete; best-effort on failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Fix typo "Make the requested adapter is live" -> "Make the requested
  adapter live" + restore the why-line in worker_dispatch.
- Drop the "Multi-LoRA RL" qualifier in skyrl_train_backend.sample's
  validation comment — multi-LoRA SFT can also legitimately mix
  model_ids in one batched sample call.
- Tighten the multi-tenant comments in megatron_worker /
  fsdp_worker.broadcast_to_inference_engines (they were a bit verbose).
- Hide the AdapterStore implementation detail in the multi_tenancy doc.
- Use JSON `null` (not Python `None`) in the multi_tenancy doc's
  --backend-config example for trainer.logprobs_chunk_size.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@erictang000 (Collaborator, Author) commented:
/gemini review

@gemini-code-assist Bot left a comment

Code Review

This pull request implements multi-tenant LoRA support for the Megatron backend in SkyRL, allowing multiple LoRA adapters to be trained and sampled concurrently against a single server. Key changes include updating the weight synchronization and inference engine logic to handle per-tenant adapter directories and vLLM registration, adding necessary configuration options, and updating documentation and tests to reflect the new multi-tenancy capabilities. My review identified two issues: the shutil.rmtree call in megatron_worker.py incorrectly uses ignore_errors=True while attempting to catch exceptions, and the error message in skyrl_train_backend.py incorrectly references 'fsdp' instead of 'fsdp2'.

Comment threads:
  • skyrl/backends/skyrl_train/workers/megatron/megatron_worker.py
  • skyrl/backends/skyrl_train_backend.py (outdated)
erictang000 merged commit e4648d4 into NovaSky-AI:main on May 9, 2026
3 of 5 checks passed
erictang000 deleted the multi_lora_rl branch on May 9, 2026 02:45