
[cleanup] Remove FSDP1 support + make 'fsdp' default to fsdp2 #1659

Open

erictang000 wants to merge 4 commits into NovaSky-AI:main from erictang000:remove_fsdp1

Conversation

@erictang000
Collaborator

Summary

Removes the legacy FSDP1 backend and renames FSDP2 → FSDP, leaving a single FSDP strategy backed by PyTorch's composable fully_shard API. trainer.strategy="fsdp2" is kept as a deprecated alias that emits a DeprecationWarning and normalizes to "fsdp", so existing user scripts and YAMLs continue to work.

Motivation: FSDP2 was already the default everywhere, and the SFT path already rejected FSDP1. The dual-backend code carried a lot of dead weight — branching in FSDPStrategy._fsdp_init_model, an fsdp_version() dispatcher, parallel offload_fsdp_* / offload_fsdp2_* helpers, FSDP1-only LoRA prefixes, three _handle.reshard(True) workarounds in the worker, and ~14 parametrized tests that each ran twice (doubling the CI matrix for no gain).

Changes

Core code (skyrl/backends/skyrl_train/)

  • distributed/fsdp_utils.py — Deleted fsdp_version(), get_fsdp_state_ctx(), offload_fsdp_model_to_cpu(), load_fsdp_model_to_gpu(), get_sharding_strategy(), and get_fsdp_wrap_policy(). Removed FSDP1 imports (FullyShardedDataParallel, _lazy_init). Simplified layered_summon_lora_params() and collect_lora_params() to FSDP2-only paths (no more summon_full_params, no more _fsdp_wrapped_module prefixes).
  • distributed/fsdp_strategy.py — Deleted the if self.fsdp_strategy == "fsdp": FSDP1 init branch and the MixedPrecision / CPUOffload imports. Replaced get_fsdp_state_ctx(...) callsites with direct state_dict calls (FSDP2 returns DTensors natively; a sketch of the gather follows this list). _unwrap_model no longer needs the FSDP1 _fsdp_wrapped_module path. save_hf_model now unconditionally uses fsdp2_get_full_state_dict.
  • workers/fsdp/fsdp_worker.py — Removed three _handle.reshard(True) FSDP1-internal workarounds, two FSDP.set_state_dict_type(...) calls in FSDPWeightExtractor, and the now-unused FSDP1 imports. Strategy assertion tightened to == "fsdp".
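
For intuition, here is a minimal sketch of the DTensor gather that replaces the old get_fsdp_state_ctx machinery. The function name is illustrative, not the PR's actual helper (that is fsdp2_get_full_state_dict); it assumes torch >= 2.4 and an initialized process group, and all ranks must call it together:

```python
import torch
from torch.distributed.tensor import DTensor  # public path as of torch 2.4


def gather_full_state_dict(model: torch.nn.Module) -> dict[str, torch.Tensor]:
    """Materialize a full (unsharded) state dict from an FSDP2 module.

    fully_shard stores parameters as DTensors, so a plain state_dict()
    call already works; each sharded value just needs an all-gather.
    """
    full_sd = {}
    for name, value in model.state_dict().items():
        if isinstance(value, DTensor):
            value = value.full_tensor()  # all-gather shards across the mesh
        full_sd[name] = value.cpu()
    return full_sd
```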

The fsdp2_*-prefixed helpers (apply_fsdp2, fsdp2_load_full_state_dict, fsdp2_get_full_state_dict, fsdp2_clip_grad_norm_, offload_fsdp2_model_to_cpu, load_fsdp2_model_to_gpu) are intentionally kept — their names map directly to the PyTorch torch.distributed.fsdp.fully_shard API surface they wrap.
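
To make the mapping concrete, here is a rough sketch of the composable API those helpers wrap. The block-selection heuristic and TransformerBlock name are stand-ins (the real apply_fsdp2 uses the model's wrap policy), and the snippet assumes torch.distributed is already initialized:

```python
import torch
from torch.distributed.fsdp import fully_shard  # public in torch.distributed.fsdp as of 2.6


def apply_fsdp2_sketch(model: torch.nn.Module, reshard_after_forward: bool = True) -> torch.nn.Module:
    """Shard each transformer block, then the root module (FSDP2 style)."""
    for module in model.modules():
        # A real wrap policy matches the model's block class; the name
        # "TransformerBlock" here is purely illustrative.
        if module.__class__.__name__ == "TransformerBlock":
            fully_shard(module, reshard_after_forward=reshard_after_forward)
    # Shard the root last so remaining parameters form the root group.
    fully_shard(model, reshard_after_forward=reshard_after_forward)
    return model
```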

Strategy normalization & deprecation alias

  • validate_cfg() and validate_sft_cfg() now normalize strategy="fsdp2" → "fsdp" with a DeprecationWarning before any downstream validation runs (sketched after this list).
  • Removed the FSDP1-only cpu_offload assertion in validate_cfg().
  • FSDPBackendOverrides.strategy default and the backend assertion list flipped from "fsdp2" to "fsdp".
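
A minimal, self-contained sketch of that normalization; the standalone function name is illustrative, since the real logic lives inline in validate_cfg() / validate_sft_cfg():

```python
import warnings


def normalize_strategy(strategy: str) -> str:
    """Map the deprecated 'fsdp2' alias onto the canonical 'fsdp' name."""
    if strategy == "fsdp2":
        warnings.warn(
            "trainer.strategy='fsdp2' is deprecated; use 'fsdp' "
            "(FSDP2 is now the only FSDP backend).",
            DeprecationWarning,
            stacklevel=2,
        )
        return "fsdp"
    return strategy
```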

Configs & defaults

  • TrainerConfig.strategy: "fsdp2" → "fsdp"
  • ppo_base_config.yaml: strategy: fsdp2 → strategy: fsdp (also dropped the # fsdp2 only qualifier on reshard_after_forward)
  • sft_config.py: _VALID_STRATEGIES = ("megatron", "fsdp")
  • examples/train/gsm8k/gsm8k-grpo-skypilot.yaml: same flip

Tests

  • Deleted tests/backends/skyrl_train/gpu/gpu_ci/distributed/test_fsdp_strategy.py (only contained test_fsdp1_wrap_policy) and the now-empty directory.
  • Updated 14 parametrized tests in tests/backends/skyrl_train/gpu/: dropped FSDP1 ("fsdp" rows in the old scheme), renamed "fsdp2" rows to "fsdp", updated test IDs.
  • Updated the import_worker() test helper.
  • Bulk-renamed strategy = "fsdp2" assignments and trainer.strategy=fsdp2 overrides across ~10 test files and ~30 example shell/Python scripts.
  • Deleted examples/train/training_backends/fsdp/run_fsdp2.sh (now a duplicate of run_fsdp.sh).
  • Added TestFSDP2StrategyAlias::test_fsdp2_normalized_to_fsdp_with_warning in tests/train/test_sft_config.py to lock in the deprecation alias behavior (a sketch of its shape follows this list).
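
The alias test is roughly of this shape. The assertion details are assumed rather than copied from the PR, and _normalize below is a stand-in for the normalization performed inside validate_sft_cfg(), so the sketch runs standalone:

```python
import warnings

import pytest


def _normalize(strategy: str) -> str:
    # Stand-in for the alias handling inside validate_sft_cfg().
    if strategy == "fsdp2":
        warnings.warn("'fsdp2' is deprecated; use 'fsdp'", DeprecationWarning)
        return "fsdp"
    return strategy


class TestFSDP2StrategyAlias:
    def test_fsdp2_normalized_to_fsdp_with_warning(self):
        # The deprecated alias must warn and resolve to the canonical name.
        with pytest.warns(DeprecationWarning):
            strategy = _normalize("fsdp2")
        assert strategy == "fsdp"
```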

Documentation

  • docs/content/docs/examples/training_backends.mdx: collapsed the "FSDP and FSDP2" section into a single "FSDP" section.
  • docs/content/docs/configuration/config.mdx: "We support three backends: FSDP1, FSDP2, and Megatron" → "two backends: FSDP and Megatron". Kept a "(formerly known as FSDP2)" pointer for searchability.
  • Renamed FSDP2 → FSDP in docs/content/docs/{examples/megatron,recipes/overview,tinker/*}.mdx and in trainer.strategy=fsdp2 snippets across all tutorial/example pages.
  • skyrl-train/README.md: "Training Backends: FSDP, FSDP2, and Megatron" → "FSDP and Megatron".
  • examples/train/sft/README.md: backend description updated.

Out of scope

  • The [project.optional-dependencies] fsdp = [...] extras group in pyproject.toml keeps its name (already correctly aligned with the canonical strategy).
  • File names fsdp_utils.py, fsdp_strategy.py, fsdp_worker.py and the fsdp_config Hydra group are unchanged — they were already correct.
  • No Megatron changes.

Test plan

  • uv run --extra dev --extra skyrl-train python -m pytest tests/train/test_sft_config.py tests/train/test_trainer.py -v — 20 passed (incl. new alias test)
  • Programmatic check: default TrainerConfig.strategy == "fsdp"; strategy="fsdp2" triggers DeprecationWarning in both validate_cfg and validate_sft_cfg
  • All 7 touched modules import cleanly
  • ruff check clean on every modified source file
  • grep -rn "fsdp_version\|FSDP1\|get_fsdp_state_ctx\|get_sharding_strategy\|offload_fsdp_model_to_cpu\|load_fsdp_model_to_gpu" returns no matches in skyrl/, tests/, examples/, docs/
  • GPU CI: gpu_ci_run_skyrl_train.sh (parametrized tests now run only the FSDP path, not duplicated)
  • Smoke train: bash examples/train/gsm8k/run_gsm8k.sh trainer.strategy=fsdp
  • Alias smoke: bash examples/train/gsm8k/run_gsm8k.sh trainer.strategy=fsdp2 — should warn and run
  • Docs build (Vercel preview)

Breaking changes / migration

  • trainer.strategy="fsdp" now means what "fsdp2" used to mean. Users already on FSDP2 need no migration: configs that literally pin strategy=fsdp2 still work (with a deprecation warning) and resolve to FSDP. Users who explicitly relied on FSDP1 will see different behavior and should review the FSDP2 cpu_offload / reshard_after_forward semantics in the updated config docs.

🤖 Generated with Claude Code

Contributor

@gemini-code-assist (Bot) left a comment


Code Review

This pull request consolidates the FSDP backends by removing the legacy FSDP1 implementation and renaming the FSDP2 (composable fully_shard API) strategy to "fsdp". The changes include extensive updates to documentation, example scripts, and configuration files to reflect the new naming convention, along with the addition of deprecation warnings for the "fsdp2" alias. Furthermore, the FSDPStrategy and associated utilities were refactored to remove FSDP1-specific logic. Review feedback highlighted potential issues with key matching and prefixing in the LoRA parameter collection logic, as well as a suggestion to update an error message for consistency with the new naming.

Comment thread skyrl/backends/skyrl_train/distributed/fsdp_utils.py
Comment thread skyrl/backends/skyrl_train/distributed/fsdp_utils.py
Comment thread skyrl/backends/skyrl_train/distributed/fsdp_strategy.py
@erictang000 erictang000 changed the title [cleanup] Remove FSDP1 support [cleanup] Remove FSDP1 support + make 'fsdp' default to fsdp2 May 13, 2026