The first version of PaddleFleet.
What's Changed
- Add preliminary code by @From00 in #1
- add spec utils by @FeixLiu in #2
- add enums.py and identity_op.py by @GuoxiaWang in #3
- add vpp_simulator by @Waynezee in #5
- PaddleFleet distributed initialization and ProcessGroup create. by @Hz188 in #8
- add codestyle workflow by @swgu98 in #6
- Complete parallel_state.py by @Hz188 in #12
- Trans transformer block/layer by @FeixLiu in #11
- [CodeStyle] Ignore `PLC0414` in `__init__.py` files by @SigureMo in #13
- Complete process_groups_config.py doc and fix typo by @Hz188 in #15
- [setup] Support source installation of Paddle-Fleet by @risemeup1 in #14
- improve parallel_state.py for EPHcg & Hcg by @Hz188 in #16
- [CI] Add approval workflow by @swgu98 in #10
- support pipeline_parallel schedules by @AlAuAu in #9
- dev global vars, yaml parse by @Hz188 in #18
- [CI] fix approval workflow by @swgu98 in #20
- Add Attention and RoPE classes by @lshpku in #17
- add mtp by @FeixLiu in #22
- unittest test_schedules bugfix by @AlAuAu in #24
- add gpt_model specs by @GuoxiaWang in #23
- add mlp layer by @blacksheep-Aristotle in #26
- Trans ln by @FeixLiu in #25
- Add PackedSeqParams class by @lshpku in #30
- add psp by @FeixLiu in #31
- Update LanguageModelEmbedding and add unittest by @lshpku in #27
- Update RoPE and add unittest by @lshpku in #28
- [CI] add test workflow by @swgu98 in #19
- add pipeline/utils.py by @blacksheep-Aristotle in #33
- [CI] change ut dir by @swgu98 in #34
- [CI] git name by @swgu98 in #35
- Placeholder for building transformer configuration parsing by @Hz188 in #32
- [CI] Add typos white list by @swgu98 in #36
- change sublayer to sublayer_spec by @FeixLiu in #39
- fix some test_gpt_model dependencies by @GuoxiaWang in #40
- Add global Timers for logging by @huangjiyi in #21
- add check_initialized for dp group by @Hz188 in #41
- add coverage scripts by @XieYunshen in #38
- fix coverage scripts by @XieYunshen in #45
- First successful run of GPTModel model definition by @GuoxiaWang in #44
- Change fleet core to paddlefleet by @From00 in #46
- fix coverage bug by @risemeup1 in #47
- Some fixes to successful run glm4.5 in PaddleFormers by @From00 in #49
- single card test by @swgu98 in #50
- mv paddlefleet to src by @risemeup1 in #52
- use paddle12.6 in single card test by @risemeup1 in #53
- Add GPTModelEstimator by @huangjiyi in #59
- Fix attention dim order by @lshpku in #60
- fix ffn_hidden_size is None when init gpt_mlp by @blacksheep-Aristotle in #56
- Add estimate_mfu by @huangjiyi in #62
- refine pyproject.toml by @risemeup1 in #63
- [CI] add uv pre-commit by @swgu98 in #65
- [CI] Add nemo megatron approval by @swgu98 in #51
- Add set_logging and get_logger by @huangjiyi in #64
- bugfix init GPTModel by @GuoxiaWang in #54
- [CodeStyle][Ruff] update ruff target-version to `py310` by @ooooo-create in #66
- Support custom op by @zhangbo9674 in #48
- 【lora】fix model for Lora by @xiaoguoguo626807 in #68
- Ci/multi card config by @swgu98 in #67
- [CI] add approval for model_parallel_config.py & transformer_config.py by @swgu98 in #70
- [MoE] Add Base MoE Layer by @hushenwei2000 in #61
- Delete sharded_state_dict to support FC save/load by @changeyoung98 in #71
- Align to PaddleFormers by @Waynezee in #72
- Ignore files generated from `uv sync` for custom ops by @ooooo-create in #69
- [Bug_Fix] fix attention_mask & skip check expert_tensor_parallel_group by @xuxinyi389 in #73
- Fix moe_layer config by @From00 in #74
- Fix save_tensors bugs and disable jit by @From00 in #75
- Add tensor parallel functions by @pkuzyc in #29
- Support TP Sharding EP For GLM4.5 by @xuxinyi389 in #76
- move spec utils to paddlefleet by @FeixLiu in #78
- Cherry pp layers by @FeixLiu in #80
- add non pipeline execution by @LiYuRio in #81
- Shared weight test by @FeixLiu in #86
- modify pylayer bug by @xiaoguoguo626807 in #87
- refine non-pp scheduler by @LiYuRio in #89
- Support MTP in GLM4.5 and add unittest by @lshpku in #55
- Use original cross_entropy and re-open the loss check in unit test by @pkuzyc in #84
- Fix rope dim order by @lshpku in #91
- support PipelineParallel by @AlAuAu in #92
- fix single card run by @huangjiyi in #90
- Fix bug in tensor_parallel unit tests by @pkuzyc in #93
- [MoE Layer] Fix EP Hang when No Tokens are Distributed in the Rank by @hushenwei2000 in #83
- pp License fix by @AlAuAu in #95
- [CI] add integration test glm by @swgu98 in #85
- Add sharded_state_dict for TP by @changeyoung98 in #94
- [CI] fix bypass by @swgu98 in #97
- Add instructions for copilot reviewer by @risemeup1 in #96
- [Feature] Add test instruction by @risemeup1 in #98
- disable test_layers.py by @swgu98 in #99
- [CI] Delete sed by @swgu98 in #101
- rename config fields to align huggingface by @Hz188 in #82
- Fix bias grad reduction of bias_geglu_back by @lshpku in #100
- fix config by @Waynezee in #108
- support pipeline_parallel_withinterleave by @AlAuAu in #102
- [Feature] Add nightly wheel publishing workflow by @swgu98 in #107
- [CI] Remove redundant AK/SK exports in nightly publish workflow by @swgu98 in #115
- support PipelineParallelWithInterleaveFthenB and VPPFthenBInBalancedMemory by @AlAuAu in #113
- turn off deepep on ampere and fix logging by @huangjiyi in #109
- add llava_model and clip_vit model by @blacksheep-Aristotle in #105
- support distributed_model by @AlAuAu in #111
- fix deterministic by @Waynezee in #116
- 【modelconfig】Change model layer name to support hf model by @xiaoguoguo626807 in #118
- support fp8 fusion node by @deepllz in #114
- Move sdpa before kv broadcast by @lshpku in #121
- Support fuse rope by @xuxinyi389 in #117
- model_config_and_dpo_support. by @wtmlon in #106
- Fix bugs in vocab_parallel_cross_entropy and VocabParallelEmbedding by @pkuzyc in #104
- Change name 2 by @xiaoguoguo626807 in #122
- Sequence parallel for GPTModel by @pkuzyc in #125
- Refine custom ops compile by @zhangbo9674 in #126
- add single card test and a100 test by @huangjiyi in #124
- Use Abi3 for building whl by @risemeup1 in #128
- Add setup test by @risemeup1 in #133
- add config by @Waynezee in #120
- add cp for paddlefleet by @Wennie396 in #129
- add coverage by @tianlef in #131
- Fix sharded_state_dict for single card by @changeyoung98 in #135
- fix numel block cpu by @huangjiyi in #136
- [CI] Add PR paddle wheel by @swgu98 in #137
- [CI]fix_uv_sync by @tianlef in #138
- Fix bugs in sequence parallel and add unit test by @pkuzyc in #139
- [CI] Revert paddleformers commit for integration test by @swgu98 in #140
- [Refactor] Split tokens_stable_unzip.cu into modular CUDA files by @ooooo-create in #141
- 【fused_moe】fix Moe fp8_utils.py bwd by @xiaoguoguo626807 in #142
- support matmul_bwd by @xuxinyi389 in #134
- Add dedicated FusedRMSNorm class by @lshpku in #147
- [CI] Add customop approval in `ci/check_approval.sh` by @ooooo-create in #145
- 【fp8】expert weight stop gradient = True can't apply_backward_hook by @xiaoguoguo626807 in #149
- [Pipeline Parallel] support pipeline parallel for gpt model by @LiYuRio in #112
- [CI] glm45 a100 by @swgu98 in #154
- [CI] add flags by @swgu98 in #155
- Support DeepEPTopKRouter by @xuxinyi389 in #146
- Gpt pp ut by @FeixLiu in #156
- [CI] Add qwen precision & Update CI by @swgu98 in #162
- [CI] Add version for wheel by @swgu98 in #163
- 【model name】update ppmodel state_dict name by @xiaoguoguo626807 in #160
- [CI] single card test on h20 by @swgu98 in #167
- GLM multi card test by @xuxinyi389 in #166
- Support fuse_swiglu_scale by @xuxinyi389 in #164
- add attn_mask_startend_row_indices for flashmask by @Wennie396 in #159
- 【config, pp】delete pipeline_dtype ; add model func by @xiaoguoguo626807 in #169
- Clean some useless code by @ooooo-create in #150
- [CI] Update config name by @swgu98 in #174
- [MoE Layer] Add BF16 GroupedGEMM and Unit Tests by @hushenwei2000 in #127
- [2025-12-11-17:21] Bump `uv.lock` by @ooooo-create in #173
- fix cp bugs and add unit test for context parallel by @Wennie396 in #144
- Precision Change by @Waynezee in #184
- Add recompute by @Waynezee in #178
- add fp8_dispatch && shared_expert_overlap && offline quant by @Waynezee in #158
- Fix DeepEPTopKRouter for sp by @From00 in #186
- Support GLM45 with pipeline parallel by @LiYuRio in #168
- Move `paddlefleet.extensions.ops` to `paddlefleet.ops` by @ooooo-create in #176
- [CI] Add `Merge PR to test branch` to `Approval` workflow and fix known-first-party in `pyproject.toml` by @ooooo-create in #190
- [CI] add `rerun` workflow by @ooooo-create in #180
- [CI]incremental coverage by @tianlef in #157
- cache cos and sin for rope by @huangjiyi in #153
- [CI]change loss by @tianlef in #194
- [DeepGEMM] Support `DeepGEMM` as a submodule by @ooooo-create in #191
- add empty layer by @FeixLiu in #189
- [Compat] Add triton to torch_proxy scope by @ooooo-create in #201
- Update `.github/actions/check-bypass/action.yml` by @ooooo-create in #202
- [DeepGEMM] Fix deep_gemm install by @ooooo-create in #203
- [CI] change to cli by @swgu98 in #198
- add_recompute_modules by @Waynezee in #196
- [CI]find error for log by @tianlef in #200
- [3rdparty] add check for uninitialized submodules by @ooooo-create in #204
- bug fix for moe by @FeixLiu in #199
- Revert "[CI]find error for log" by @swgu98 in #210
- fix by @swgu98 in #208
- [CI]a100 case add: gated_linear_unit: true by @tianlef in #212
- [CI]fix ci config for cli by @tianlef in #214
- [Infra] Add `instructions` for faster local dev and remove `cpplint`, `clang-format` local hooks by @ooooo-create in #187
- 【Lora】fix lora pylayer bug by @xiaoguoguo626807 in #220
- Add printing of incremental coverage information by @XieYunshen in #193
- [Pipeline Parallel] NoPipelineParallel bugfix by @AlAuAu in #197
- [CI] add sft+lora by @swgu98 in #216
- fix recompute by @Waynezee in #221
- Bump `uv.lock` by @ooooo-create in #177
- [CI] Add new workflow to auto update `uv.lock` by @ooooo-create in #183
- [CI] add moe_router_force_load_balancing by @swgu98 in #228
- [DeepEP] Add `DeepEP` as a submodule by @ooooo-create in #215
- [BugFix] Fix update_dependencies.yml with limited disk space by @ooooo-create in #233
- [CI] Add `reopened` activity to trigger `pull_request` event in `Approval.yml` by @ooooo-create in #236
- [CI]fix config for pretrain memory error by @tianlef in #231
- add dict feature in function eval_batch & rename empty layer config by @Hz188 in #222
- [CI]change loss by @tianlef in #238
- [CI]change config by @tianlef in #244
- [Compat] Refine `paddle.compat.enable_torch_proxy` usage by @ooooo-create in #243
- [CI] deal exit code 250 by @tianlef in #209
- update precision by @swgu98 in #245
- delete Random warning only print once by @xiaoguoguo626807 in #247
- support fused_swiglu_bwd by @xuxinyi389 in #239
- pp model support dpo. by @wtmlon in #181
- [CI]fix exit code of pt log file by @tianlef in #249
- [MoE Layer] Add Grouped GEMM Fused Expert Weights Version by @hushenwei2000 in #175
- unify subbatch by @xuxinyi389 in #240
- [CI] add release3.3 paddle by @swgu98 in #255
- [CI] add release3.3 single card by @swgu98 in #256
- [CI] change shell to formers by @swgu98 in #258
- [bugfix] fix pp empty layer config bug by @Hz188 in #259
- Formalize deep_gemm unittests by @A-nnonymous in #250
- fix lora bug by @xiaoguoguo626807 in #261
- Support rrattention in flashmask by @LLSGYN in #227
- fix_recompute_fused_rope by @huangjiyi in #264
- Fix loss diff for distributed strategies by @changeyoung98 in #254
- open fusion of swiglu by @xuxinyi389 in #251
- TopKRouter by @xuxinyi389 in #260
- Reduce GLM memory consumption by @zhangting2020 in #266
- [CI] del nemo megatron by @swgu98 in #275
- [CI] add qwen3moe by @swgu98 in #273
- [CI]Add glm dpo && coverage change by @tianlef in #274
- [CI] Grouped GEMM Integrated Test by @hushenwei2000 in #277
- fix flash_mask_cp by @Wennie396 in #219
- [BugFix] Add nvidia-nvshmem-cu12 limit to avoid multiple definitions by @ooooo-create in #285
- [MoE Layer] Implement barrier_ep for Synchronization by @hushenwei2000 in #272
- fix cp fused_rope by @Wennie396 in #278
- Fix TransToDataType dtype cast error by @sneaxiy in #290
- chore 🤖: Bump `uv.lock` (2026-01-04) by @github-actions[bot] in #291
- bug fix by @FeixLiu in #288
- Add sharded_state_dict for group_gemm by @changeyoung98 in #279
- remove unuse operations and disable sequence_parallel when tp <= 1 by @Waynezee in #289
- [3rdparty][DeepEP] Bump DeepEP by @ooooo-create in #299
- [CI] single card unittest use uv build by @swgu98 in #296
- [3rdparty][DeepEP] Bump DeepEP by @ooooo-create in #300
- [CI] precision test by @swgu98 in #295
- [MoE Layer] Fix Deep GEMM k_group Kernel Calling by @hushenwei2000 in #305
- [CI] install dependences of paddlefleet with cache by @swgu98 in #306
- [Sonicmoe] Add Sonicmoe as a submodule by @ooooo-create in #287
- [CI]Fix exit code check logic for multi card unit test by @tianlef in #303
- use uv build --wheel by @ooooo-create in #317
- chore 🤖: Bump `uv.lock` (2026-01-06) by @github-actions[bot] in #313
- align config by @Waynezee in #304
- fix cp unittest by @Wennie396 in #307
- Add `check_patchelf_exists` and bump sonic-moe by @ooooo-create in #326
- fix seq_aux_loss by @xuxinyi389 in #318
- [CI] update precision method by @swgu98 in #315
- [MoE Layer] Fix Router topk_weight in noaux_tc Method by @hushenwei2000 in #329
- [Feature] Add dynamic CUDA version-based dependency resolution by @ooooo-create in #293
- [CI]add cpu compile by @tianlef in #328
- [CI] coverage change to release by @swgu98 in #334
- [CI]disable multi card by @tianlef in #335
- tokens_unzip_gather support ue8m0 by @DanielSun11 in #310
- [CI] coverage by @swgu98 in #336
- Qwen3 vl by @blacksheep-Aristotle in #323
- [Build] Add git hash by @ooooo-create in #333
- [CI]fix coverage by @tianlef in #340
- [Build] Remove .o files from wheel before packaging by @ooooo-create in #330
- [fix]GLM45 pretrain fp8 on cuda126 by @tianlef in #342
- [MoE Layer] Support deepgemm Padding to tile_M by @hushenwei2000 in #282
- fix ut by @Waynezee in #347
- [CI] nightly multi python by @swgu98 in #344
- fix pname miss in grouped moe by @liufengwei0103 in #325
- fix rope bug by @blacksheep-Aristotle in #338
- [CI] add cancel by @swgu98 in #349
- disable fp8 and deepep when cuda12.6 by @risemeup1 in #345
- [MoE Layer] Delete moe_deep_gemm Config by @hushenwei2000 in #312
- Fix bug for tokens_unzip_gather_kernel by @DanielSun11 in #341
- fix router precision by @xuxinyi389 in #348
- Fix the bug for MultiModalRope when mbs>1 by @pkuzyc in #351
- Fix tensor model parallel world size return logic by @XieYunshen in #353
- bump sonic-moe by @ooooo-create in #355
- [CE]ADD CE by @tianlef in #316
- [CI] paddle release tag by @swgu98 in #352
- Fix the bug when get cp rank and size in rope by @pkuzyc in #358
- fix layer_norm bug by @blacksheep-Aristotle in #350
- fix seq_aux_loss by @Wennie396 in #361
- [Recompute] adapt rr and support dict in selective recompute by @Waynezee in #294
- 【moe】add moe_fuse config only lora use by @xiaoguoguo626807 in #366
- Fix the mis-match name bug of gelu_pytorch_tanh act by @pkuzyc in #363
- [CI]fix coverage by @tianlef in #369
- [DeepEP] Switch to `paddlefleet.ops.deep_ep` by @ooooo-create in #301
- [CI] add timeout by @swgu98 in #380
- support glm vpp overlap by @LiYuRio in #234
- [ThirdParty] Bump sonic-moe version to reduce launch triton kernel overhead by @SigureMo in #381
- [CE]add multi version python pipe by @tianlef in #357
- [MoE Layer] Default use Paddle batched_gemm when enable moe_grouped_gemm by @hushenwei2000 in #370
- fix_rr_rules by @Waynezee in #383
- [MoE Layer] Add moe_ep_barrier configuration by @hushenwei2000 in #373
- [MoE Layer] Fix AllToAll Implementation when TP > 1 by @hushenwei2000 in #360
- Revert "[DeepEP] Switch to `paddlefleet.ops.deep_ep`" by @XieYunshen in #382
- add high_precision_rope by @blacksheep-Aristotle in #377
- fix_rope and seq_aux_loss by @Waynezee in #376
- Update Paddle dependency version by @swgu98 in #387
- [CI] Update grouped_gemm Unit Test for CUDA13 by @hushenwei2000 in #388
- Modify qwen3vl mrope computation logic by @qhpeklh5959 in #379
- [CE]Sonic moe by @tianlef in #386
- add manual by @swgu98 in #391
- manual wheel update by @swgu98 in #392
- adapter sonic_moe by @xingmingyyj in #365
- [CherryPick] fix rope in cp by @Waynezee in #398
- [ThirdParty] Bump sonic-moe version to patch paddle.empty to support distributed env (#402) by @SigureMo in #403
- fix by @swgu98 in #410
- [cherry-pick] fix NoPipelineParallel init by @huangjiyi in #421
- [cherry-pick][Docs] update CONTRIBUTING.md by @ooooo-create in #428
New Contributors
- @From00 made their first contribution in #1
- @GuoxiaWang made their first contribution in #3
- @Hz188 made their first contribution in #8
- @risemeup1 made their first contribution in #14
- @lshpku made their first contribution in #17
- @blacksheep-Aristotle made their first contribution in #26
- @XieYunshen made their first contribution in #38
- @zhangbo9674 made their first contribution in #48
- @xiaoguoguo626807 made their first contribution in #68
- @hushenwei2000 made their first contribution in #61
- @changeyoung98 made their first contribution in #71
- @xuxinyi389 made their first contribution in #73
- @pkuzyc made their first contribution in #29
- @LiYuRio made their first contribution in #81
- @deepllz made their first contribution in #114
- @wtmlon made their first contribution in #106
- @Wennie396 made their first contribution in #129
- @A-nnonymous made their first contribution in #250
- @LLSGYN made their first contribution in #227
- @zhangting2020 made their first contribution in #266
- @sneaxiy made their first contribution in #290
- @github-actions[bot] made their first contribution in #291
- @DanielSun11 made their first contribution in #310
- @liufengwei0103 made their first contribution in #325
- @qhpeklh5959 made their first contribution in #379
Full Changelog: https://github.com/PaddlePaddle/PaddleFleet/commits/v0.1.0