
[Bug] Performance regression on flashinfer.comm.trtllm_allreduce_fusion in 0.6.1 #2407

@Fridge003

Ref: sgl-project/sglang#17237

In 0.5.3 it performed as expected, but after upgrading to 0.6.1, `flashinfer.comm.trtllm_allreduce_fusion` becomes much slower.

In SGLang, it is called here (with `use_oneshot=None`):
https://github.com/sgl-project/sglang/blob/17349168bb159a4e68ec7ac071aac4b8f67e5c60/python/sglang/srt/layers/flashinfer_comm_fusion.py#L194

It can be easily reproduced with small batch sizes (e.g., `num_tokens=32`, `hidden_dim=7168`). The `num_tokens=1` case is still being checked.
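To compare 0.5.3 and 0.6.1 on the same shapes, a small timing harness can help. The sketch below is an illustration under stated assumptions, not part of flashinfer or SGLang: `run_allreduce` (in the usage note) is a hypothetical zero-argument wrapper around whichever `trtllm_allreduce_fusion` call is being benchmarked, and bf16 activations (2 bytes per element) are assumed when sizing the payload.

```python
import statistics
import time
from typing import Callable

BF16_BYTES = 2  # assumption: activations are bf16 (2 bytes per element)


def payload_bytes(num_tokens: int, hidden_dim: int, elem_bytes: int = BF16_BYTES) -> int:
    """Size in bytes of one allreduce input of shape [num_tokens, hidden_dim]."""
    return num_tokens * hidden_dim * elem_bytes


def median_latency_us(
    fn: Callable[[], None],
    warmup: int = 10,
    iters: int = 100,
    sync: Callable[[], None] = lambda: None,
) -> float:
    """Median wall-clock latency of fn() in microseconds.

    `sync` should block until queued GPU work has finished (for example
    torch.cuda.synchronize), so asynchronous kernel launches are actually timed.
    """
    for _ in range(warmup):
        fn()
    sync()  # drain warmup work before measuring
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        sync()
        samples.append((time.perf_counter() - start) * 1e6)
    return statistics.median(samples)


def regression_ratio(old_us: float, new_us: float) -> float:
    """How many times slower the new version is (> 1.0 means a regression)."""
    return new_us / old_us
```

For example, with `num_tokens=32` and `hidden_dim=7168`, `payload_bytes(32, 7168)` is 458,752 bytes (448 KiB) per rank. One would run `median_latency_us(run_allreduce, sync=torch.cuda.synchronize)` under each flashinfer version, sweeping `num_tokens` over small values such as 1 and 32, and compare the two medians with `regression_ratio`. The median is used rather than the mean so that occasional scheduling hiccups do not dominate the comparison.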
