🚀 The feature, motivation and pitch
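We currently have two highly specialized low-latency GEMMs that support PDL (programmatic dependent launch) after AR (all-reduce):
- dsv3_a_gemm (vllm/csrc/dsv3_fused_a_gemm.cu, line 46 at commit 6241521): char const* env = std::getenv("TRTLLM_ENABLE_PDL");
- dsv3_router_gemm (vllm/csrc/moe/dsv3_router_gemm_utils.h, line 38 at commit 6241521): const char* env = std::getenv("TRTLLM_ENABLE_PDL");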
These:
- currently only support specific shapes
- only support bf16
We should create generalized versions of these low-latency, PDL-enabled GEMMs that support:
- bf16
- fp8
- nvfp4
- arbitrary shapes
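For reference, both existing kernels gate PDL on the TRTLLM_ENABLE_PDL environment variable. A generalized GEMM would presumably share one such check across dtypes and shapes. Below is a minimal host-side sketch of that gate. The helper name pdl_enabled_from_env and the rule that only the exact value "1" enables PDL are assumptions for illustration, not the current vLLM code.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not the vLLM implementation).
// Returns true only when TRTLLM_ENABLE_PDL is set to exactly "1".
// Treating any other value, or an unset variable, as "disabled" is an
// assumption made for this sketch.
inline bool pdl_enabled_from_env() {
    char const* env = std::getenv("TRTLLM_ENABLE_PDL");
    return env != nullptr && std::strcmp(env, "1") == 0;
}
```

A launcher could pass the result into the CUDA launch attribute cudaLaunchAttributeProgrammaticStreamSerialization (field programmaticStreamSerializationAllowed) when calling cudaLaunchKernelEx, so that every dtype and shape variant shares one switch.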
Alternatives
None
Additional context
PDL fix in flashinfer: flashinfer-ai/flashinfer#2887