sgl-kernel-npu

SGLang kernel library for NPU Contribution guide refer to Contribution Guide.

Quick start

DeepEP-Ascend: Ascend Implementation of DeepEP. README

SGL-Kernel-NPU: Other SGLang Kernels for Ascend NPU. README

DeepEP-Ascend Performance

Normal kernels with pure HCCS

We test normal kernels on A3 384 SuperPOD. And we follow the DeepSeek-V3/R1 pretraining setting (4096 tokens per batch, 7168 hidden, top-8 experts, INT8 dispatching and BF16 combining).

Type	Dispatch #EP	Bottleneck bandwidth	Combine #EP	Bottleneck bandwidth
Intranode	8	146 GB/s (HCCS)	8	125 GB/s (HCCS)
Intranode	16	107 GB/s (HCCS)	16	103 GB/s (HCCS)
Intranode	32	102 GB/s (HCCS)	32	95 GB/s (HCCS)
Intranode	64	81 GB/s (HCCS)	64	91 GB/s (HCCS)
Intranode	128	57 GB/s (HCCS)	128	81 GB/s (HCCS)

Low-latency kernels with pure HCCS

We test normal kernels on A3 384 SuperPOD. And we follow a typical DeepSeek-V3/R1 production setting (128 tokens per batch, 7168 hidden, top-8 experts, INT8 dispatching and BF16 combining).

Dispatch #EP	Latency	Bandwidth	Combine #EP	Latency	Bandwidth
8	132 us	58 GB/s (HCCS)	8	126 us	116 GB/s (HCCS)
16	139 us	55 GB/s (HCCS)	16	135 us	109 GB/s (HCCS)
32	153 us	49 GB/s (HCCS)	32	151 us	97 GB/s (HCCS)

Name		Name	Last commit message	Last commit date
Latest commit History 104 Commits
.github/workflows		.github/workflows
benchmark		benchmark
cmake		cmake
contrib		contrib
csrc		csrc
docs		docs
include		include
python		python
scripts		scripts
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
README.md		README.md
build.sh		build.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

sgl-kernel-npu

Quick start

DeepEP-Ascend Performance

Normal kernels with pure HCCS

Low-latency kernels with pure HCCS

About

Uh oh!

Releases 8

Packages

Contributors 30

Languages

License

sgl-project/sgl-kernel-npu

Folders and files

Latest commit

History

Repository files navigation

sgl-kernel-npu

Quick start

DeepEP-Ascend Performance

Normal kernels with pure HCCS

Low-latency kernels with pure HCCS

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 8

Packages 0

Contributors 30

Languages

Packages