[RFC]: Support LMCache-Ascend Backend in SGLang #158

This RFC proposes integrating LMCache-Ascend into SGLang to optimize KV cache management on Huawei Ascend hardware.

To keep the initial rollout stable and quick, the first phase of this proposal covers only the Local CPU backend. That backend offloads KV cache from NPU HBM to host memory (CPU RAM), so that previously computed KV entries can be reused instead of recomputed. This reuse is expected to improve throughput and reduce latency for long-context workloads.
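To make the offload-and-reuse idea concrete, here is a minimal sketch of a host-side store keyed by token-prefix chunks. It is not the LMCache or SGLang API: `HostKVCache`, `offload`, `lookup`, `chunk_size`, and `max_chunks` are hypothetical names, the KV payloads are opaque placeholders standing in for tensors copied out of NPU HBM, and the LRU eviction policy is an assumption rather than something this RFC specifies.

```python
from collections import OrderedDict
from hashlib import sha256


class HostKVCache:
    """Toy host-memory KV store (hypothetical; not the LMCache API)."""

    def __init__(self, chunk_size=4, max_chunks=1024):
        self.chunk_size = chunk_size
        self.max_chunks = max_chunks
        self._store = OrderedDict()  # chunk key -> KV payload, kept in LRU order

    def _chunk_keys(self, tokens):
        # Each key hashes the whole prefix up to the end of its chunk, so a hit
        # on chunk i implies that all preceding chunks match as well.
        h = sha256()
        keys = []
        full = len(tokens) - len(tokens) % self.chunk_size  # ignore a partial tail
        for start in range(0, full, self.chunk_size):
            chunk = tokens[start:start + self.chunk_size]
            h.update(",".join(map(str, chunk)).encode() + b";")
            keys.append(h.copy().hexdigest())
        return keys

    def offload(self, tokens, kv_chunks):
        """Store one KV payload per full chunk; evict least recently used."""
        for key, kv in zip(self._chunk_keys(tokens), kv_chunks):
            self._store[key] = kv
            self._store.move_to_end(key)
            while len(self._store) > self.max_chunks:
                self._store.popitem(last=False)

    def lookup(self, tokens):
        """Return (number of reusable prefix tokens, their KV payloads)."""
        hits = []
        for key in self._chunk_keys(tokens):
            if key not in self._store:
                break  # reuse stops at the first missing chunk
            self._store.move_to_end(key)
            hits.append(self._store[key])
        return len(hits) * self.chunk_size, hits
```

In this sketch, a new request that shares a prefix with an earlier one gets back the length of the reusable prefix, so only the remaining tokens would need prefill on the NPU. A real backend would additionally manage pinned host buffers and asynchronous HBM-to-host copies, which are out of scope here.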
