This RFC proposes integrating LMCache-Ascend into SGLang to optimize KV cache management on Huawei Ascend hardware.
The first phase is limited to the Local CPU backend, which keeps the initial rollout small and stable. This backend offloads KV cache from NPU HBM to host memory (CPU RAM) so that previously computed KV entries can be reused instead of recomputed, which is expected to improve throughput and reduce latency for long-context workloads.
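
To make the cache-reuse idea concrete, here is a minimal, framework-free sketch of a host-memory KV store keyed by prefix hashes. It is not the LMCache-Ascend or SGLang API: `HostKVStore`, `chunk_keys`, and `CHUNK_SIZE` are hypothetical names chosen for illustration, and the stored values are placeholder bytes standing in for KV tensors copied out of NPU HBM.

```python
import hashlib
from collections import OrderedDict

CHUNK_SIZE = 4  # tokens per chunk; hypothetical value chosen to keep the example small


def chunk_keys(token_ids, chunk_size=CHUNK_SIZE):
    """Return one key per full chunk; each key hashes the entire prefix up to that chunk."""
    keys = []
    hasher = hashlib.sha256()
    full = len(token_ids) - len(token_ids) % chunk_size  # ignore the trailing partial chunk
    for start in range(0, full, chunk_size):
        chunk = token_ids[start:start + chunk_size]
        hasher.update(",".join(map(str, chunk)).encode() + b";")
        keys.append(hasher.copy().hexdigest())
    return keys


class HostKVStore:
    """Toy stand-in for a CPU-RAM KV cache backend with LRU eviction."""

    def __init__(self, max_chunks):
        self.max_chunks = max_chunks
        self._store = OrderedDict()  # key -> KV chunk, oldest entry first

    def __len__(self):
        return len(self._store)

    def put(self, key, kv_chunk):
        """Store a chunk offloaded from the accelerator; evict the least recently used."""
        self._store[key] = kv_chunk
        self._store.move_to_end(key)
        while len(self._store) > self.max_chunks:
            self._store.popitem(last=False)

    def match_prefix(self, token_ids):
        """Return how many leading tokens have all of their KV chunks cached."""
        hit = 0
        for key in chunk_keys(token_ids):
            if key not in self._store:
                break
            self._store.move_to_end(key)  # mark as recently used
            hit += CHUNK_SIZE
        return hit


store = HostKVStore(max_chunks=8)
prompt_a = list(range(10))  # two full chunks plus a partial one
for key in chunk_keys(prompt_a):
    store.put(key, b"placeholder-kv")

prompt_b = list(range(8)) + [99, 100, 101, 102]  # shares the first 8 tokens with prompt_a
reusable_tokens = store.match_prefix(prompt_b)
```

In this sketch, `reusable_tokens` counts the leading tokens of `prompt_b` whose KV entries would not need to be recomputed during prefill. A real backend would additionally manage pinned host buffers and asynchronous device-to-host copies, which are omitted here.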