chfeng-cs/vllm-contributions

Open Source Contributions — Ethan Feng (chfeng-cs)

Focused area: KV Cache Transfer · Scheduler Optimization

Core repos: vllm-project/vllm · sgl-project/sglang · flashinfer-ai/flashinfer


Contributions

Issue
| Issue | Title | Status | Impact |
| --- | --- | --- | --- |
| vllm#42846 | [Bug][CI] NIXL + FlashInfer fails with Qwen3 MRV2 and --block-size 128 | ☑️ Closed | |

Feature

| PR | Title | Status | Impact |
| --- | --- | --- | --- |
| vllm#42321 | [KV Connector] Eager KV prefetch at request enqueue time in LMCacheMPConnector | 🔄 Open | ~25% TTFT reduction (benchmarked under high load with disk KV prefetch, L20) |
| vllm#41847 | [KV Transfer] Enable HMA by default for connectors that support it | ☑️ Merged | Reduces user config burden; fixes MultiConnector gap vs PR #42045 |
| flashinfer#3280 | feat(norm): support weightless RMSNorm for FlashNorm weight folding (#3200) | 🔄 Open | |

Metrics

| PR | Title | Status | Impact |
| --- | --- | --- | --- |
| vllm#42206 | [Metrics] Add group-aware KV cache capacity to vllm:cache_config_info | ☑️ Merged | Add group-aware KV cache capacity Prometheus gauges |

Bug Fix

| PR | Title | Status | Impact |
| --- | --- | --- | --- |
| vllm#44101 | [LMCache] fix lookup lock leak when request is aborted before alloc | 🔄 Open | |
| vllm#44097 | [LMCache] fix missing cache_salt in free_lookup_locks call | 🔄 Open | |
| vllm#42872 | [Bugfix][Model Runner v2] Fix MRV2 KV cache kernel block sizing. | ❌ Closed | Closed: implemented by core maintainer |
| sglang#24434 | [NemotronH] Fix expert scale weight loading | ☑️ Merged | |

Docs

| PR | Title | Status | Impact |
| --- | --- | --- | --- |
| vllm#42160 | [Docs] Fix broken local links | ☑️ Merged | |
| vllm#42077 | [Docs] Update server entrypoint examples | ☑️ Merged | |
| vllm#42073 | [Docs] Fix RLHF example links | ☑️ Merged | |
| vllm#42066 | [Docs] Fix OpenAI batch model argument examples | ☑️ Merged | |

Other

| PR | Title | Status | Impact |
| --- | --- | --- | --- |
| vllm#45497 | [Core][KV Connector] Avoid hybrid KV load failure crash | 🔄 Open | |
| vllm#42214 | [Test][Bugfix] Fix mypy error: missing enable_prompt_embeds arg in test_tp_sp_nvfp4_generation | ❌ Closed | Closed: duplicate |
| vllm#42086 | [Core][KV Connector] Bounded early prefetch for waiting requests | ❌ Closed | Closed: first version of PR #42321, abandoned due to significant design differences |
| flashinfer#3273 | docs: update contributing repository layout | 🔄 Open | |

Last synced: 2026-06-18 06:46 UTC


Background

Prefill-decode disaggregation requires efficient KV cache transfer between nodes. The PRs above address two parts of that path: scheduler-level KV prefetch, which aims to reduce latency, and hybrid KV cache manager (HMA) defaults, which aim to simplify configuration.
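
As a rough illustration of the prefetch idea, the sketch below starts a KV load when a request is enqueued rather than when it is scheduled, so that load time can overlap with queueing time. It is a toy model, not the code in the PRs above and not vLLM's or LMCache's API: `EagerPrefetchQueue`, `load_kv_from_store`, and `max_inflight` are hypothetical names chosen for this example.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def load_kv_from_store(request_id: str) -> str:
    """Hypothetical stand-in for a slow KV cache load (disk or remote node)."""
    return f"kv-blocks-for-{request_id}"


class EagerPrefetchQueue:
    """Toy waiting queue that starts KV loads at enqueue time (illustrative only)."""

    def __init__(self, max_inflight: int = 2) -> None:
        # The worker count bounds how many loads run concurrently;
        # additional loads wait inside the executor's internal queue.
        self._pool = ThreadPoolExecutor(max_workers=max_inflight)
        # Holds (request_id, future) pairs in arrival order.
        self._waiting = deque()

    def enqueue(self, request_id: str) -> None:
        # Submit the load immediately instead of waiting for scheduling.
        future = self._pool.submit(load_kv_from_store, request_id)
        self._waiting.append((request_id, future))

    def schedule_next(self) -> tuple:
        # By the time a request is scheduled its load may already be done;
        # result() blocks only for whatever load time remains.
        request_id, future = self._waiting.popleft()
        return request_id, future.result()

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```

A caller would `enqueue` each request on arrival and call `schedule_next` when capacity frees up. A real scheduler would also need to cancel loads for aborted requests and release any locks they hold, which this sketch omits.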

Related design notes are in notes/.
