Commit bc7d46c
Rebase 4_6_post_4 to master_next (sgl-project#47)
* Use device_id in dist init to reduce NCCL communicator warmup & creation overhead (sgl-project#5728) (see sketch after this list)
* [fix] fix potential bumpy throughput with deepgemm (sgl-project#5722)
* Resolves the `404 Not Found` error when running `compile_deep_gemm.py` in multi-node setups (sgl-project#5720)
* perf: update H20 fused_moe_triton kernel config to get higher throughput during prefilling (sgl-project#5716)
* Fix the non-existent access of `decrypted_config_file` (sgl-project#5685)
* CI: rewrite test_vision_chunked_prefill to speedup (sgl-project#5682)
* Fuse MLA set kv cache kernel (sgl-project#5748)
* Update amd docker image to `sglang:v0.4.5.post3-rocm630`. (sgl-project#5697)
* [feature] support for roberta embedding models (sgl-project#5730)
* [fix] fix bench_one_batch_server (sgl-project#5607)
* support for the DeepSeek model by enabling streaming response parsing (sgl-project#5592)
* fix: Use `is not None` instead of `!= None` for None checks. (sgl-project#5687) (see sketch after this list)
* Add Llama 4 to FA3 test (sgl-project#5509)
* [misc] more decode step log for batch_one_batch (sgl-project#5565)
* Handle JSONDecodeError while processing request data (sgl-project#5599) (see sketch after this list)
* fix(srt): check if sample_indices is not None before usage. (sgl-project#5633)
* update llguidance to 0.7.11; adds StructTag (sgl-project#4870)
* Use sgl-kernel sgl_per_token_group_quant_int8 (sgl-project#4971)
* Add memory_saver check (sgl-project#4986)
Signed-off-by: Kebe <[email protected]>
* add switch to disable open api doc (sgl-project#3744)
Signed-off-by: congcongke <[email protected]>
* Revert "fix: import vllm_rotary_embedding error when head_size not in 64, 128, 256, 512" (sgl-project#5772)
* Fix eagle test case (sgl-project#5776)
* Split local attention test from fa3 test (sgl-project#5774)
* Revert "Revert "fix: import vllm_rotary_embedding error when head_size not in 64, 128, 256, 512"" (sgl-project#5777)
* Simplify FA3 tests (sgl-project#5779)
* Revert "[fix] fix bench_one_batch_server" (sgl-project#5785)
* Revert "Use device_id in dist init to reduce NCCL communicator warmup & creation overhead" (sgl-project#5786)
* [CI] Tune threshold (sgl-project#5787)
* [CI] fix port conflicts (sgl-project#5789)
* [CI] Fix ci tests (sgl-project#5769)
* [PD]Reduce kv transfer threads (sgl-project#5791)
* [CI] Fix test case (sgl-project#5790)
* Add 8-GPU Test for Deepseek-V3 (sgl-project#5691)
Co-authored-by: Lianmin Zheng <[email protected]>
* Release v0.4.6 (sgl-project#5795)
* Update nightly-test.yml (sgl-project#5797)
* [CI] Improve github summary & enable fa3 for more models (sgl-project#5796)
* [Docs] update grafana setup guide in production metrics (sgl-project#5643)
Co-authored-by: NoahM <[email protected]>
* [Misc] add structured logging, write to file and log tracing for SGL Router
* Improve overlap scheduling (sgl-project#5788)
* Add Cutlass MLA attention backend (sgl-project#5390)
* chore: upgrade sgl-kernel 0.1.0 (sgl-project#5690)
* Dockerfile.dev pip scikit_build_core (sgl-project#5807)
* Add a doc to fix sgl-kernel build link error in py39 with ccache (sgl-project#5809)
* Turn on overlap scheduler for multimodal models (sgl-project#5771)
* Tiny refactor DefaultModelLoader.Source (sgl-project#5482)
* [Docs] Replace lists with tables for cleanup and readability in server_arguments (sgl-project#5276)
* Revert "Tiny refactor DefaultModelLoader.Source" (sgl-project#5825)
* Feat: add support for thinking mode via chat_template_kwargs.enable_t… (sgl-project#5551) (see sketch after this list)
Co-authored-by: shuaills <[email protected]>
Co-authored-by: Chayenne <[email protected]>
Co-authored-by: Lianmin Zheng <[email protected]>
Co-authored-by: Yineng Zhang <[email protected]>
* fix: fix the error where the content is None when reasoning and tool … (sgl-project#5838)
* feat: Add fused moe triton config for qwen3 moe on h100 (sgl-project#5833)
* fused moe triton tuning script support qwen3 (sgl-project#5842)
* feat: Add fused moe triton config for qwen3bf16 moe on h20 (sgl-project#5839)
* [PD] support pd fake transfer for warmup (sgl-project#5726)
* [config] qwen3moe_tune_h20 fp8 tp4 (sgl-project#5846)
* [Doc] Recover history of server_arguments.md (sgl-project#5851)
* feat: Add fused moe triton config for qwen3-30b-fp8 moe on h20 (sgl-project#5850)
* [CI] test chunked prefill more (sgl-project#5798)
* ROCm: update AITER (sgl-project#5816)
* [Feat] QWen-1M context support[1/2]: Update block sparse attention backend utils kernel (sgl-project#5847)
Co-authored-by: sighingnow <[email protected]>
* [Fix] Missing bootstrap_port field (sgl-project#5823)
* feat: update is_fa3_default_architecture (sgl-project#5854)
* add fused moe config for qwen3moe fp8/bf16 (sgl-project#5849)
* chore: bump v0.4.6.post1 (sgl-project#5845)
* Support `max_completion_tokens` for OpenAIChatCompletions (sgl-project#5857) (see sketch after this list)
* simplify fused_moe config logging (sgl-project#5801)
* [CI] tune the test order to warmup the server (sgl-project#5860)
* Cutlass MLA decode - fix dtype error (sgl-project#5868)
* Support cutlass 3.9 to improve fp8_blockwise_gemm (sgl-project#5820)
* [Feature] support auto chat template (sgl-project#4949)
* Feat: support cuda graph for LoRA (sgl-project#4115)
Co-authored-by: Beichen Ma <[email protected]>
* Add qwen3 30b fused moe config (sgl-project#5859)
* [Fix] Fix a bug for flashmla to run R1 model (sgl-project#5875)
Co-authored-by: pengcuo <[email protected]>
* Add A800 fused moe config for qwen3 30b (sgl-project#5880)
* [Misc] add service discovery for sgl router
* [fix]: PyO3 macOS linking and consolidate on tracing for logging
* chore: update Dockerfile (sgl-project#5894)
* [Docs] Update docs for Qwen3 and Qwen3MoE (sgl-project#5836)
* [Doc] Tables instead of bulletpoints for sampling doc (sgl-project#5841)
* chore: update CODEOWNERS (sgl-project#5895)
* [FEATURE] Enhance platform compatibility for ARM (sgl-project#5746)
* [CI] Add test_function_calling.py to run_suite.py (sgl-project#5896)
* Auto set draft model path for MTP (sgl-project#5793)
* [fix] relax mem_fraction_static for h200 (sgl-project#5893)
Co-authored-by: alcanerian <[email protected]>
* feat: support pythonic tool call and index in tool call streaming (sgl-project#5725)
* [Bugfix]: fix missing queue_time_start for requests from grammar_queue (sgl-project#5696)
* Add AMD MI300x Nightly Testing. (sgl-project#5861)
* chore: use torch 2.6 for sgl-kernel build (sgl-project#5898)
* Fix check_env script (sgl-project#5901)
* [PD] Fix Assertion failed: /DeepEP/csrc/kernels/internode.cu:483, condition: ibgda_get_state()->num_rc_per_pe >= num_channels sgl-project#134 (sgl-project#5830)
* Bump Flashinfer to 0.2.5 (sgl-project#5870)
Co-authored-by: Yuhao Chen <[email protected]>
* [Fix] Unload lora in HF_Runner if needed (sgl-project#5899)
* Add A800 fused moe config for qwen3 235b (sgl-project#5900)
* Add sm_120 for blackwell (sgl-project#5903)
* [Feature] add support for kimi vl model (sgl-project#5383)
Co-authored-by: wenju.li <[email protected]>
* support vlm benchmark profile (sgl-project#5905)
* [fix] kimi-vl test in test_vision_openai_server.py (sgl-project#5910)
* [Misc] use parallel build for cmake in sgl-kernel (sgl-project#5919)
* [qwen3] support qwen3 ep moe (sgl-project#5917)
Co-authored-by: sleepcoo <[email protected]>
* Add TP2 MOE benchmarks for AMD. (sgl-project#5909)
* [Feat] Scale up fa3 kernel to sm8x arch (sgl-project#5912)
Co-authored-by: zhyncs <[email protected]>
* chore: bump sgl-kernel 0.1.1 (sgl-project#5932)
* chore: upgrade sgl-kernel 0.1.1 (sgl-project#5933)
* Remove unused method `calculate_num_image_tokens` from qwen2_vl.py (sgl-project#5783)
* [PP] Add pipeline parallelism (sgl-project#5724)
* Fix lora batch processing when input lora_path contains None (sgl-project#5930)
* add Thor & Spark (sgl-project#5915)
* fix: correct stream response when enable_thinking is set to false (sgl-project#5881)
* fix: update model runner (sgl-project#5934)
* chore: bump v0.4.6.post2 (sgl-project#5939)
* Support XiaomiMiMo/MiMo model inference (sgl-project#5921)
* [PD] Vectorise group_concurrent_contiguous in NumPy (sgl-project#5834) (see sketch after this list)
Co-authored-by: luoyuan.luo <[email protected]>
* Remove extra contiguous (sgl-project#5953)
* Update ci test and doc for MTP api change (sgl-project#5952)
* docs: Fix Qwen model typo (sgl-project#5944)
Signed-off-by: JiangJiaWei1103 <[email protected]>
* Optimize a pad operation to save 25us (sgl-project#5945)
* Properly return error response in vertex_generate HTTP endpoint (sgl-project#5956)
* feat: add concurrency evaluation logic in mmmu benchmark (sgl-project#5782)
* Add 1 gpu perf and 2 gpu accuracy tests for AMD MI300x CI. (sgl-project#5960)
* feat: Refactor DeepSeekV3 function call (sgl-project#5908)
* Remove token in token out in Native API (sgl-project#5967)
* Support InternVL3 (sgl-project#5350)
Co-authored-by: Mick <[email protected]>
Co-authored-by: Chayenne <[email protected]>
* Support MMMU benchmark for InternVL (sgl-project#5968)
* FA3 speed up: skip len operation and get batch size directly from forward batch (sgl-project#5969)
Signed-off-by: Lifu Huang <[email protected]>
* [PD] NIXL backend Prefill TP & Decode TP+DP (sgl-project#5681)
* Fix set kv cache multi-stream (sgl-project#5975)
* Overlap qk norm with two streams (sgl-project#5977)
* fix: only upgrade nccl for cu128 (sgl-project#5986)
* Fix Phi3 serving which was broken by an earlier change (sgl-project#5991)
Co-authored-by: Lifu Huang <[email protected]>
* [perf] H100 DeepSeek-V3 fused moe tuned config (sgl-project#5998)
* [Fix] Suppress dynamo logging when using flashinfer backend with torch compile (sgl-project#5992)
* [Minor] Fix duplicate method definitions in conversation.py (sgl-project#6012)
Signed-off-by: Lifu Huang <[email protected]>
* Fix flaky issues of lora and add multi batch tests (sgl-project#5957)
* Tool Call: Add `chat_template_kwargs` documentation (sgl-project#5679)
* fix: fix broadcast_pyobj breaking VerlEngine (sgl-project#5997)
* [PD] Allow customizing reserved tokens to avoid KV cache waste (sgl-project#6002)
* Update dev container config to support live code sync and improve docker setup guide (sgl-project#6018)
Signed-off-by: Lifu Huang <[email protected]>
* [PD] Optimize disaggregation ib device help info (sgl-project#5781)
* [Test] Add flashmla attention backend test (sgl-project#5587)
* Fix "Avoid computing lse in Ragged Prefill when there's no prefix match" (sgl-project#5555)
* feat: Add a unified merge_state API (sgl-project#5428)
* feat: append more comprehensive fields in messages instead of merely role and content (sgl-project#5996)
* [Security][Bug] Prevent binding to all TCP interfaces (sgl-project#5752) (see sketch after this list)
* Fix prefill OOM error in the case of large page size (sgl-project#5081)
* Fix problem of large page size with chunked prefill (sgl-project#6046)
* docs: add Google Cloud Vertex AI in Adoption and Sponsorship (sgl-project#6047)
* docs: add new blog (sgl-project#6048)
* Fix missing `import os` (sgl-project#6057)
* Better PD initialization (sgl-project#5751)
* fix: deepep dockerfile, use pip install deepep. (sgl-project#5885)
* [Fix] Fix and rename flashmla CI test (sgl-project#6045)
* chore: upgrade cutlass 3.9.2 (sgl-project#6004)
Co-authored-by: yizhang2077 <[email protected]>
* Fix sgl-kernel build on aarch64 platforms (sgl-project#6062)
* Add DeepEP to CI PR Test (sgl-project#5655)
Co-authored-by: Jinyan Chen <[email protected]>
* fix custom_allreduce namespace (sgl-project#6039)
* feat: add release workflow for SGLang kernels on aarch64 (sgl-project#6010)
Co-authored-by: Qiaolin-Yu <[email protected]>
Co-authored-by: Yineng Zhang <[email protected]>
* [Feature] Support for Ascend NPU backend (sgl-project#3853)
Signed-off-by: Song Zhang <[email protected]>
Co-authored-by: 22dimensions <[email protected]>
* Fix the timeout for 8 gpu tests (sgl-project#6084)
* Hint users DeepEP normal mode is incompatible with CUDA Graph (sgl-project#5014)
* Super tiny fix doc (sgl-project#5233)
* [Doc]Fix description for dp_size argument (sgl-project#6063)
* feat(engine): add bootstrap parameters to generate methods (dynamo) (sgl-project#6075)
* [refactor] slightly tidy fp8 module (sgl-project#5993)
* Clean up fa3 test from 8 gpus (sgl-project#6105)
* Deferring 8 GPU test (sgl-project#6102)
* Update doc for MLA attention backends (sgl-project#6034)
* Clean logs for DeepSeek-V3 launching (sgl-project#6079)
* [CI]Add performance CI for VLM (sgl-project#6038)
Signed-off-by: Xinyuan Tong <[email protected]>
* adding Triton configs for DeepSeekV3 FusedMoE kernel on Blackwell (sgl-project#6111)
* optimize pad operations in fa3 to save 100+us (sgl-project#6077)
* Overlap shared expert and routed expert computations (sgl-project#5121)
* Tiny refactor ModelConfig.from_server_args (sgl-project#5219)
* Tiny refactor weight loading logic (sgl-project#5232)
* [PD] Add control to slow down a server (sgl-project#5572)
* Change AMD test threshold (sgl-project#6091)
* DeepEP normal support deepgemm-contiguous (sgl-project#5626)
Co-authored-by: Yingyi Huang <[email protected]>
Co-authored-by: Cheng Wan <[email protected]>
Co-authored-by: Xuting Zhou <[email protected]>
Co-authored-by: ZhengHSI <[email protected]>
* [fix] fix pyproject.toml dependencies (sgl-project#6119)
* [Feature] Add FlashAttention3 as a backend for VisionAttention (sgl-project#5764)
Co-authored-by: othame <[email protected]>
Co-authored-by: Mick <[email protected]>
Co-authored-by: Yi Zhang <[email protected]>
* [perf] dsv3 bmm fallback to bf16 (sgl-project#5662)
* [AMD] switch to custom allreduce regardless of MSCCL setting on ROCm (sgl-project#6097)
* [sgl-kernel] fix: fix cu118 compile error (sgl-project#6123)
Co-authored-by: zhyncs <[email protected]>
* upgrade xgrammar to 0.1.19 (sgl-project#6129)
* Remove unnecessary is_fa3_supported check (sgl-project#6112)
* chore: bump sgl-kernel 0.1.2 (sgl-project#6131)
* docs: update README (sgl-project#6132)
* [Fix] Incorrect Memory Allocation on CUDA:0 by Non-Zero CUDA Processes in TP/DP (sgl-project#5745)
* Cutlass MLA: Disable split kv due to NVIDIA/cutlass#2274 (sgl-project#6101)
* opt flashinfer mla cat (sgl-project#5822)
Co-authored-by: xuyongfei.xyf <[email protected]>
* Update amd nightly concurrency. (sgl-project#6141)
* feat: add thinking_budget (sgl-project#6089)
* [Bugfix] Fix Llama4 gibberish output with long context and CUDA graph (sgl-project#6162)
* fix a bug where gpu0 occupies more memory when hicache is turned on (sgl-project#5778)
Co-authored-by: Zhiqiang Xie <[email protected]>
* chore: bump v0.4.6.post3 (sgl-project#6165)
* KV-Cache (MHA, MLA): add missing start_layer / end_layer fields to MHATokenToKVPoolHost and MLATokenToKVPoolHost (sgl-project#6016)
Co-authored-by: 继优 <[email protected]>
Co-authored-by: chus-chus <[email protected]>
Co-authored-by: Zhiqiang Xie <[email protected]>
* [fix] fix determine_n_share_experts_fusion (sgl-project#6118)
* Fix and Clean up chat-template requirement for VLM (sgl-project#6114)
Signed-off-by: Xinyuan Tong <[email protected]>
* [Docs]Delete duplicate content (sgl-project#6146)
Co-authored-by: ximing.wxm <[email protected]>
* Revert "feat: add thinking_budget (sgl-project#6089)" (sgl-project#6181)
* Added async_encode method to Engine (sgl-project#4701)
* Fix data parallel perf regression (sgl-project#6183)
* Fix request abortion (sgl-project#6184)
* Add typo checker in pre-commit (sgl-project#6179)
Co-authored-by: Brayden Zhong <[email protected]>
* Remove duplicate IO Struct test (sgl-project#6180)
Signed-off-by: Emmanuel Ferdman <[email protected]>
* [PD] Add simple unit test for disaggregation feature (sgl-project#5654)
Signed-off-by: Shangming Cai <[email protected]>
* [CI] Disabled deepep tests temporarily because it takes too much time. (sgl-project#6186)
* feat: support loogle eval (sgl-project#6190)
* [fix] remove mixtral from is_fa3_default_architecture (sgl-project#6191)
* fix: handle None multimodal_inputs during merging and filtering batches in disaggregation decode mode (sgl-project#6169)
* chore: upgrade deepgemm (sgl-project#6073)
* chore: bump sgl-kernel v0.1.2.post1 (sgl-project#6195)
* chore: upgrade sgl-kernel v0.1.2.post1 (sgl-project#6196)
Co-authored-by: alcanderian <[email protected]>
* Handle empty input string for embedding models (sgl-project#5621)
Co-authored-by: Ravi Theja Desetty <[email protected]>
* doc: fix the erroneous docs and example code for Alibaba-NLP/gme-Qwen2-VL-2B-Instruct (sgl-project#6199)
* [Docs] minor Qwen3 and reasoning parser docs fix (sgl-project#6032)
* Improve structured outputs: fix race condition, server crash, metrics and style (sgl-project#6188)
* [CI] Reorganize the 8 gpu tests (sgl-project#6192)
* Add dev-deepep docker image (sgl-project#6198)
* Replace time.time() with time.perf_counter() for benchmarking. (sgl-project#6178) (see sketch after this list)
Signed-off-by: Lifu Huang <[email protected]>
* Update README.md (sgl-project#6202)
* Fix release-docs.yml to not use python 3.9 (sgl-project#6204)
* Fix start_profile does not support with_stack and record_shapes (sgl-project#6043)
* [doc] add a note for --n-share-experts-fusion args (sgl-project#6154)
* Performing Vocabulary Parallelism for LM Head across Attention TP Groups (sgl-project#5558)
Co-authored-by: liusy58 <[email protected]>
* Update AMD CI docker to v0.4.6.post3-rocm630. (sgl-project#6213)
* Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs (sgl-project#6201)
Co-authored-by: SangBin Cho <[email protected]>
* [CI] Fix PD mooncake dependency error (sgl-project#6212)
Signed-off-by: Shangming Cai <[email protected]>
* [CI] Re-enable pd disaggregation test (sgl-project#6231)
Signed-off-by: Shangming Cai <[email protected]>
* fix some typos (sgl-project#6209)
Co-authored-by: Brayden Zhong <[email protected]>
* [Docs] Add docs for `SGLANG_` and `SGL_` environment variables (sgl-project#6206)
* [PP] Fix init_memory_pool desync & add PP for mixtral (sgl-project#6223)
* Revert "fix some typos" (sgl-project#6244)
* chore: add hf_xet dep (sgl-project#6243)
* Update AMD nightly deps. (sgl-project#6241)
* [PD] Add support for different TP sizes per DP rank (sgl-project#5922)
Signed-off-by: Shangming Cai <[email protected]>
* Support incremental streaming of logprob/token_ids between scheduler and detokenizer (sgl-project#6225)
Co-authored-by: SangBin Cho <[email protected]>
* fix typo (sgl-project#6248)
* Support tuning moe for llama 4 model (sgl-project#6042)
* Skip the flaky test_stateful_custom_logit_processor (sgl-project#6251)
* [Llama4] Add docs note about enable multimodal (sgl-project#6235)
* [VERL Use Case] Add torch_memory_saver into deps (sgl-project#6247)
* Fix two issues related to `--moe-dense-tp-size=1` (sgl-project#5657)
Co-authored-by: liusy58 <[email protected]>
Co-authored-by: 颉沆 <[email protected]>
* model(vlm): pixtral (sgl-project#5084)
* [misc] deep_gemm fallback to NVRTC when NVCC not found (sgl-project#6252)
* Enable MI325X AMD CI. (sgl-project#6259)
* chore: bump v0.4.6.post4 (sgl-project#6245)
* formatting fix for the rebased commit for 4.6.0_post4
Signed-off-by: Mohit Sinha <[email protected]>
* fix issues in model runner and python packages
fix for the following issues:
> vLLM dependency for xgrammar==0.1.17
> 'Scheduler' object has no attribute 'device'
> 'pp_proxy_tensors' unexpected arg in HPUGraphRunner
> TODO: Add pipeline parallelism support in HPUGraphRunner
Signed-off-by: Mohit Sinha <[email protected]>
* fix formatting in model runner
Signed-off-by: Mohit Sinha <[email protected]>
* base grammar fix for the is_terminated case
> 'OutlinesGrammar' object has no attribute 'is_terminated'
Signed-off-by: Mohit Sinha <[email protected]>
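Illustrative sketches for a few of the changes above follow; each is a hedged example, not the project's actual code. First, the `device_id` dist-init change (sgl-project#5728): assuming PyTorch 2.3+, where `init_process_group` accepts `device_id`, binding the device at init lets NCCL set up the communicator eagerly instead of on the first collective.

```python
import os

import torch
import torch.distributed as dist

# Run under torchrun so RANK/WORLD_SIZE/LOCAL_RANK are set.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))
torch.cuda.set_device(local_rank)

dist.init_process_group(
    backend="nccl",
    # Eagerly binds the communicator to this GPU, avoiding the lazy
    # warmup that otherwise happens on the first collective call.
    device_id=torch.device("cuda", local_rank),
)
```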
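Why the `is not None` style fix (sgl-project#5687) is more than cosmetic: `!=` dispatches to the operand's `__ne__`, so array-likes compare elementwise instead of producing a bool. A minimal demonstration:

```python
import numpy as np

x = np.array([1, 2, 3])
print(x != None)      # elementwise: [ True  True  True ], not a bool
print(x is not None)  # identity test: True, regardless of x's type
```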
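A sketch of the JSONDecodeError handling (sgl-project#5599); the helper name and error shape here are hypothetical, not the actual endpoint code.

```python
import json
from typing import Any, Optional, Tuple

def parse_body(raw: bytes) -> Tuple[Optional[Any], Optional[str]]:
    # Turn malformed JSON into an error value instead of letting the
    # exception propagate out of the request handler.
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as e:
        return None, f"invalid JSON in request body: {e}"

data, err = parse_body(b'{"prompt": "hi"')  # missing brace -> handled
print(data, err)
```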
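For the thinking-mode feature (sgl-project#5551), whose flag name is truncated in the squash title: a request-level sketch that assumes the flag is `enable_thinking` (as in SGLang's public docs) and a local server on the default port.

```python
import requests

resp = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "default",
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        # Forwarded into the chat template; flag name assumed here.
        "chat_template_kwargs": {"enable_thinking": True},
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```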
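Usage sketch for `max_completion_tokens` (sgl-project#5857) through an OpenAI-compatible client; the base URL, API key, and model name are placeholders.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Summarize NCCL in one line."}],
    # Newer OpenAI-style cap on generated tokens (supersedes max_tokens).
    max_completion_tokens=64,
)
print(resp.choices[0].message.content)
```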
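A guess at the NumPy vectorisation in sgl-project#5834, under the assumed semantics that `group_concurrent_contiguous` splits a sorted id array into runs of consecutive values; the real implementation may differ.

```python
import numpy as np

def group_concurrent_contiguous(ids: np.ndarray) -> list:
    # Boundary wherever the gap between neighbours is not exactly 1;
    # np.split then yields the runs without a Python-level loop.
    if ids.size == 0:
        return []
    breaks = np.where(np.diff(ids) != 1)[0] + 1
    return np.split(ids, breaks)

print(group_concurrent_contiguous(np.array([0, 1, 2, 5, 6, 9])))
# -> [array([0, 1, 2]), array([5, 6]), array([9])]
```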
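The core idea of the interface-binding hardening (sgl-project#5752), reduced to a sketch; the flag name and default are illustrative.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--host",
    default="127.0.0.1",  # loopback by default rather than "0.0.0.0"
    help="bind address; pass 0.0.0.0 to accept external connections",
)
print(parser.parse_args([]).host)
```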
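Finally, why sgl-project#6178 swaps `time.time()` for `time.perf_counter()` in benchmarks: the latter is a monotonic, high-resolution clock, so measurements are not skewed by system clock adjustments.

```python
import time

def step():
    sum(range(1_000_000))  # stand-in workload

t0 = time.perf_counter()
step()
elapsed = time.perf_counter() - t0  # unaffected by wall-clock changes
print(f"{elapsed:.6f}s")
```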
---------
Signed-off-by: Kebe <[email protected]>
Signed-off-by: congcongke <[email protected]>
Signed-off-by: JiangJiaWei1103 <[email protected]>
Signed-off-by: Lifu Huang <[email protected]>
Signed-off-by: Song Zhang <[email protected]>
Signed-off-by: Xinyuan Tong <[email protected]>
Signed-off-by: Emmanuel Ferdman <[email protected]>
Signed-off-by: Shangming Cai <[email protected]>
Signed-off-by: Mohit Sinha <[email protected]>
Co-authored-by: Wenxuan Tan <[email protected]>
Co-authored-by: JieXin Liang <[email protected]>
Co-authored-by: Yuhong Guo <[email protected]>
Co-authored-by: saltyfish66 <[email protected]>
Co-authored-by: vzed <[email protected]>
Co-authored-by: Mick <[email protected]>
Co-authored-by: Ke Bao <[email protected]>
Co-authored-by: saienduri <[email protected]>
Co-authored-by: DavidBao <[email protected]>
Co-authored-by: Frankey_8080 <[email protected]>
Co-authored-by: Stefan He <[email protected]>
Co-authored-by: yan97ao <[email protected]>
Co-authored-by: aoshen524 <[email protected]>
Co-authored-by: Michał Moskal <[email protected]>
Co-authored-by: lambert0312 <[email protected]>
Co-authored-by: Kebe <[email protected]>
Co-authored-by: zhanweidu <[email protected]>
Co-authored-by: Lianmin Zheng <[email protected]>
Co-authored-by: Baizhou Zhang <[email protected]>
Co-authored-by: Liangsheng Yin <[email protected]>
Co-authored-by: Huapeng Zhou <[email protected]>
Co-authored-by: NoahM <[email protected]>
Co-authored-by: Simo Lin <[email protected]>
Co-authored-by: Trevor Morris <[email protected]>
Co-authored-by: Yineng Zhang <[email protected]>
Co-authored-by: Xiaoyu Zhang <[email protected]>
Co-authored-by: fzyzcjy <[email protected]>
Co-authored-by: Michael Yao <[email protected]>
Co-authored-by: mlmz <[email protected]>
Co-authored-by: shuaills <[email protected]>
Co-authored-by: Chayenne <[email protected]>
Co-authored-by: XinyuanTong <[email protected]>
Co-authored-by: yhyang201 <[email protected]>
Co-authored-by: ybyang <[email protected]>
Co-authored-by: JiLi <[email protected]>
Co-authored-by: HAI <[email protected]>
Co-authored-by: PGFLMG <[email protected]>
Co-authored-by: sighingnow <[email protected]>
Co-authored-by: XTY <[email protected]>
Co-authored-by: Yi Zhang <[email protected]>
Co-authored-by: Chang Su <[email protected]>
Co-authored-by: woodx <[email protected]>
Co-authored-by: Qiaolin Yu <[email protected]>
Co-authored-by: Beichen Ma <[email protected]>
Co-authored-by: pengcuo <[email protected]>
Co-authored-by: pengcuo <[email protected]>
Co-authored-by: Adarsh Shirawalmath <[email protected]>
Co-authored-by: simveit <[email protected]>
Co-authored-by: Johnny <[email protected]>
Co-authored-by: alcanerian <[email protected]>
Co-authored-by: Yuhao Chen <[email protected]>
Co-authored-by: zhjunqin <[email protected]>
Co-authored-by: liwenju0 <[email protected]>
Co-authored-by: wenju.li <[email protected]>
Co-authored-by: laixin <[email protected]>
Co-authored-by: sleepcoo <[email protected]>
Co-authored-by: Ying Sheng <[email protected]>
Co-authored-by: ryang <[email protected]>
Co-authored-by: Yuan Luo <[email protected]>
Co-authored-by: luoyuan.luo <[email protected]>
Co-authored-by: 江家瑋 <[email protected]>
Co-authored-by: KCFindstr <[email protected]>
Co-authored-by: xm:D <[email protected]>
Co-authored-by: Lifu Huang <[email protected]>
Co-authored-by: Yongtong Wu <[email protected]>
Co-authored-by: Junrong Lin <[email protected]>
Co-authored-by: shangmingc <[email protected]>
Co-authored-by: DefTruth <[email protected]>
Co-authored-by: Zhiqiang Xie <[email protected]>
Co-authored-by: Hank Han <[email protected]>
Co-authored-by: Qiaolin Yu <[email protected]>
Co-authored-by: Jinyan Chen <[email protected]>
Co-authored-by: Jinyan Chen <[email protected]>
Co-authored-by: Johnny <[email protected]>
Co-authored-by: Song Zhang <[email protected]>
Co-authored-by: 22dimensions <[email protected]>
Co-authored-by: ishandhanani <[email protected]>
Co-authored-by: Cheng Wan <[email protected]>
Co-authored-by: Minglei Zhu <[email protected]>
Co-authored-by: lukec <[email protected]>
Co-authored-by: Yingyi Huang <[email protected]>
Co-authored-by: Xuting Zhou <[email protected]>
Co-authored-by: ZhengHSI <[email protected]>
Co-authored-by: Zhu Chen <[email protected]>
Co-authored-by: othame <[email protected]>
Co-authored-by: Hubert Lu <[email protected]>
Co-authored-by: Yixin Dong <[email protected]>
Co-authored-by: xu-yfei <[email protected]>
Co-authored-by: xuyongfei.xyf <[email protected]>
Co-authored-by: thyecust <[email protected]>
Co-authored-by: huangtingwei <[email protected]>
Co-authored-by: Simon (Jiyou) Li <[email protected]>
Co-authored-by: 继优 <[email protected]>
Co-authored-by: chus-chus <[email protected]>
Co-authored-by: Ximingwang-09 <[email protected]>
Co-authored-by: ximing.wxm <[email protected]>
Co-authored-by: Steven Shimizu <[email protected]>
Co-authored-by: applesaucethebun <[email protected]>
Co-authored-by: Brayden Zhong <[email protected]>
Co-authored-by: Emmanuel Ferdman <[email protected]>
Co-authored-by: Yusong Gao <[email protected]>
Co-authored-by: alcanderian <[email protected]>
Co-authored-by: Ravi Theja <[email protected]>
Co-authored-by: Ravi Theja Desetty <[email protected]>
Co-authored-by: liusy58 <[email protected]>
Co-authored-by: SangBin Cho <[email protected]>
Co-authored-by: 颉沆 <[email protected]>
Co-authored-by: Kiv Chen <[email protected]>
File tree (376 files changed: +18558 / -4465 lines)
- .devcontainer
- .github
  - workflows
- 3rdparty/amd/tuning
- benchmark
  - bench_in_batch_prefix
  - benchmark_batch
  - deepseek_v3
  - generative_agents
  - gsm8k
  - hellaswag
  - hicache
  - json_decode_regex
  - json_jump_forward
  - json_schema
  - kernels
    - fused_moe_triton
    - quantization
  - line_retrieval
  - llava_bench
  - llm_judge
  - long_json_decode
  - lora
  - mmlu
  - mmmu
  - mtbench
  - multi_chain_reasoning
  - multi_document_qa
  - multi_turn_chat
  - react
  - reasoning_benchmark
  - tip_suggestion
  - tree_of_thought_deep
  - tree_of_thought_v0
- docker
- docs
  - backend
  - developer
  - references
  - router
  - start
  - supported_models
- examples
  - chat_template
  - frontend_language/usage/rag_using_parea
  - runtime
    - engine
    - multimodal
- python
  - sglang
    - eval
    - lang
    - srt
      - configs
      - constrained
      - disaggregation
        - base
        - fake
        - mooncake
        - nixl
      - distributed
        - device_communicators
      - entrypoints
      - layers
        - attention
          - triton_ops
        - moe
          - ep_moe
          - fused_moe_triton
            - configs
        - quantization
          - compressed_tensors
            - schemes
      - lora
        - triton_ops
      - managers
        - multimodal_processors
      - mem_cache
      - metrics
      - model_executor
      - model_loader
      - models
      - openai_api
      - sampling
      - speculative
    - test
- scripts
  - deprecated
- sgl-kernel
  - benchmark
  - csrc
    - allreduce
    - attention
    - cpu
    - gemm
    - grammar
    - speculative
  - include
  - python/sgl_kernel
  - tests
- sgl-router
  - .cargo
  - py_src/sglang_router
  - py_test
  - src
- test/srt
  - models
  - lora