Commit 59918f9
Sphere AI Lab modifications on top of SGLang v0.5.9
Summary of changes vs upstream:
- Add Orbit backend support for OFT and quantized serving paths used by the RL pipeline.
- Extend scheduler, distributed, CUDA wrapper, and kernel integration for the supported CUDA 13 stack.
- Add public release attribution and remove internal-only development metadata.
- Omit test directories from the public release snapshot.
Fork base: sgl-project/sglang @ v0.5.9 (bbe9c7e).
1 parent bbe9c7e, commit 59918f9
927 files changed
Lines changed: 25827 additions & 177254 deletions
File tree
- .claude/skills
- add-jit-kernel
- add-sgl-kernel
- .github/workflows
- docs/advanced_features
- python
- sglang
- jit_kernel/tests
- multimodal_gen
- apps/ComfyUI_SGLDiffusion/test
- csrc/attn/vmoba_attn/tests
- test
- cli
- scripts
- server
- ascend
- test_files
- srt
- batch_invariant_ops
- configs
- debug_utils
- distributed/device_communicators
- entrypoints
- openai
- eplb
- grpc
- layers
- attention
- deep_gemm_wrapper
- moe
- fused_moe_triton
- moe_runner
- token_dispatcher
- quantization
- compressed_tensors
- schemes
- quark/schemes
- lora
- managers
- mem_cache
- model_executor
- models
- oft
- backend
- torch_ops
- triton_ops
- utils
- weight_sync
- test
- ascend
- attention
- ci
- external_models
- kits
- longbench_v2
- server_fixtures
- speculative
- sgl-kernel/tests
- spatial
- speculative
- sgl-model-gateway
- bindings/python/tests
- tests
- api
- common
- fixtures/images
- reliability
- routing
- security
- spec
- test
- lm_eval_configs
- manual
- ascend
- cpu
- debug_utils
- entrypoints/http_server
- ep
- hicache
- kv_transfer
- lang_frontend
- layers
- attention/nsa
- moe
- lora
- models
- nightly
- openai_server/features
- piecewise_cudagraph
- quant
- vlm
- registered
- 4-gpu-models
- 8-gpu-models
- amd
- accuracy
- mi30x
- mi35x
- disaggregation
- perf
- mi30x
- mi35x
- ascend
- basic_function
- interface
- parameter
- embedding_models
- llm_models
- rerank_models
- reward_models
- vlm_models
- attention
- backends
- bench_fn
- constrained_decoding
- core
- cuda_graph
- debug_utils
- disaggregation
- distributed
- dllm
- ep
- eval
- function_call
- hicache
- kernels
- layers
- mamba
- lora
- metrics
- mla
- model_loading
- models
- moe
- openai_server
- basic
- features
- function_call
- validation
- ops
- parser
- perf
- profiling
- quant
- radix_cache
- rl
- rotary
- sampling
- scheduler
- spec
- eagle
- utils
- stress
- tokenizer
- utils
- vlm
- srt
- ascend
- configs
- cpu
- models
- xpu
- unit/utils