* sync main
* typo correct
* 1. typo 2. add migration event
* 1. move slime to 'https://github.com/JimyMa/DLSlime.git' and init readme.
* Update disagg README
* mute slime when disable distserve
* remove build_migration.sh
* revert debug code
* 1. identify interface. 2. add multi backend registry
* add dlslime max transfer batch
* add an infinistore interface
* add load/store
* conditional register of Multi Migration Backend
* merge router to proxy
* remove redundant print
* 1. remove redundant print 2. revert safe_run
* dsv3 kvtransfer support (bypass v cache)
* dsv3 debug, 1. change log info to log debug of log resp. 2. add num_cpus to ray.init for run in dlc
* DSV3 Debug, known issue:
1. [PD Connection more efficiently][High Priority] In the DSV3 DP + EP condition, we need to concurrently construct prefill_dp_size (for example 32) * decode_dp_size (for example 144) links. We add a function `pd_consolidation_multi_thread` to do this. However, we need to check whether the construction operation is thread-safe.
2. [Combine with proxy] Maybe we should save conn_config to avoid repeatedly reconnecting the PD Link.
3. [PD Control Plane][High Priority] For DP + EP, we need to reconstruct DisaggEngineConfig to record more information (e.g. dp_idx, tp_idx ...)
4. [Combine with router][Important] How to perform PD Load Balance in disaggregated LLM Serving.
5. [PD Data Plane] adapt to open-source KVCache managers like Mooncake, infiniStore, or NiXL, and more transport media.
* revert match to if,else
* [bugfix] rename typo
* [refactor] refactor pd_conn
* 1. format code. 2. add engine_role for passing ut test
* 1. format code 2. parse dp, ep, and dp rank to DisaggEngineConfig
* 1. add pd conn timeout, 2. add default EngineRole to Hybrid, 3. fix disagg strategy proxy typo
* 1. refactor PDConnection Pool
* refactor debug
* fix migration loop bug
* add proxy arguments about distserve
* bugfix
* debug interface
* remove unnecessary EngineRole check.
* add v1/chat/completions support
* remove redundant print
* async free cache
* async free cache
* 1. add some comments.
* 1. bugfix
* [proxy] add connection_warmup api
* 1. bugfix (warmup_connection_typo and wrong args) 2. preserve cache bugfix
* [disagg] update readme, 1. fault tolerance and 2. replace router to proxy.
* bugfix
* fix decode back pressure bug
* 1. add migration_request to chat/completions for correctly cache free
* 2. free cache bugfix
* 1. fix lock running bug
* 1. fix dist.broadcast deadlock
* [lint] 1. fix lint
* rename Ethernet to RoCE
* change enum.Enum.__members__[elem] to enum.Enum[elem] directly
* update readme
* update migration-backend
* 1. update readme 2. move module to string for conditional import
* 1. update readme
* 1. remove magic number and handle long assignments in dlslime. 2. add uniexecutor support
* fix error migration in dummy situation
* 1. bugfix when token is not a decodable utf-8 (in test)
* 1. overlapping migration and forward.
* bump dlslime to v0.0.1.post5
* remove print
* remove free in decode engine because already freed in proxy
* 1. bump dlslime to 0.0.1.post7
* 1. [proxy] revert self.nodes to nodes 2. [api_server] remove redundant api
* 1. [cli] remove available_nic args
* format comments
* [pytorch paging] remove redundant logger
* [model_agent] bugfix caused by merge
* [model agent] bypass model agent migrate
* revert migrate to sync mode
* bypass model agent migrate in uni_executor
* [proxy] set default serving strategy to DistServe
* 1. [disagg] update readme
* info -> debug
* remove unused code
* lazily initialize migration event
* add nvlink support
* mute TCP support by now
* update readme for exception
* set migration token_ids output to numpy array
* update readme
* In PD Disaggregation Mode, fallback next token ids to CPU
* 1. [disagg] update readme
* move disagg to pytorch backend
LMDeploy-DistServe supports both NVLink and RDMA for KVCache transfer from the Prefill Engine to the Decode Engine. RDMA is the default mode. Set `--migration-protocol NVLink` for NVLink transport.
For now, lmdeploy-distserve uses GPUDirect RDMA to perform KVTransfer. Make sure the GPUDirect RDMA driver is loaded into the kernel:
```bash
lsmod | grep nv_peer_mem
# GPUDirect RDMA info will be printed if GPUDirect RDMA is correctly loaded.
```
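For reference, a minimal launch sketch is shown below. Only `--migration-protocol` is taken from the paragraph above; the `--role` and `--migration-backend` flags are assumptions based on the commit log, so check `lmdeploy serve api_server --help` for the exact names.

```bash
# Hypothetical sketch: start a Prefill engine that transfers KVCache over NVLink
# instead of the default RDMA path. Flag names other than --migration-protocol
# are assumptions inferred from the commit messages above.
lmdeploy serve api_server $MODEL_PATH \
    --server-port 23333 \
    --role Prefill \
    --migration-backend DLSlime \
    --migration-protocol NVLink
```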
### Connection Pool
Currently, if the Proxy disconnects, the connection pool must be warmed up again. A future enhancement could involve:
A dedicated connection pool management server (e.g., using Raft-based tools like ETCD, as mentioned in Mooncake) to improve connection discovery and avoid repeated warmups.
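As a rough illustration, re-warming the pool after a proxy restart could look like the call below; the endpoint path is an assumption based on the `connection_warmup` api commit in the log above, not a documented route.

```bash
# Hypothetical sketch: ask the proxy to re-establish PD links after a restart.
# The /distserve/connection_warmup path is an assumption, not a documented API.
curl -X POST http://${PROXY_HOST}:${PROXY_PORT}/distserve/connection_warmup
```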
### Proxy
Do not add an engine node to **different proxies**: this is not supported and is not considered correct usage for now.
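A minimal sketch of the intended usage, registering both engines with a single proxy, might look like this; the `/nodes/add` route and payload shape are assumptions, not guaranteed by this document.

```bash
# Hypothetical sketch: register the Prefill and Decode engines with ONE proxy.
# The /nodes/add endpoint and JSON payload are assumptions.
curl -X POST http://${PROXY_HOST}:${PROXY_PORT}/nodes/add \
    -H 'Content-Type: application/json' \
    -d '{"url": "http://prefill-host:23333"}'
curl -X POST http://${PROXY_HOST}:${PROXY_PORT}/nodes/add \
    -H 'Content-Type: application/json' \
    -d '{"url": "http://decode-host:23334"}'
```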