v0.3.8
What's Changed
- adxl: fix aclrtMemcpyBatch max 4096 limit bug by @ascend-direct-dev in #963
- ci: add non-CUDA release workflow and update documentation by @xiaguan in #969
- fix(transfer_engine): Add notify callback registration in RPC metadata handling by @iBenzene in #966
- [Misc] (Mooncake Backend) Early break if a rank failure can be determined through ping message by @UNIDY2002 in #980
- Revise installation instructions for non-cuda mooncake by @ShangmingCai in #985
- [Store] feat: store key-value data in buckets by @zhuxinjie-nz in #968
- Support AMDGPU (refactor CUDA-alike) by @yeahdongcn in #973
- add log by @ascend-direct-dev in #996
- Feature: support custom key prefix for issue 957 by @uniqueni in #958
- refactor(store): store remove transfer engine internal api usage by @xiaguan in #994
- [TransferEngine] Mitigating performance overhead from large cluster and large bulks by @alogfans in #999
- Bump version to 0.3.7.post1 in pyproject.toml by @ShangmingCai in #984
- [store] feat: add secondary storage usage monitor by @yejj710 in #976
- fix(ci): remove nvlink allocator --ci-build flag by @xiaguan in #1003
- [DOC] Update fig by @stmatengss in #987
- Modify build command for nvlink_allocator by @ShangmingCai in #1001
- fix(ci): add id-token permission and unify PyPI token for release by @xiaguan in #1004
- [Misc] Lazy import ep in mooncake_ep_buffer.py by @UNIDY2002 in #1014
- Bump version to 0.3.7.post2 in pyproject.toml by @ShangmingCai in #1015
- [Store] Fix CI bugs & Improve log output & Refactor TE Initialization by @ykwd in #1006
- [CI] Fix a CI BUG in PyClientTest:TestSetupExistTransferEngine by @ykwd in #1016
- [Misc] Remove EP's duplicated impl of getCudaTopologyJson by @UNIDY2002 in #1009
- [Store]: Cleanup processing objects if transferring timedout (#975) by @nickyc975 in #993
- [Store] support segment level metrics(fix code format of #1029) by @cocktail828 in #1030
- [Store] One Replica Has One Slice by @ykwd in #1032
- [DOC] Update Slack link in README.md by @stmatengss in #1042
- [Doc] Update SGLang Hicache Docs by @ykwd in #1023
- [Store] add choosing endpoint store option by @stmatengss in #1024
- [feat]More KVCache metrics in both master/client side by @Liziqi-77 in #1020
- change adxl log by @ascend-direct-dev in #1039
- Te seperated compilation by @zhaoyongke in #1041
- Fix TCP Transport Handshake Daemon Initialization by @staryxchen in #846
- add batch [put/get] tensor by @XucSh in #1044
- handling cudaMemcpy errors in tcp_transport.cpp by @flying-x in #1057
- [Store] fix: honor MC_MS_FILTERS by applying whitelist before TransferEngine init by @wwq2333 in #1051
- [Doc] Add license badge to README by @stmatengss in #1063
- [DOC] add web api doc by @stmatengss in #1059
- [Chore] Add Contributor Covenant Code of Conduct by @stmatengss in #1056
- Add pull request template for standardized PR submissions by @Copilot in #1065
- [Store|TransferEngine]: use condition-variable based completion instead of busy-polling by @wwq2333 in #1053
- docs: update README by @zhyncs in #1079
- [store] Add disk eviction feature by @Vincent-Bo-ali in #1028
- [Store] MasterMetricManager Returns Zero-Value Variables by @ykwd in #1068
- [RDMA] Fix RDMA device selection to prioritize GIDs with network devices by @uniqueni in #1077
- [CI] add sglang integration test by @stmatengss in #1089
- [EP] Fallback impl of Mooncake EP when IBGDA is unavailable by @UNIDY2002 in #1002
- TCP transport support ipv6 by @LCAIZJ in #1067
- [Store] add version checking between client and server by @stmatengss in #1061
- [TE/Topology] Support device filtering when dumping topology by @popsiclexu in #1087
- [BugFix] Adapt mooncake_connector_v1 to latest vllm by @ZeldaHuang in #1080
- [CI] Add label event by @XucSh in #1108
- Adapt to adxl connection auto release feature by @ascend-direct-dev in #1072
- feat[Store]: Add standalone deployment implementation for Client by @YiXR in #1084
- feat[accl-barex]: add barex_transport by build with USE_BAREX by @ZechaoZhang-beta in #1045
- Update CI by @XucSh in #1111
- [TE/Topology] Enhance PCI distance calculation by considering NUMA node affinity by @popsiclexu in #1086
- [DEV] add pre-commit by @stmatengss in #1124
- [Store] Cancel all negative ret val by @Azure-Tang in #1129
- [Store] fix compilation warning in storage backend by @stmatengss in #1134
- [EP] Support multiple torch versions by @UNIDY2002 in #1098
- feat[Store]: Add multi dummy clients support for real client by @YiXR in #1122
- [Store] Add support for static labels (host IP/cluster name) in client metrics by @cocktail828 in #1081
- [Misc] Add Codeowners by @ykwd in #1135
- fix MC_MAX_EP_PER_CTX doc by @whybeyoung in #1142
- [Bug] fixed bug of master not using glog actually by @SpecterCipher in #1075
- change cmake by @ascend-direct-dev in #1114
- feat[Store]: Refine shm mmap logic and add reconnection for Dummy Client after the Real restarted by @YiXR in #1146
- [mooncake-store]: prevent orphaned bucket data files from leaking dis… by @maheshrbapatu in #1140
- [store] Fix IPv6 link-local address parsing and add IPv6 tests by @Azure-Tang in #1137
- [CI] Install CUDA toolkit on job test-wheel-ubuntu, so that the wheel can be built with USE_CUDA=ON by @UNIDY2002 in #1156
- [store] add pybind for get_replica_desc by @yejj710 in #1121
- [Store]: Refactor AllocationStrategy implementation for better performance and flexibility by @nickyc975 in #1149
- [Store] Optimize master & client binary size by @YiXR in #1166
- Add a CI test for Mooncake EP Backend (CPU only) by @UNIDY2002 in #1099
- Improve AMD HIP support with hipify-perl by @amd-arozanov in #1154
- [DOC] Add MAINTAINERS.md by @alogfans in #1171
- [MUSA] Enable USE_MNNVL by @yeahdongcn in #1176
- [Store] feat: Add BatchQueryIp API for querying multiple client IPs by @Vincent-Bo-ali in #1162
- [Store] pub_tensor for multiple replica by @zxpdemonio in #1148
- [Store] feat: Implement a FileStorage component to manage the lifecycle of key-value data by @zhuxinjie-nz in #1031
- [Doc] add docs of Mooncake EP integration with SGLang by @UNIDY2002 in #1188
- Pr1 coro rpc core by @JasonZhang517 in #1104
- [TE] Support rdma traffic class by environmental variable by @yafengio in #1187
- [Store] add tp awareness for get_tensor by @XucSh in #1127
- feat[Store]: Introduce shm helper for dummy by @YiXR in #1177
- [yalantinglibs]set ylt log level with env by @qicosmos in #1190
- [TE/Examples] Memory initialization and HIP cleanup fixes by @amd-arozanov in #1179
- [TE] AscendDirectTransport: HIXL support IPV6 by @MingYang119 in #1194
- [Store] feat: Add BatchReplicaClear API for manual cache cleanup by @Vincent-Bo-ali in #1191
- [Store]: Add master_bench for benchmarking QPS of MasterService by @nickyc975 in #1201
- [doc]merge doc to docs,and change all internal links to blog. by @Keithwwa in #1153
- 【docu】Remove duplicate mooncake store by @Keithwwa in #1211
- feat: add PCIe Relaxed Ordering (RO) support and RDMA traffic class (… by @1998zxn in #1076
- [CI] Add sglang e2e tests by @luketong777 in #1181
- feat[Store]: add multi shm support for dummy and real client by @YiXR in #1206
- [bugfix] fix bfloat16 for get_tensor by @XucSh in #1216
- [TE]: Add HIP transport for AMD GPUs support by @amd-arozanov in #1208
- [TE/HIP] Fix HIP Shareable POSIX File Descriptor Handles by @amd-arozanov in #1218
- [TE] Memorize batched transfer status by @alogfans in #1205
- [TransferEngine]: HIXL support ipv6 when searching for available port by @MingYang119 in #1220
- [EP] Fix the tensorSize of the barrier op by @UNIDY2002 in #1222
- refactor tensor api and add tests by @XucSh in #1217
- [Store] feat: Implement a unified storage interface to simplify integration and extension by @zhangzuo21 in #1185
- Fix: add missing include path for cuda_alike.h in mooncake-transfer-engine/nvlink-allocator/build.sh by @yeahdongcn in #1224
- feat(metrics): add task completion latency tracking and reporting by @staryxchen in #1130
- [CI] fix: don't skip any CI test by @stmatengss in #1229
- Fix compilation warnings for missing field initializers by @Copilot in #1232
- Add vllm v1 mooncake benchmark and launch guide by @Azure-Tang in #1223
- [store] zero copy for get_tensor() and batch_get_tensor() by @zxpdemonio in #1192
- make para more clear by @XucSh in #1239
- Fix missing error handling for cuPointerGetAttribute call by @fzyzcjy in #1241
- Build cuMem based allocator when disabling peermem by @fzyzcjy in #1244
- Support cuMem allocator when Fabric is unsupported at runtime by @fzyzcjy in #1245
- [Fix] Fix broken link in doc by @Azure-Tang in #1240
- [Feature] Support H20 intraNode nvlink by introducing fallback mechanism to leverage cudaIPC by @TTThanos in #1234
- feat(rdma): add parallel memory region registration support by @staryxchen in #1238
- [Doc] Fix typo in the tutorial by @tianrenz2 in #1247
- Fix error when peermem is disabled caused by multi threading by @fzyzcjy in #1246
- [EP] Implement elastic scaling up by @UNIDY2002 in #1173
- [Doc] update toc item of ep-backend by @UNIDY2002 in #1252
- [Misc] Update CODEOWNERS for mooncake-ep by @UNIDY2002 in #1253
- [EP] Implement send/recv by @UNIDY2002 in #1236
- [CI] Force TCP for Mooncake EP Backend tests by @UNIDY2002 in #1255
- [Store] Decouple master from transfer_engine dependencies by @00fish0 in #1233
- [TE] Add TENT codebase to main (Phase 1: structural import) by @alogfans in #1213
- [CI] Disable EP's test_mooncake_backend_p2p_cpu in CI workflow by @UNIDY2002 in #1256
- [Doc] add more mooncake store APIs doc by @stmatengss in #1237
- add MC_FORCE_HCA environment variable to force use rdma by @baymaxhuang in #1259
- [CI] Disable certain tests in CI configuration by @UNIDY2002 in #1263
- [Doc] Add update for RBG + SGLang HiCache integration by @stmatengss in #1264
- [doc] Restruct doc about vllm support. by @Azure-Tang in #1275
- [CI] Add retry mechanism to handle GitHub API rate limit in test-sglang-integration job by @luketong777 in #1273
- [store] Prefer local segment when get_buffer/get_into by @zxpdemonio in #1258
- [store] add async api by @XucSh in #1265
- [TE] feat:ascend direct transport support async transfer by @ascend-direct-dev in #1274
- Bump version to 0.3.8 in pyproject.toml by @ShangmingCai in #1285
- [CI] Try to reduce disk usage during release build by @UNIDY2002 in #1287
- Remove Python 3.9 from release workflow by @ShangmingCai in #1290
New Contributors
- @iBenzene made their first contribution in #966
- @zhuxinjie-nz made their first contribution in #968
- @yeahdongcn made their first contribution in #973
- @yejj710 made their first contribution in #976
- @cocktail828 made their first contribution in #1030
- @flying-x made their first contribution in #1057
- @zhyncs made their first contribution in #1079
- @Vincent-Bo-ali made their first contribution in #1028
- @ZeldaHuang made their first contribution in #1080
- @ZechaoZhang-beta made their first contribution in #1045
- @Azure-Tang made their first contribution in #1129
- @whybeyoung made their first contribution in #1142
- @SpecterCipher made their first contribution in #1075
- @maheshrbapatu made their first contribution in #1140
- @amd-arozanov made their first contribution in #1154
- @zxpdemonio made their first contribution in #1148
- @yafengio made their first contribution in #1187
- @MingYang119 made their first contribution in #1194
- @Keithwwa made their first contribution in #1153
- @1998zxn made their first contribution in #1076
- @luketong777 made their first contribution in #1181
- @zhangzuo21 made their first contribution in #1185
- @TTThanos made their first contribution in #1234
- @tianrenz2 made their first contribution in #1247
- @00fish0 made their first contribution in #1233
- @baymaxhuang made their first contribution in #1259
Full Changelog: v0.3.7...v0.3.8