What's Changed
- [Store] fix: switch offset allocator node storage to vector by @yuechen-sys in #1286
- [Store] feat:mooncake store enable ascend fabric mem mode by @ascend-direct-dev in #1170
- [Doc] Fix transfer engine doc image paths by @00fish0 in #1295
- Update yalantinglibs dependency to use direct archive download by @Copilot in #1282
- [store] support batch pub and tp aware for pub_tensor by @zxpdemonio in #1288
- doc: add dummy client support for SGLang hicache integration by @YiXR in #1299
- Add YiXR as code owner for mooncake-store by @stmatengss in #1301
- doc: fix wrong env by @YiXR in #1302
- [store] add error checking for get_tensor_into by @zxpdemonio in #1272
- [Store] add HugePage support by @YiXR in #1300
- [TENT] Add backward compatibility with TE by @alogfans in #1277
- [Store]: Add OffsetAllocator disk backend with lock-striped metadata + refcounted extents. by @maheshrbapatu in #1284
- [EP] Use NVLink in EP if possible by @UNIDY2002 in #1308
- [EP] Replace _mm_pause() with PAUSE() macro by @UNIDY2002 in #1313
- [TE] Add Task Completion Latency Tracking and Detailed Metrics Reporting by @staryxchen in #1310
- feat: Add code coverage support in CI by @LiYiMing-lg in #1316
- docs: add Docker deployment instructions to Chinese build guide by @LiYiMing-lg in #1318
- [TE] Add early mem backend detection method in NVLINK_allocator by @TTThanos in #1296
- [TE]feat: ascend direct transport add async transfer task limit by @ascend-direct-dev in #1325
- [TENT] fix a bug in memory registration and add TENT support to the example by @staryxchen in #1330
- [Store][Feature]: add task manager component by @zhongzhouTan-coder in #1326
- [Store] - Implement partial success handling in BatchOffload by @maheshrbapatu in #1319
- [TE] support arm arch PAUSE() by @stmatengss in #1340
- [CI] add cuda13 wheel release workflow by @stmatengss in #1331
- [TENT] Add Redis Authentication and Database Selection Support by @staryxchen in #1339
- [TE] Revert nvlink_transport.cpp to previous version by @TTThanos in #1342
- Bump version to 0.3.8.post1 in pyproject.toml by @ShangmingCai in #1348
- [CI] Fix torch index by @UNIDY2002 in #1349
- [TE/HIP] Switch to IPC mode by default and enable P2P access by @amd-arozanov in #1344
- [Doc] Add governance doc by @ykwd in #1352
- [CI] enable sglang-integration tests for forked repos with pull_reque… by @Ann-1024 in #1351
- [Build] auto add commit id to pyproject toml by @stmatengss in #1345
- [Store][Feature]: add copy/move execution api support in master side by @zhongzhouTan-coder in #1327
- [Store] Add Prometheus and Grafana example by @stmatengss in #1335
- [Store]: Use SharedMutex in MetadataShard for better performance by @nickyc975 in #1343
- [TENT] Add Metrics System with HTTP Server and Prometheus Integration by @staryxchen in #1355
- [Misc] Add amd-arozanov as CODEOWNERS for TE hip_transport by @ykwd in #1354
- [TE/HIP] Add stream and event pools for async transfer operations by @amd-arozanov in #1353
- [CI]optimize clang-format workflow to check only changed files by @staryxchen in #1359
- [CI] fix integration test on push and validate download error by @Ann-1024 in #1361
- [Store] Add retry logic for auto port binding in client setup by @chenkaiyue in #1328
- Add Tensor-Centric Ecosystem component to README by @Copilot in #1358
- [Doc] add docs of Mooncake EPD integration with SGLang by @liusy58 in #1262
- [Store][FIX] fix issue that the preferred segments not working when put by @zhongzhouTan-coder in #1360
- [TE] Add support for local communication resource configuration via environment variable by @Cheng-China in #1366
- PR3 coro_rpc_communicator python bindings by @JasonZhang517 in #1106
- [CI] Improve Code Formatting Workflow by @staryxchen in #1368
- [CI] Fix CI Failure by @ykwd in #1367
- [Doc] Update News by @ykwd in #1383
- [TE]feat(topology): add IB device availability filtering by @luketong777 in #1375
- [CI] add cuda13 CI test by @stmatengss in #1380
- Mooncake PG by @ympcMark in #1387
- [CI]Add testcase test_disaggregation_different_tp and fix potential network issues by @luketong777 in #1386
- [TE]fix: correct RDMA device path mapping by @luketong777 in #1393
- [Doc] Add batch API docs and examples for transfer engine by @chestnut-Q in #1395
- [TENT] Fix crash problem when device is down by @alogfans in #1333
- [CI]fix(ci): only count timeout while job is running (skip pending) and filter artifacts by excluding cu130 builds by @luketong777 in #1394
- Mooncake PG buffer by @ympcMark in #1401
- add troubleshooting of exceeding ulimit by @zhangzuo21 in #1405
- [PG] Always send pre-flight requests during backend init by @UNIDY2002 in #1409
- [Build] Retrieve the actual glibc version during the build process by @zxpdemonio in #1402
- chore: update clang-format to v20.1.8 and enforce version 20 by @staryxchen in #1379
- feat: add get_engine_ptr method for easily send ptr to another c++ extensions from python by @weixiao-huang in #1411
- [CI]Set DEFAULT_MODEL_NAME_FOR_TEST to meta-llama/Llama-3.2-3B-Instruct for test_disaggregation_different_tp by @luketong777 in #1412
- [Wheel] Remove the default buffer pre-allocation in initialize() by @chestnut-Q in #1415
- [TE] Fix SIGSEGV in Session::writeBody/readBody logging. by @caozhanhao in #1418
- [performance] decrease regmr overhead in ep_buffer/gda path by @Bruce-x-1997 in #1414
- [DOC] Add update for Mooncake Project approval by @stmatengss in #1419
- [TE] Nvlink intraNode Transport isolation by @TTThanos in #1341
- [Store] - Add comprehensive storage backend benchmark suite by @maheshrbapatu in #1388
- [CI] Support torch==2.10.0 by @UNIDY2002 in #1420
- [TE] Add configurable handshake max length via MC_HANDSHAKE_MAX_LENGTH by @herrluk in #1392
- [Store] Support batch query keys api for master service http server by @s5u13b in #1417
- feat(Store): Support get local ssd object by @zhuxinjie-nz in #1203
- Intra-Node NVLink related Docs modification by @TTThanos in #1424
- [Store] unregister client local buffer when tear down by @hjchen2 in #1413
- [Store] feat: TENT/Store integration improvements by @00fish0 in #1398
- [TE] Fix race condition in receivePeerMetadata and getSegmentDesc by @staryxchen in #1373
- [TE] fix wakeup race condition in RDMA WorkerPool by @chestnut-Q in #1434
- fix(ep_buffer): rm the fixed id 3 for rdma device by @KMSorSMS in #1432
- [TE] Enable ubshmem transport via ascend vmm apis by @VNightMare in #1399
- [Store] validate metadata when put_tensor by @zxpdemonio in #1396
- [Store] feat: wait Master ready when starting Store server by @acelyc111 in #1438
- [Store][Feature]task add rpc_only protocol by @LiuYi-Up in #1334
- Document all supported communication protocols by @Copilot in #1435
- [PG] Support full reduction ops (Product/Min/Max) and fix reduce kernel indexing bug by @hhr2449 in #1440
- [Store] fix malloc physical for ascend fabric mem by @ascend-direct-dev in #1427
- [PG] Quick fix for reduceCpu by @UNIDY2002 in #1444
- [CI]fix: update router service check, test env, and docker image pulling by @luketong777 in #1446
- [TENT] feat(RDMA): add IB device availability checks and improve GID selection by @00fish0 in #1397
- Pass a dict to `setup` api to reduce api changes. by @maobaolong in #1445
- [Doc] Enhance update on Mooncake Project approval by @stmatengss in #1449
- [TE] Support transfer in cuda stream via cudaLaunchHostFunc by @uncharted-G in #1448
- [Doc] add FlexKV project update to README by @staryxchen in #1452
- [Store][Feature] Add CXL storage for mooncake_store by @qiuweit7 in #1365
- add force flag for remove by @XucSh in #1425
- docs: fix typo in README.md by @staryxchen in #1453
- [Test] Update YALANTINGLIBS_VERSION to 0.5.7 by @stmatengss in #1457
- [Docs] Sync recent news to docs/index.md by @ykwd in #1461
- [ep] Avoid mooncake ep test crash when ibgda_init fails by @zxpdemonio in #1410
- [Bugfix] sync vllm mooncake connector from main repo by @dtcccc in #1466
- [PG] Implemented support for additional collective primitives: `gather`, `scatter`, and `reduce` in the Mooncake backend and added unit tests in test_mooncake_backend.py and test_mooncake_backend_cpu to cover new ops by @hhr2449 in #1469
- [EP] Clear sticky CUDA error on EP buffer re-init by @UNIDY2002 in #1470
- [Docs] Update Readme by @ykwd in #1472
- [TE] add GlobalResourceConfig config for ascend direct transport by @ascend-direct-dev in #1464
- [TE] Backport GDS xport to mainstream by @alogfans in #1430
- [Doc] improve Transfer Engine C++ API Reference by @00fish0 in #1467
- [TENT] Add RDMA-based notification by @alogfans in #1460
- Add ascend-direct-dev as codeowner by @alogfans in #1473
- [CI] CI: testcases can run in a docker by @luketong777 in #1454
- [TENT] fix: support loading config from file and fix nested JSON path lookup by @00fish0 in #1476
- feat: use setuptools_scm for more elegant version and support MOONCAKE_LOCAL_VERSION env by @weixiao-huang in #1479
- [TENT] Fix: sync metadata after unregistering local memory by @00fish0 in #1485
- [TE] fix use short connect bug: disconnect happen before mark success by @ascend-direct-dev in #1481
- [CI] Add new labels for PyTorch Backend and Mooncake EP by @stmatengss in #1487
- [EP] enlarge `kNumMaxTopK` and support more hidden size by @UNIDY2002 in #1492
- [Bug Fix]Fix segmentation fault when using python wrapped transfer engine and store in the same process. by @zhangzuo21 in #1471
- [CI]Add vllm 1p1d test case by @luketong777 in #1491
- [Docs]: Add TENT C++ API reference by @00fish0 in #1488
- [TE] Add auto-connect feature controlled by config and environment va… by @Cheng-China in #1482
- Update PyTorch installation URL for CUDA 13 by @ShangmingCai in #1495
- Add TODO for more cu13 version support in build script by @ShangmingCai in #1496
- Revert (#1479) since it breaks the release workflow by @ShangmingCai in #1497
- Bump version to 0.3.9 in pyproject.toml by @ShangmingCai in #1498
- Revert "[Build] auto add commit id to pyproject toml" by @ShangmingCai in #1501
- [PG] Fix incorrect calculation of task_id during transfer status fetching by @UNIDY2002 in #1474
- [CI] Modifying package name for CUDA-13 build by @UNIDY2002 in #1502
New Contributors
- @yuechen-sys made their first contribution in #1286
- @LiYiMing-lg made their first contribution in #1316
- @chenkaiyue made their first contribution in #1328
- @Cheng-China made their first contribution in #1366
- @weixiao-huang made their first contribution in #1411
- @caozhanhao made their first contribution in #1418
- @Bruce-x-1997 made their first contribution in #1414
- @herrluk made their first contribution in #1392
- @s5u13b made their first contribution in #1417
- @KMSorSMS made their first contribution in #1432
- @VNightMare made their first contribution in #1399
- @acelyc111 made their first contribution in #1438
- @LiuYi-Up made their first contribution in #1334
- @hhr2449 made their first contribution in #1440
- @uncharted-G made their first contribution in #1448
- @qiuweit7 made their first contribution in #1365
Full Changelog: v0.3.8...v0.3.9