Release v0.3.9 · kvcache-ai/Mooncake

What's Changed

[Store] fix: switch offset allocator node storage to vector by @yuechen-sys in #1286
[Store] feat:mooncake store enable ascend fabric mem mode by @ascend-direct-dev in #1170
[Doc] Fix transfer engine doc image paths by @00fish0 in #1295
Update yalantinglibs dependency to use direct archive download by @Copilot in #1282
[store] support batch pub and tp aware for pub_tensor by @zxpdemonio in #1288
doc: add dummy client support for SGLang hicache integration by @YiXR in #1299
Add YiXR as code owner for mooncake-store by @stmatengss in #1301
doc: fix wrong env by @YiXR in #1302
[store] add error checking for get_tensor_into by @zxpdemonio in #1272
[Store] add HugePage support by @YiXR in #1300
[TENT] Add backward compatibility with TE by @alogfans in #1277
[Store]: Add OffsetAllocator disk backend with lock-striped metadata + refcounted extents. by @maheshrbapatu in #1284
[EP] Use NVLink in EP if possible by @UNIDY2002 in #1308
[EP] Replace _mm_pause() with PAUSE() macro by @UNIDY2002 in #1313
[TE] Add Task Completion Latency Tracking and Detailed Metrics Reporting by @staryxchen in #1310
feat: Add code coverage support in CI by @LiYiMing-lg in #1316
docs: add Docker deployment instructions to Chinese build guide by @LiYiMing-lg in #1318
[TE] Add early mem backend detection method in NVLINK_allocator by @TTThanos in #1296
[TE]feat: ascend direct transport add async transfer task limit by @ascend-direct-dev in #1325
[TENT] fix a bug in memory registration and add TENT support to the example by @staryxchen in #1330
[Store][Feature]: add task manager component by @zhongzhouTan-coder in #1326
[Store] - Implement partial success handling in BatchOffload by @maheshrbapatu in #1319
[TE] support arm arch PAUSE() by @stmatengss in #1340
[CI] add cuda13 wheel release workflow by @stmatengss in #1331
[TENT] Add Redis Authentication and Database Selection Support by @staryxchen in #1339
[TE] Revert nvlink_transport.cpp to previous version by @TTThanos in #1342
Bump version to 0.3.8.post1 in pyproject.toml by @ShangmingCai in #1348
[CI] Fix torch index by @UNIDY2002 in #1349
[TE/HIP] Switch to IPC mode by default and enable P2P access by @amd-arozanov in #1344
[Doc] Add governance doc by @ykwd in #1352
[CI] enable sglang-integration tests for forked repos with pull_reque… by @Ann-1024 in #1351
[Build] auto add commit id to pyproject toml by @stmatengss in #1345
[Store][Feature]: add copy/move execution api support in master side by @zhongzhouTan-coder in #1327
[Store] Add Prometheus and Grafana example by @stmatengss in #1335
[Store]: Use SharedMutex in MetadataShard for better performance by @nickyc975 in #1343
[TENT] Add Metrics System with HTTP Server and Prometheus Integration by @staryxchen in #1355
[Misc] Add amd-arozanov as CODEOWNERS for TE hip_transport by @ykwd in #1354
[TE/HIP] Add stream and event pools for async transfer operations by @amd-arozanov in #1353
[CI]optimize clang-format workflow to check only changed files by @staryxchen in #1359
[CI] fix integration test on push and validate download error by @Ann-1024 in #1361
[Store] Add retry logic for auto port binding in client setup by @chenkaiyue in #1328
Add Tensor-Centric Ecosystem component to README by @Copilot in #1358
[Doc] add docs of Mooncake EPD integration with SGLang by @liusy58 in #1262
[Store][FIX] fix issue that the preferred segments not working when put by @zhongzhouTan-coder in #1360
[TE] Add support for local communication resource configuration via environment variable by @Cheng-China in #1366
PR3 coro_rpc_communicator python bindings by @JasonZhang517 in #1106
[CI] Improve Code Formatting Workflow by @staryxchen in #1368
[CI] Fix CI Failure by @ykwd in #1367
[Doc] Update News by @ykwd in #1383
[TE]feat(topology): add IB device availability filtering by @luketong777 in #1375
[CI] add cuda13 CI test by @stmatengss in #1380
Mooncake PG by @ympcMark in #1387
[CI]Add testcase test_disaggregation_different_tp and fix potential network issues by @luketong777 in #1386
[TE]fix: correct RDMA device path mapping by @luketong777 in #1393
[Doc] Add batch API docs and examples for transfer engine by @chestnut-Q in #1395
[TENT] Fix crash problem when device is down by @alogfans in #1333
[CI]fix(ci): only count timeout while job is running (skip pending) and filter artifacts by excluding cu130 builds by @luketong777 in #1394
Mooncake PG buffer by @ympcMark in #1401
add troubleshooting of exceeding ulimit by @zhangzuo21 in #1405
[PG] Always send pre-flight requests during backend init by @UNIDY2002 in #1409
[Build] Retrieve the actual glibc version during the build process by @zxpdemonio in #1402
chore: update clang-format to v20.1.8 and enforce version 20 by @staryxchen in #1379
feat: add get_engine_ptr method to easily pass the engine pointer from Python to other C++ extensions by @weixiao-huang in #1411
[CI]Set DEFAULT_MODEL_NAME_FOR_TEST to meta-llama/Llama-3.2-3B-Instruct for test_disaggregation_different_tp by @luketong777 in #1412
[Wheel] Remove the default buffer pre-allocation in initialize() by @chestnut-Q in #1415
[TE] Fix SIGSEGV in Session::writeBody/readBody logging. by @caozhanhao in #1418
[performance] decrease regmr overhead in ep_buffer/gda path by @Bruce-x-1997 in #1414
[DOC] Add update for Mooncake Project approval by @stmatengss in #1419
[TE] Nvlink intraNode Transport isolation by @TTThanos in #1341
[Store] - Add comprehensive storage backend benchmark suite by @maheshrbapatu in #1388
[CI] Support torch==2.10.0 by @UNIDY2002 in #1420
[TE] Add configurable handshake max length via MC_HANDSHAKE_MAX_LENGTH by @herrluk in #1392
[Store] Support batch query keys api for master service http server by @s5u13b in #1417
feat(Store): Support get local ssd object by @zhuxinjie-nz in #1203
Intra-Node NVLink related Docs modification by @TTThanos in #1424
[Store] unregister client local buffer when tear down by @hjchen2 in #1413
[Store] feat: TENT/Store integration improvements by @00fish0 in #1398
[TE] Fix race condition in receivePeerMetadata and getSegmentDesc by @staryxchen in #1373
[TE] fix wakeup race condition in RDMA WorkerPool by @chestnut-Q in #1434
fix(ep_buffer): rm the fixed id 3 for rdma device by @KMSorSMS in #1432
[TE] Enable ubshmem transport via ascend vmm apis by @VNightMare in #1399
[Store] validate metadata when put_tensor by @zxpdemonio in #1396
[Store] feat: wait Master ready when starting Store server by @acelyc111 in #1438
[Store][Feature]task add rpc_only protocol by @LiuYi-Up in #1334
Document all supported communication protocols by @Copilot in #1435
[PG] Support full reduction ops (Product/Min/Max) and fix reduce kernel indexing bug by @hhr2449 in #1440
[Store] fix malloc physical for ascend fabric mem by @ascend-direct-dev in #1427
[PG] Quick fix for reduceCpu by @UNIDY2002 in #1444
[CI]fix: update router service check, test env, and docker image pulling by @luketong777 in #1446
[TENT] feat(RDMA): add IB device availability checks and improve GID selection by @00fish0 in #1397
Pass a dict to setup api to reduce api changes. by @maobaolong in #1445
[Doc] Enhance update on Mooncake Project approval by @stmatengss in #1449
[TE] Support transfer in cuda stream via cudaLaunchHostFunc by @uncharted-G in #1448
[Doc] add FlexKV project update to README by @staryxchen in #1452
[Store][Feature] Add CXL storage for mooncake_store by @qiuweit7 in #1365
add force flag for remove by @XucSh in #1425
docs: fix typo in README.md by @staryxchen in #1453
[Test] Update YALANTINGLIBS_VERSION to 0.5.7 by @stmatengss in #1457
[Docs] Sync recent news to docs/index.md by @ykwd in #1461
[ep] Avoid mooncake ep test crash when ibgda_init fails by @zxpdemonio in #1410
[Bugfix] sync vllm mooncake connector from main repo by @dtcccc in #1466
[PG] Implement support for additional collective primitives (gather, scatter, and reduce) in the Mooncake backend and add unit tests in test_mooncake_backend.py and test_mooncake_backend_cpu to cover the new ops by @hhr2449 in #1469
[EP] Clear sticky CUDA error on EP buffer re-init by @UNIDY2002 in #1470
[Docs] Update Readme by @ykwd in #1472
[TE] add GlobalResourceConfig config for ascend direct transport by @ascend-direct-dev in #1464
[TE] Backport GDS xport to mainstream by @alogfans in #1430
[Doc] improve Transfer Engine C++ API Reference by @00fish0 in #1467
[TENT] Add RDMA-based notification by @alogfans in #1460
Add ascend-direct-dev as codeowner by @alogfans in #1473
[CI] Test cases can run in a Docker container by @luketong777 in #1454
[TENT] fix: support loading config from file and fix nested JSON path lookup by @00fish0 in #1476
feat: use setuptools_scm for more elegant version and support MOONCAKE_LOCAL_VERSION env by @weixiao-huang in #1479
[TENT] Fix: sync metadata after unregistering local memory by @00fish0 in #1485
[TE] Fix short-connection bug: disconnect happened before the transfer was marked successful by @ascend-direct-dev in #1481
[CI] Add new labels for PyTorch Backend and Mooncake EP by @stmatengss in #1487
[EP] enlarge kNumMaxTopK and support more hidden size by @UNIDY2002 in #1492
[Bug Fix] Fix segmentation fault when using the Python-wrapped transfer engine and store in the same process by @zhangzuo21 in #1471
[CI]Add vllm 1p1d test case by @luketong777 in #1491
[Docs]: Add TENT C++ API reference by @00fish0 in #1488
[TE] Add auto-connect feature controlled by config and environment va… by @Cheng-China in #1482
Update PyTorch installation URL for CUDA 13 by @ShangmingCai in #1495
Add TODO for more cu13 version support in build script by @ShangmingCai in #1496
Revert (#1479) since it breaks the release workflow by @ShangmingCai in #1497
Bump version to 0.3.9 in pyproject.toml by @ShangmingCai in #1498
Revert "[Build] auto add commit id to pyproject toml" by @ShangmingCai in #1501
[PG] Fix incorrect calculation of task_id during transfer status fetching by @UNIDY2002 in #1474
[CI] Modifying package name for CUDA-13 build by @UNIDY2002 in #1502

New Contributors

@yuechen-sys made their first contribution in #1286
@LiYiMing-lg made their first contribution in #1316
@chenkaiyue made their first contribution in #1328
@Cheng-China made their first contribution in #1366
@weixiao-huang made their first contribution in #1411
@caozhanhao made their first contribution in #1418
@Bruce-x-1997 made their first contribution in #1414
@herrluk made their first contribution in #1392
@s5u13b made their first contribution in #1417
@KMSorSMS made their first contribution in #1432
@VNightMare made their first contribution in #1399
@acelyc111 made their first contribution in #1438
@LiuYi-Up made their first contribution in #1334
@hhr2449 made their first contribution in #1440
@uncharted-G made their first contribution in #1448
@qiuweit7 made their first contribution in #1365

Full Changelog: v0.3.8...v0.3.9
