v0.3.9

Latest

@ShangmingCai ShangmingCai released this 05 Feb 10:50
· 6 commits to main since this release
a00f757

What's Changed

  • [Store] fix: switch offset allocator node storage to vector by @yuechen-sys in #1286
  • [Store] feat: mooncake store enable ascend fabric mem mode by @ascend-direct-dev in #1170
  • [Doc] Fix transfer engine doc image paths by @00fish0 in #1295
  • Update yalantinglibs dependency to use direct archive download by @Copilot in #1282
  • [store] support batch pub and tp aware for pub_tensor by @zxpdemonio in #1288
  • doc: add dummy client support for SGLang hicache integration by @YiXR in #1299
  • Add YiXR as code owner for mooncake-store by @stmatengss in #1301
  • doc: fix wrong env by @YiXR in #1302
  • [store] add error checking for get_tensor_into by @zxpdemonio in #1272
  • [Store] add HugePage support by @YiXR in #1300
  • [TENT] Add backward compatibility with TE by @alogfans in #1277
  • [Store]: Add OffsetAllocator disk backend with lock-striped metadata + refcounted extents. by @maheshrbapatu in #1284
  • [EP] Use NVLink in EP if possible by @UNIDY2002 in #1308
  • [EP] Replace _mm_pause() with PAUSE() macro by @UNIDY2002 in #1313
  • [TE] Add Task Completion Latency Tracking and Detailed Metrics Reporting by @staryxchen in #1310
  • feat: Add code coverage support in CI by @LiYiMing-lg in #1316
  • docs: add Docker deployment instructions to Chinese build guide by @LiYiMing-lg in #1318
  • [TE] Add early mem backend detection method in NVLINK_allocator by @TTThanos in #1296
  • [TE] feat: ascend direct transport add async transfer task limit by @ascend-direct-dev in #1325
  • [TENT] fix a bug in memory registration and add TENT support to the example by @staryxchen in #1330
  • [Store][Feature]: add task manager component by @zhongzhouTan-coder in #1326
  • [Store] - Implement partial success handling in BatchOffload by @maheshrbapatu in #1319
  • [TE] support arm arch PAUSE() by @stmatengss in #1340
  • [CI] add cuda13 wheel release workflow by @stmatengss in #1331
  • [TENT] Add Redis Authentication and Database Selection Support by @staryxchen in #1339
  • [TE] Revert nvlink_transport.cpp to previous version by @TTThanos in #1342
  • Bump version to 0.3.8.post1 in pyproject.toml by @ShangmingCai in #1348
  • [CI] Fix torch index by @UNIDY2002 in #1349
  • [TE/HIP] Switch to IPC mode by default and enable P2P access by @amd-arozanov in #1344
  • [Doc] Add governance doc by @ykwd in #1352
  • [CI] enable sglang-integration tests for forked repos with pull_reque… by @Ann-1024 in #1351
  • [Build] auto add commit id to pyproject toml by @stmatengss in #1345
  • [Store][Feature]: add copy/move execution api support in master side by @zhongzhouTan-coder in #1327
  • [Store] Add Prometheus and Grafana example by @stmatengss in #1335
  • [Store]: Use SharedMutex in MetadataShard for better performance by @nickyc975 in #1343
  • [TENT] Add Metrics System with HTTP Server and Prometheus Integration by @staryxchen in #1355
  • [Misc] Add amd-arozanov as CODEOWNERS for TE hip_transport by @ykwd in #1354
  • [TE/HIP] Add stream and event pools for async transfer operations by @amd-arozanov in #1353
  • [CI]optimize clang-format workflow to check only changed files by @staryxchen in #1359
  • [CI] fix integration test on push and validate download error by @Ann-1024 in #1361
  • [Store] Add retry logic for auto port binding in client setup by @chenkaiyue in #1328
  • Add Tensor-Centric Ecosystem component to README by @Copilot in #1358
  • [Doc] add docs of Mooncake EPD integration with SGLang by @liusy58 in #1262
  • [Store][FIX] fix issue where preferred segments did not work on put by @zhongzhouTan-coder in #1360
  • [TE] Add support for local communication resource configuration via environment variable by @Cheng-China in #1366
  • PR3 coro_rpc_communicator python bindings by @JasonZhang517 in #1106
  • [CI] Improve Code Formatting Workflow by @staryxchen in #1368
  • [CI] Fix CI Failure by @ykwd in #1367
  • [Doc] Update News by @ykwd in #1383
  • [TE]feat(topology): add IB device availability filtering by @luketong777 in #1375
  • [CI] add cuda13 CI test by @stmatengss in #1380
  • Mooncake PG by @ympcMark in #1387
  • [CI]Add testcase test_disaggregation_different_tp and fix potential network issues by @luketong777 in #1386
  • [TE]fix: correct RDMA device path mapping by @luketong777 in #1393
  • [Doc] Add batch API docs and examples for transfer engine by @chestnut-Q in #1395
  • [TENT] Fix crash problem when device is down by @alogfans in #1333
  • [CI]fix(ci): only count timeout while job is running (skip pending) and filter artifacts by excluding cu130 builds by @luketong777 in #1394
  • Mooncake PG buffer by @ympcMark in #1401
  • add troubleshooting of exceeding ulimit by @zhangzuo21 in #1405
  • [PG] Always send pre-flight requests during backend init by @UNIDY2002 in #1409
  • [Build] Retrieve the actual glibc version during the build process by @zxpdemonio in #1402
  • chore: update clang-format to v20.1.8 and enforce version 20 by @staryxchen in #1379
  • feat: add get_engine_ptr method to easily pass the engine pointer from Python to other C++ extensions by @weixiao-huang in #1411
  • [CI]Set DEFAULT_MODEL_NAME_FOR_TEST to meta-llama/Llama-3.2-3B-Instruct for test_disaggregation_different_tp by @luketong777 in #1412
  • [Wheel] Remove the default buffer pre-allocation in initialize() by @chestnut-Q in #1415
  • [TE] Fix SIGSEGV in Session::writeBody/readBody logging. by @caozhanhao in #1418
  • [performance] decrease regmr overhead in ep_buffer/gda path by @Bruce-x-1997 in #1414
  • [DOC] Add update for Mooncake Project approval by @stmatengss in #1419
  • [TE] Nvlink intraNode Transport isolation by @TTThanos in #1341
  • [Store] - Add comprehensive storage backend benchmark suite by @maheshrbapatu in #1388
  • [CI] Support torch==2.10.0 by @UNIDY2002 in #1420
  • [TE] Add configurable handshake max length via MC_HANDSHAKE_MAX_LENGTH by @herrluk in #1392
  • [Store] Support batch query keys api for master service http server by @s5u13b in #1417
  • feat(Store): Support get local ssd object by @zhuxinjie-nz in #1203
  • Intra-Node NVLink related Docs modification by @TTThanos in #1424
  • [Store] unregister client local buffer when tear down by @hjchen2 in #1413
  • [Store] feat: TENT/Store integration improvements by @00fish0 in #1398
  • [TE] Fix race condition in receivePeerMetadata and getSegmentDesc by @staryxchen in #1373
  • [TE] fix wakeup race condition in RDMA WorkerPool by @chestnut-Q in #1434
  • fix(ep_buffer): rm the fixed id 3 for rdma device by @KMSorSMS in #1432
  • [TE] Enable ubshmem transport via ascend vmm apis by @VNightMare in #1399
  • [Store] validate metadata when put_tensor by @zxpdemonio in #1396
  • [Store] feat: wait Master ready when starting Store server by @acelyc111 in #1438
  • [Store][Feature]task add rpc_only protocol by @LiuYi-Up in #1334
  • Document all supported communication protocols by @Copilot in #1435
  • [PG] Support full reduction ops (Product/Min/Max) and fix reduce kernel indexing bug by @hhr2449 in #1440
  • [Store] fix malloc physical for ascend fabric mem by @ascend-direct-dev in #1427
  • [PG] Quick fix for reduceCpu by @UNIDY2002 in #1444
  • [CI]fix: update router service check, test env, and docker image pulling by @luketong777 in #1446
  • [TENT] feat(RDMA): add IB device availability checks and improve GID selection by @00fish0 in #1397
  • Pass a dict to setup api to reduce api changes. by @maobaolong in #1445
  • [Doc] Enhance update on Mooncake Project approval by @stmatengss in #1449
  • [TE] Support transfer in cuda stream via cudaLaunchHostFunc by @uncharted-G in #1448
  • [Doc] add FlexKV project update to README by @staryxchen in #1452
  • [Store][Feature] Add CXL storage for mooncake_store by @qiuweit7 in #1365
  • add force flag for remove by @XucSh in #1425
  • docs: fix typo in README.md by @staryxchen in #1453
  • [Test] Update YALANTINGLIBS_VERSION to 0.5.7 by @stmatengss in #1457
  • [Docs] Sync recent news to docs/index.md by @ykwd in #1461
  • [ep] Avoid mooncake ep test crash when ibgda_init fails by @zxpdemonio in #1410
  • [Bugfix] sync vllm mooncake connector from main repo by @dtcccc in #1466
  • [PG] Implemented support for additional collective primitives (gather, scatter, and reduce) in the Mooncake backend and added unit tests in test_mooncake_backend.py and test_mooncake_backend_cpu to cover the new ops by @hhr2449 in #1469
  • [EP] Clear sticky CUDA error on EP buffer re-init by @UNIDY2002 in #1470
  • [Docs] Update Readme by @ykwd in #1472
  • [TE] add GlobalResourceConfig config for ascend direct transport by @ascend-direct-dev in #1464
  • [TE] Backport GDS xport to mainstream by @alogfans in #1430
  • [Doc] improve Transfer Engine C++ API Reference by @00fish0 in #1467
  • [TENT] Add RDMA-based notification by @alogfans in #1460
  • Add ascend-direct-dev as codeowner by @alogfans in #1473
  • [CI] Test cases can run in a Docker container by @luketong777 in #1454
  • [TENT] fix: support loading config from file and fix nested JSON path lookup by @00fish0 in #1476
  • feat: use setuptools_scm for more elegant version and support MOONCAKE_LOCAL_VERSION env by @weixiao-huang in #1479
  • [TENT] Fix: sync metadata after unregistering local memory by @00fish0 in #1485
  • [TE] fix short-connection bug: disconnect happens before marking success by @ascend-direct-dev in #1481
  • [CI] Add new labels for PyTorch Backend and Mooncake EP by @stmatengss in #1487
  • [EP] enlarge kNumMaxTopK and support more hidden size by @UNIDY2002 in #1492
  • [Bug Fix] Fix segmentation fault when using the Python-wrapped transfer engine and store in the same process by @zhangzuo21 in #1471
  • [CI]Add vllm 1p1d test case by @luketong777 in #1491
  • [Docs]: Add TENT C++ API reference by @00fish0 in #1488
  • [TE] Add auto-connect feature controlled by config and environment va… by @Cheng-China in #1482
  • Update PyTorch installation URL for CUDA 13 by @ShangmingCai in #1495
  • Add TODO for more cu13 version support in build script by @ShangmingCai in #1496
  • Revert (#1479) since it breaks the release workflow by @ShangmingCai in #1497
  • Bump version to 0.3.9 in pyproject.toml by @ShangmingCai in #1498
  • Revert "[Build] auto add commit id to pyproject toml" by @ShangmingCai in #1501
  • [PG] Fix incorrect calculation of task_id during transfer status fetching by @UNIDY2002 in #1474
  • [CI] Modifying package name for CUDA-13 build by @UNIDY2002 in #1502

Full Changelog: v0.3.8...v0.3.9