
[TRTLLM-11851][feat] Add MX-only P2P checkpoint loading support for TRTLLM#13531

Open
chienchunhung wants to merge 1 commit into NVIDIA:main from chienchunhung:trtllm-11851-mx-only

Conversation

@chienchunhung
Collaborator

@chienchunhung chienchunhung commented Apr 27, 2026

Summary by CodeRabbit

Release Notes

  • New Features
  • Added ModelExpress (MX) integration for faster checkpoint loading via peer-to-peer weight transfer, with automatic fallback to standard disk-based loading when MX is unavailable or not configured.
    • New configuration parameters: mx_server_url for MX server endpoint and mx_preshard_strategy to control weight sharding behavior.

Description

Summary

This PR is the MX-only first slice split out from PR #13045.

It adds checkpoint_format="MX" support to TRT-LLM's PyTorch backend using upstream modelexpress.trtllm_live_transfer.MxLiveWeightLoader and publish_model_params. GMS is intentionally excluded so reviewers can validate MX first.

Follow-up PRs will add:

  1. GMS-only support (LoadFormat.GMS, GMSBackend, GMS args/tests)
  2. MX+GMS validation/composition
  3. Packaging extras once MX/GMS dependency onboarding is complete

What This PR Adds

MX Checkpoint Loader

Adds MXCheckpointLoader under tensorrt_llm/_torch/models/checkpoints/mx/.

Behavior:

  • Subclasses HfCheckpointLoader, so HF disk fallback is inherited.
  • Calls upstream MxLiveWeightLoader(mx_server=url).load_weights(checkpoint_dir, mapping=..., model=...).
  • Calls upstream publish_model_params(model) before post_load_weights() for source workers.
  • Exposes p2p_succeeded so ModelLoader can skip normal weight mapping on full P2P success.
  • Supports mixed P2P/disk fallback: if upstream returns fallback_weights, keep P2P-delivered tensors and merge only the returned fallback tensors through the standard disk pipeline.
  • Avoids republishing from P2P receiver workers to prevent duplicate MX metadata / NIXL registrations.

MX Config

Adds MX-only prototype fields:

  • mx_server_url
  • mx_server_query_timeout_s
  • mx_preshard_strategy

Behavior:

  • MODEL_EXPRESS_URL is used as fallback for mx_server_url when checkpoint_format="MX".
  • mx_server_query_timeout_s lets deployments size source-discovery wait time.
  • If unset, TRT-LLM probes MX first:
    • no registered source: use MX_SOURCE_QUERY_TIMEOUT=30 for fast disk fallback
    • registered source exists: defer to upstream/default wait so targets can wait for long source disk-loads
  • mx_preshard_strategy="global" fails fast until LoadFormat.PRESHARDED exists upstream.
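
The timeout and URL resolution rules above can be sketched like this. Function names and signatures are illustrative, not the actual TRT-LLM config API; only the `MODEL_EXPRESS_URL` variable and the 30-second `MX_SOURCE_QUERY_TIMEOUT` default come from the PR description.

```python
import os
from typing import Optional

MX_SOURCE_QUERY_TIMEOUT = 30  # seconds; fast disk fallback when no source exists


def resolve_mx_query_timeout(configured_s: Optional[int],
                             has_registered_source: bool) -> Optional[int]:
    """Pick the source-discovery wait; None means defer to upstream default."""
    if configured_s is not None:
        return configured_s  # explicit deployment-sized override
    if not has_registered_source:
        return MX_SOURCE_QUERY_TIMEOUT  # probe found no source: fail fast
    return None  # source exists: wait out long source disk-loads upstream


def resolve_mx_server_url(explicit_url: Optional[str]) -> Optional[str]:
    """MODEL_EXPRESS_URL is the fallback when checkpoint_format == "MX"."""
    return explicit_url or os.environ.get("MODEL_EXPRESS_URL")
```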

ModelLoader Integration

Updates only the existing LoadFormat.AUTO path:

  • Passes model=model to checkpoint loaders. Generic HF loaders ignore it; MX uses it for direct P2P writes.
  • Always initializes self.weight_mapper, including the MX fast path, so reload() remains safe.
  • Marks non-draft Linear modules as _weights_presharded=True after MX success.
  • Skips normal weight mapping on full P2P success.
  • Runs returned fallback_weights through the standard weight-loading path for partial MX fallback.
  • Publishes source weights only when this worker did not receive via P2P.

Linear Marker

Adds _weights_presharded = False to Linear.

This PR does not route presharded tensors back through load_weight_shard() helpers; the marker is set after successful MX direct writes and is kept for the current ModelLoader path plus future LoadFormat.PRESHARDED work.
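
For intuition, the kind of decision a future `LoadFormat.PRESHARDED` path could make with this marker looks roughly like the sketch below. The helper name and signature are invented for illustration; the PR only adds the boolean attribute.

```python
def maybe_shard(weight, tp_rank: int, tp_size: int, presharded: bool):
    """Skip TP sharding when MX already delivered this rank's shard (sketch)."""
    if presharded or tp_size == 1:
        return weight  # tensor already holds only this rank's slice
    chunk = len(weight) // tp_size
    return weight[tp_rank * chunk:(tp_rank + 1) * chunk]
```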

What This PR Excludes

This PR intentionally does not include:

  • LoadFormat.GMS
  • GMSBackend
  • gms_socket_path, gms_mode, gms_tag
  • GMS RW/RO loading
  • GMS tests
  • [gms] / [dynamo] packaging extras

Packaging Note

This PR does not add a [mx] extra yet.

For prototype testing:

pip install "modelexpress>=0.3.0,<0.4.0"

modelexpress is on PyPI but still needs NVIDIA OSS allowlist onboarding (tracked as MX-7). Once complete, restoring pip install tensorrt_llm[mx] is a small setup.py change.
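
Until the extra exists, keeping MX optional typically means a guarded import along these lines; the wrapper function is a hypothetical sketch, though the `modelexpress.trtllm_live_transfer` module path is the one named in this PR.

```python
def try_import_modelexpress():
    """Guarded import keeping MX optional until the [mx] extra lands (sketch)."""
    try:
        from modelexpress.trtllm_live_transfer import MxLiveWeightLoader
    except ImportError:
        return None  # caller falls back to plain HF disk loading
    return MxLiveWeightLoader
```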

Running MX

# config_mx.yaml
checkpoint_format: "MX"
mx_server_url: "http://mx-server:8001"

trtllm-serve <model> --config config_mx.yaml

Optional timeout override:

checkpoint_format: "MX"
mx_server_url: "http://mx-server:8001"
mx_server_query_timeout_s: 1800

Python API:

from tensorrt_llm import LLM

llm = LLM(
    model="<model>",
    checkpoint_format="MX",
    mx_server_url="http://mx-server:8001",
)

Test Coverage

Added MX-only unit tests:

  • tests/unittest/llmapi/test_mx_args.py
  • tests/unittest/_torch/models/checkpoints/mx/test_mx_checkpoint_loader.py
  • tests/unittest/_torch/pyexecutor/test_model_loader_mx.py

Coverage includes:

  • MX config defaults and validation
  • MODEL_EXPRESS_URL fallback
  • mx_server_query_timeout_s
  • mx_preshard_strategy validation
  • MX loader registry and construction
  • disk fallback paths
  • full P2P success path
  • partial fallback merge path
  • source publish env restoration
  • MODEL_NAME resolution
  • model-loader fast path: mapper init, reload safety, draft exclusion, publish skip on P2P receive
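
The MODEL_EXPRESS_URL fallback check, for example, boils down to assertions of this shape. The `validate_mx_config` function here is an illustrative mirror of the rule, not the real pydantic validator in `llm_args.py`.

```python
import os


def validate_mx_config(checkpoint_format, mx_server_url):
    """Mirror of the env-fallback rule (illustrative, not the real validator)."""
    if checkpoint_format == "MX" and mx_server_url is None:
        mx_server_url = os.environ.get("MODEL_EXPRESS_URL")
    return mx_server_url


os.environ["MODEL_EXPRESS_URL"] = "http://mx-server:8001"
assert validate_mx_config("MX", None) == "http://mx-server:8001"   # env fallback
assert validate_mx_config("HF", None) is None                      # MX-only rule
assert validate_mx_config("MX", "http://explicit:1") == "http://explicit:1"
del os.environ["MODEL_EXPRESS_URL"]
```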

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@chienchunhung
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #45846 [ run ] triggered by Bot. Commit: d6f0384 Link to invocation

@chienchunhung chienchunhung marked this pull request as ready for review April 28, 2026 20:11
@chienchunhung chienchunhung requested review from a team as code owners April 28, 2026 20:11
@coderabbitai
Contributor

coderabbitai Bot commented Apr 28, 2026

📝 Walkthrough

Walkthrough

This PR introduces MX (ModelExpress) peer-to-peer weight transfer support for checkpoint loading. A new MXCheckpointLoader performs direct model parameter writes when an MX server is configured, with fallback to disk loading. Changes include checkpoint loader registrations, configuration parameters, model loading integration, and comprehensive test coverage.

Changes

  • Project Setup (setup.py): Adds a documentation comment in extras_require about ModelExpress integration, noting the lack of a one-line extra and providing manual installation guidance.
  • Checkpoint Loader Base Infrastructure (tensorrt_llm/_torch/models/checkpoints/__init__.py, tensorrt_llm/_torch/models/checkpoints/base_weight_loader.py, tensorrt_llm/_torch/models/checkpoints/auto_mapper.py): Exports the new MXCheckpointLoader class; extends the BaseWeightLoader.load_weights() signature to accept **kwargs; adds MX-format fallback logic to mapper auto-resolution that attempts the {name}_HF key before generic format resolution.
  • HF Checkpoint Format Registration (tensorrt_llm/_torch/models/checkpoints/hf/config_loader.py, tensorrt_llm/_torch/models/checkpoints/hf/weight_loader.py, tensorrt_llm/_torch/models/checkpoints/hf/weight_mapper.py): Registers HfConfigLoader, HfWeightLoader, and HfWeightMapper for both "HF" and "MX" format keys; updates HfWeightLoader.load_weights() to accept and discard **kwargs for format-specific parameters.
  • MX Checkpoint Format Implementation (tensorrt_llm/_torch/models/checkpoints/mx/__init__.py, tensorrt_llm/_torch/models/checkpoints/mx/checkpoint_loader.py): Introduces the new MXCheckpointLoader class with P2P weight transfer via modelexpress, fallback to disk loading, publishing support via publish_as_source, and environment-variable handling; includes helper functions for model identity resolution and normalization.
  • Model Loading Integration (tensorrt_llm/_torch/pyexecutor/model_engine.py, tensorrt_llm/_torch/pyexecutor/model_loader.py, tensorrt_llm/_torch/pyexecutor/py_executor_creator.py, tensorrt_llm/executor/base_worker.py): Passes MX configuration (mx_server_url, mx_model_name) to checkpoint loader construction; integrates P2P loading by calling load_weights() with a live model reference, skipping standard weight mapping on P2P success, marking main-model linears as _weights_presharded, and invoking the publish callback before post_load_weights.
  • Linear Module (tensorrt_llm/_torch/modules/linear.py): Adds the _weights_presharded attribute (default False) to track MX P2P pre-sharded weight delivery.
  • Configuration Parameters (tensorrt_llm/llmapi/llm_args.py): Introduces mx_server_url and mx_preshard_strategy fields to TorchLlmArgs; adds a validate_mx_config validator that populates mx_server_url from the MODEL_EXPRESS_URL environment variable when the checkpoint format is "MX", warns on mismatched formats, and enforces allowed preshard strategy values.
  • Checkpoint Loader Tests (tests/unittest/_torch/models/checkpoints/mx/test_mx_checkpoint_loader.py): Comprehensive test suite for MXCheckpointLoader covering registry resolution, disk fallback scenarios (missing config, missing modelexpress import, upstream errors), P2P success paths (empty/non-empty return dicts), publish_as_source behavior, environment-variable handling, model identity resolution, and timeout defaults.
  • Model Loader Integration Tests (tests/unittest/_torch/pyexecutor/test_model_loader_mx.py): Tests MX weight-loading integration: validates the P2P success path (skipped _call_load_weights, main-linear presharding, publish ordering) versus the fallback path (standard loading, no presharding), and ensures draft-model linears are excluded from presharding.
  • Configuration Validation Tests (tests/unittest/llmapi/test_mx_args.py): Tests TorchLlmArgs MX field validation: default values, mx_preshard_strategy constraints, cross-field warnings, environment-variable fallback for mx_server_url, and validator-time environment population.
  • API Stability Reference (tests/unittest/api_stability/references/llm.yaml): Adds mx_server_url and mx_preshard_strategy parameters to the __init__ reference with prototype status.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client/ModelLoader
    participant Loader as MXCheckpointLoader
    participant MX as ModelExpress<br/>(P2P)
    participant HF as HuggingFace<br/>(Disk)
    participant Model as Model Instance

    Client->>Loader: load_weights(checkpoint_dir,<br/>mapping, model=...)
    
    alt MX Server & Model Reference Available
        Loader->>MX: MxLiveWeightLoader.transfer_weights()
        alt Transfer Success
            MX-->>Loader: weights_dict (empty or partial)
            Loader->>Model: Direct parameter writes<br/>(P2P succeeded = true)
            alt Partial Transfer (non-empty dict)
                Loader->>HF: Full disk load fallback<br/>(P2P succeeded = false)
                HF-->>Loader: complete weights
                Loader-->>Client: merged weights
            else Complete Transfer (empty dict)
                Loader-->>Client: P2P weights only
            end
        else Transfer Fails
            MX--XLoader: Exception
            Loader->>HF: Fallback to disk load<br/>(P2P succeeded = false)
            HF-->>Loader: weights from disk
            Loader-->>Client: disk weights
        end
    else Missing Config or modelexpress
        Loader->>HF: Fallback to disk load<br/>(P2P succeeded = false)
        HF-->>Loader: weights from disk
        Loader-->>Client: disk weights
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 16.38%, below the required 80.00% threshold. Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check ✅: The title accurately summarizes the main change: adding MX-only P2P checkpoint loading support to TRT-LLM. It is concise, specific, and directly reflects the primary objective.
  • Linked Issues check ✅: Skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅: Skipped because no linked issues were found for this pull request.
  • Description check ✅: The PR description is comprehensive and well-structured, addressing objectives, implementation details, exclusions, and usage examples.


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tensorrt_llm/_torch/modules/linear.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add required NVIDIA copyright/SPDX header to this modified Python source file.

This file was modified but still lacks the required header block at the top.

Proposed fix
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
 from __future__ import annotations

As per coding guidelines, "All TensorRT-LLM source files must contain an NVIDIA copyright header with the year of latest meaningful modification" and "Include NVIDIA copyright header on all new files; update year on modified files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/modules/linear.py` at line 1, Add the required NVIDIA
copyright/SPDX header block at the very top of
tensorrt_llm/_torch/modules/linear.py (before the existing "from __future__
import annotations" line); the header must include the NVIDIA copyright line
with the year of latest meaningful modification and the SPDX-License-Identifier
(e.g., SPDX-License-Identifier: Apache-2.0) as used across the repo so the file
complies with project coding guidelines.
setup.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Update SPDX copyright year for this modified file.

setup.py was changed in 2026, but the header still ends at 2025.

🔧 Proposed fix
-# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

As per coding guidelines, “All TensorRT-LLM source files must contain an NVIDIA copyright header with the year of latest meaningful modification” and “update year on modified files.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@setup.py` at line 1, Update the SPDX copyright header line that currently
reads "SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION &
AFFILIATES. All rights reserved." to reflect the latest modification year
(2026); locate the header by the unique string "SPDX-FileCopyrightText" in
setup.py and change the year range to "2022-2026" (or to a single year "2026" if
preferred by project convention) so the file header matches the most recent
modification.
🧹 Nitpick comments (1)
tests/unittest/_torch/pyexecutor/test_model_loader_mx.py (1)

97-157: Add regressions for MX success + reload() and non-default preshard strategy.

These tests only exercise the "per_module" happy path. The production branch also owns self.weight_mapper setup and strategy-specific skip logic, so a pure MX load followed by reload() or mx_preshard_strategy="global" can regress without this suite noticing. QA list updates look unnecessary here because this is unit-only coverage.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/_torch/pyexecutor/test_model_loader_mx.py` around lines 97 -
157, Tests only cover the "per_module" MX preshard path and miss regressions for
a subsequent reload() call and for the "global" mx_preshard_strategy; extend
unit tests (e.g., add cases alongside
test_mx_success_marks_main_linears_and_skips_weight_mapping and
test_mx_fallback_runs_standard_weight_mapping) to simulate: (1) calling
loader.reload(...) after a successful MX load to ensure loader.weight_mapper and
skip logic still behave, and (2) running loader.load with
mx_preshard_strategy="global" (or by configuring loader.weight_mapper to use
global strategy) to assert preshard marking/skipping behaves as expected for
main modules vs draft_model modules; reuse _make_loader and checkpoint_loader
mocks (set checkpoint_loader.p2p_succeeded True/False and
checkpoint_loader.load_weights return values) and assert
loader._call_load_weights counts, model.*_weights_presharded flags, and event
order just like the existing tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/pyexecutor/model_loader.py`:
- Around line 421-445: When mx_p2p_succeeded is true the code marks Linear
modules presharded but never initializes self.weight_mapper nor validates
mx_preshard_strategy, yet reload() later expects self.weight_mapper; update the
mx_p2p_succeeded branch to always set self.weight_mapper via
checkpoint_loader.get_initialized_weight_mapper(model, config) (same as the
non-fast path) and validate config.mx_preshard_strategy (e.g., raise or handle
if it's not "per_module") before marking modules so non-"per_module" strategies
fail fast; keep using model.load_weights with self.weight_mapper so reload() can
safely consume it.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b240e53f-4bf3-42cf-965b-c5c28cfd7c1f

📥 Commits

Reviewing files that changed from the base of the PR and between 2b7871f and d6f0384.

📒 Files selected for processing (19)
  • setup.py
  • tensorrt_llm/_torch/models/checkpoints/__init__.py
  • tensorrt_llm/_torch/models/checkpoints/auto_mapper.py
  • tensorrt_llm/_torch/models/checkpoints/base_weight_loader.py
  • tensorrt_llm/_torch/models/checkpoints/hf/config_loader.py
  • tensorrt_llm/_torch/models/checkpoints/hf/weight_loader.py
  • tensorrt_llm/_torch/models/checkpoints/hf/weight_mapper.py
  • tensorrt_llm/_torch/models/checkpoints/mx/__init__.py
  • tensorrt_llm/_torch/models/checkpoints/mx/checkpoint_loader.py
  • tensorrt_llm/_torch/modules/linear.py
  • tensorrt_llm/_torch/pyexecutor/model_engine.py
  • tensorrt_llm/_torch/pyexecutor/model_loader.py
  • tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
  • tensorrt_llm/executor/base_worker.py
  • tensorrt_llm/llmapi/llm_args.py
  • tests/unittest/_torch/models/checkpoints/mx/test_mx_checkpoint_loader.py
  • tests/unittest/_torch/pyexecutor/test_model_loader_mx.py
  • tests/unittest/api_stability/references/llm.yaml
  • tests/unittest/llmapi/test_mx_args.py

@tensorrt-cicd
Collaborator

PR_Github #45846 [ run ] completed with state ABORTED. Commit: d6f0384

Link to invocation

@KavinKrishnan

PR LGTM - lack approval privileges

Collaborator

@tburt-nv tburt-nv left a comment


No problem with the setup.py comments

@chienchunhung chienchunhung requested a review from 2ez4bz May 1, 2026 00:03
Collaborator

@brb-nv brb-nv left a comment


Minor comments. Changes LGTM.

Introduce the first PR slice from the MX/GMS prototype: checkpoint_format="MX" support using upstream modelexpress MxLiveWeightLoader and publish_model_params, while intentionally excluding GMS/load_format changes.

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>
Made-with: Cursor
@chienchunhung chienchunhung force-pushed the trtllm-11851-mx-only branch from 7cd32b2 to 49faefc Compare May 1, 2026 23:35
Collaborator

@venkywonka venkywonka left a comment


This PR seems to have no documentation update for this awesome new feature addition. If that is aimed at a follow-up PR then no worries, but if not, I'd recommend adding some docs:

Here are some places codex suggested:

  • docs/source/features/checkpoint-loading.md
  • Add a dedicated features/model-express-p2p-checkpoint-loading.md and wire it into docs/source/index.rst
  • Add small pointers/examples in overview.md, trtllm-serve.rst, and optionally quickstart_advanced.py

@venkywonka
Collaborator

Also, if you'd like this to be tracked in telemetry as feature usage, you might want to update _collect_features() in tensorrt_llm/usage/usage_lib.py.
