[https://nvbugs/5926823][fix] Propagate logprobs from prefill to decode in disagg (#11727)
Conversation
c3ed32a to d2dd2f5
📝 Walkthrough

A new disaggregated logprobs workflow is introduced with a reproduction module, guard conditions to safely access logprob attributes, and serialization support across the TensorRT-LLM stack. Changes include a prefill HTTP server, a decode engine, logprob data threading through disaggregated parameters, and OpenAI protocol integration.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant PrefillServer as Prefill HTTP Server
    participant DecodeEngine as Decode Engine
    participant LLM as TRT-LLM Executor
    Client->>PrefillServer: POST /prefill (prompt, max_tokens, logprobs)
    PrefillServer->>LLM: context_only generation
    LLM-->>PrefillServer: first_gen_tokens, first_gen_log_probs, opaque_state
    PrefillServer-->>Client: PrefillResponse (tokens + logprobs + state)
    Client->>DecodeEngine: generate_async with remote_prefill=true
    DecodeEngine->>DecodeEngine: _remote_prefill (deserialize opaque_state)
    DecodeEngine->>LLM: streaming generation with DisaggregatedParams
    LLM-->>DecodeEngine: stream chunks with logprobs
    DecodeEngine-->>Client: streamed output with first_gen_log_probs appended
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 7
🧹 Nitpick comments (2)
tensorrt_llm/disaggregated_params.py (1)
32-33: Define a concrete contract type for `first_gen_log_probs`.

`Optional[List]` is too loose for a field that crosses prefill/decode and protocol boundaries. Please use an explicit nested type alias (beam/token shape) to avoid downstream wrapping/shape ambiguity.

Also applies to: 42-42
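To make the suggested beam/token contract concrete, here is a minimal sketch; the alias and helper names are illustrative, not TRT-LLM's actual types:

```python
from typing import Dict, List, Optional

# Hypothetical aliases sketching the reviewer's suggested contract.
TokenLogprobs = Dict[int, float]                 # token id -> logprob (top-k at one position)
BeamLogprobs = List[TokenLogprobs]               # one top-k dict per generated token
FirstGenLogProbs = Optional[List[BeamLogprobs]]  # one entry per beam, or None

def shape_of(lp: FirstGenLogProbs):
    """Return (num_beams, tokens_in_first_beam) for a well-formed payload."""
    if lp is None:
        return None
    return (len(lp), len(lp[0]) if lp else 0)
```

With an explicit alias like this, downstream code can assert the payload shape instead of guessing whether a bare list means beams or tokens.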
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/disaggregated_params.py` around lines 32 - 33, Define explicit nested type aliases and use them to annotate first_gen_log_probs instead of Optional[List]; for example create aliases like TokenLogprobs = Dict[int, float] (token id -> logprob), BeamLogprobs = List[TokenLogprobs] (one list per beam), and FirstGenLogProbs = Optional[List[BeamLogprobs]] (or Optional[BeamLogprobs] if outer list is redundant), then replace the loose Optional[List] annotations for the first_gen_log_probs field(s) with FirstGenLogProbs; update both occurrences of first_gen_log_probs in this module so downstream code has an unambiguous beam/token shape contract.

cum_log_probs_repro.py (1)
64-67: Make the model path configurable instead of hardcoding a local absolute path.

Hardcoding `/home/...` makes the repro brittle for reviewers and CI environments.

💡 Suggested fix

```diff
 import argparse
 import base64
+import os
 import threading
@@
 CACHE_TRANSCEIVER = {"backend": "UCX", "max_tokens_in_buffer": 2048}
+DEFAULT_MODEL_PATH = "/home/scratch.bbuddharaju_gpu/random/hf_models/TinyLlama-1.1B-Chat-v1.0"
+MODEL_PATH = os.getenv("TRTLLM_MODEL_PATH", DEFAULT_MODEL_PATH)
@@
 self.llm = LLM(
-    model="/home/scratch.bbuddharaju_gpu/random/hf_models/TinyLlama-1.1B-Chat-v1.0",
+    model=MODEL_PATH,
     disable_overlap_scheduler=True,
     cache_transceiver_config=CACHE_TRANSCEIVER,
 )
@@
-self.llm = LLM(model="/home/scratch.bbuddharaju_gpu/random/hf_models/TinyLlama-1.1B-Chat-v1.0", cache_transceiver_config=CACHE_TRANSCEIVER)
+self.llm = LLM(model=MODEL_PATH, cache_transceiver_config=CACHE_TRANSCEIVER)
```

Also applies to: 135-135
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cum_log_probs_repro.py` around lines 64 - 67, Replace the hardcoded absolute model path used as the model= argument with a configurable value: read it from an environment variable (e.g. os.environ.get('MODEL_PATH')) or add a CLI/config parameter and fall back to a sensible default; update both places where model="/home/..." is passed (the model= keyword in the object/constructor calls around the current occurrences) so tests/CI/reviewers can override the path without editing the file.
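The env-var fallback the prompt describes can be sketched as below; the variable name `TRTLLM_MODEL_PATH` and the default path here are assumptions for illustration:

```python
import os

# Hypothetical default; the real repro keeps its own fallback path.
DEFAULT_MODEL_PATH = "/models/TinyLlama-1.1B-Chat-v1.0"

def resolve_model_path(env_var: str = "TRTLLM_MODEL_PATH") -> str:
    """Prefer an environment override so CI/reviewers need not edit the file."""
    return os.getenv(env_var, DEFAULT_MODEL_PATH)
```

The repro would then call `LLM(model=resolve_model_path(), ...)` in both constructor sites.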
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cum_log_probs_repro.py`:
- Around line 51-52: The code currently flattens the nested token-position/top-k
structure when handling first_gen_log_probs, which loses the original grouping
when logprobs > 1; update the serialization and deserialization logic that
reads/writes first_gen_log_probs (and the similar blocks around the other
occurrences) to preserve the nested shape — keep a list of per-token objects
each containing the top-k list of SerializedLogprob entries instead of merging
them into a flat map; specifically, change the routines that iterate over
first_gen_log_probs to serialize each token's top-k array as-is and to
reconstruct the exact token-position → top-k list structure on load (ensure
types remain list[SerializedLogprob] | None and adjust any flattening helper
functions to operate on the nested lists rather than concatenating entries).
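A shape-preserving round trip as described above might look like the following sketch; the `Logprob` dataclass is a stand-in for the real type, not TRT-LLM's implementation:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Logprob:  # illustrative stand-in for the real logprob record
    logprob: float
    rank: Optional[int] = None

# first_gen_log_probs: one top-k dict (token id -> Logprob) per token position.
def serialize_nested(first_gen_log_probs: Optional[List[Dict[int, Logprob]]]):
    """Serialize each position's top-k list as-is, without flattening."""
    if first_gen_log_probs is None:
        return None
    return [
        [{"token_id": tid, "logprob": lp.logprob, "rank": lp.rank}
         for tid, lp in pos.items()]
        for pos in first_gen_log_probs
    ]

def deserialize_nested(payload):
    """Rebuild the exact token-position -> top-k mapping on load."""
    if payload is None:
        return None
    return [
        {item["token_id"]: Logprob(item["logprob"], item.get("rank"))
         for item in pos}
        for pos in payload
    ]
```

Because each position serializes to its own inner list, `logprobs > 1` survives the wire format instead of collapsing into one flat map.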
- Around line 1-8: Add the required NVIDIA Apache-2.0 copyright header to the
top of the new source file cum_log_probs_repro.py: insert the standard NVIDIA
copyright block (including the correct year and "NVIDIA CORPORATION &
AFFILIATES"), the Apache License, Version 2.0 notice and URL, and the short
license disclaimer sentence before any module docstring or code so the file
begins with the full header followed by the existing module docstring.
- Around line 79-82: The /prefill handler can run before the engine/LLM is
initialized; update the route in prefill (the async def prefill(req:
PrefillRequest) -> PrefillResponse) to guard against early requests by checking
a readiness condition (e.g., self.llm is not None or a boolean like
self.initialized or a threading.Event) and if not ready raise an HTTP 503
(HTTPException(status_code=503, detail="Engine not ready")) or return an
appropriate error response; only call self._generate_local_prefill(req) when the
readiness check passes. Ensure the readiness flag is set when start() finishes
assigning self.llm.
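The readiness guard can be sketched without the web framework; `EngineNotReady` stands in for `fastapi.HTTPException(status_code=503)`, and the class is a hypothetical skeleton, not the actual repro server:

```python
import threading

class EngineNotReady(Exception):
    """Stand-in for an HTTP 503 'Engine not ready' response."""

class PrefillServer:
    """Minimal sketch: readiness is flipped only after the engine exists."""

    def __init__(self):
        self.llm = None
        self._ready = threading.Event()

    def start(self, llm):
        self.llm = llm
        self._ready.set()  # set AFTER self.llm is assigned

    def prefill(self, req):
        if not self._ready.is_set():
            raise EngineNotReady("Engine not ready")  # would map to HTTP 503
        return ("ok", req)  # placeholder for _generate_local_prefill(req)
```

A `threading.Event` (rather than a bare boolean) keeps the check safe when `start()` runs on a background thread.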
In `@tensorrt_llm/_torch/pyexecutor/py_executor.py`:
- Around line 2744-2750: The code assumes transferred first_gen_log_probs is
single-beam by always wrapping it in a list before calling
req.py_result.append_log_probs; instead, inspect
disagg_params.first_gen_log_probs shape/ndim (or a beam-size marker) and only
wrap when it's a 1D/single-beam vector, pass it through unchanged when it's
already a 2D/beam-aware array, and otherwise raise a clear error to fail fast;
update the logic around disagg_params and first_gen_log_probs used before
calling req.py_result.append_log_probs to perform this shape check and
conditional wrapping (or explicit rejection) so dimensions stay consistent with
beam-aware payloads.
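One way to sketch the conditional wrapping, assuming a hypothetical convention where a flat list is a single beam and a list of lists is already beam-aware:

```python
def normalize_first_gen_log_probs(payload):
    """Wrap single-beam payloads once; pass beam-aware ones through; fail fast otherwise."""
    if not isinstance(payload, list) or not payload:
        raise ValueError(f"unexpected first_gen_log_probs payload: {payload!r}")
    if isinstance(payload[0], list):
        return payload   # already [beam][token]: do not wrap again
    return [payload]     # single beam: wrap exactly once
```

Calling this before `append_log_probs` would keep dimensions consistent regardless of whether the prefill side sent a single- or multi-beam payload.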
In `@tensorrt_llm/executor/result.py`:
- Around line 476-483: The current code copies first_gen_log_probs from only
self._outputs[0], which loses per-sequence data in beam/best_of flows; update
the block in the class/method that uses context_phase_params and
self._disaggregated_params so that you collect the first-generation token
logprobs for each produced sequence by iterating over self._outputs and
extracting the first token's logprobs from each output that has logprobs (e.g.,
build a list like [out.logprobs[0] for out in self._outputs if out.logprobs])
and assign that list to self._disaggregated_params.first_gen_log_probs, ensuring
the resulting list length matches the number of produced sequences.
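The per-sequence collection the comment describes is essentially a one-liner; the `SimpleNamespace` objects below are stand-ins for the real output objects:

```python
from types import SimpleNamespace

def collect_first_gen_log_probs(outputs):
    """One first-generation-token logprob entry per produced sequence,
    instead of copying only outputs[0] (which drops beam/best_of data)."""
    return [out.logprobs[0] for out in outputs if out.logprobs]

# Minimal stand-ins for CompletionOutput-like objects:
outs = [
    SimpleNamespace(logprobs=[{1: -0.5}, {2: -0.1}]),  # sequence 0
    SimpleNamespace(logprobs=[{3: -0.7}]),             # sequence 1
    SimpleNamespace(logprobs=None),                    # logprobs disabled
]
first = collect_first_gen_log_probs(outs)
```

The resulting list length tracks the number of sequences that actually produced logprobs, matching the invariant the review asks for.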
- Around line 307-324: The length assertions on output.logprobs should be
skipped when a sequence was cancelled: change the checks around output.logprobs/
output.length to first test finish_reasons[src_idx] != CANCELLED (or equivalent
enum/constant) and only run the strict assert/warning logic if not cancelled;
for cancelled sequences allow partial logprobs without raising and keep the
existing warning/handling for disaggregated cases intact (update the branch that
currently contains the assert lines referencing finish_reasons, src_idx,
output.logprobs, and output.length).
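The relaxed check can be sketched as follows; `CANCELLED` here is a string stand-in for the real finish-reason enum member:

```python
CANCELLED = "cancelled"  # stand-in for the real FinishReason enum member

def check_logprob_length(logprobs, expected_length, finish_reason):
    """Enforce the strict length invariant only for non-cancelled sequences."""
    if finish_reason == CANCELLED:
        return  # cancelled mid-stream: partial logprobs are legitimate
    assert len(logprobs) == expected_length, (
        f"logprobs length {len(logprobs)} != output length {expected_length}")
```

The existing disaggregated warning path would stay in the non-cancelled branch, untouched by this relaxation.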
In `@tensorrt_llm/serve/openai_protocol.py`:
- Around line 1090-1115: In _serialize_first_gen_log_probs and
_deserialize_first_gen_log_probs validate the nested structure at the protocol
boundary: ensure input is a list of lists/dicts with required keys ("token_id",
"logprob") and optional "rank", check types (token_id int, logprob number, rank
None or int), and raise ValueError with descriptive messages on mismatch rather
than letting AttributeError/KeyError propagate; in
_serialize_first_gen_log_probs verify each pos is a mapping and each lp has
attributes logprob and rank before serializing, and in
_deserialize_first_gen_log_probs verify each item is a dict containing
"token_id" and "logprob" (and that values are the right types) before
constructing Logprob, raising ValueError mentioning the offending position/item
when validation fails.
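The deserialization half of that boundary validation might look like this sketch; it returns plain dicts rather than constructing the real `Logprob` type, and all error messages are illustrative:

```python
def deserialize_first_gen_log_probs(payload):
    """Validate the nested structure before building logprob records,
    raising ValueError (not AttributeError/KeyError) on malformed input."""
    if payload is None:
        return None
    if not isinstance(payload, list):
        raise ValueError(f"expected list of positions, got {type(payload).__name__}")
    result = []
    for i, pos in enumerate(payload):
        if not isinstance(pos, list):
            raise ValueError(f"position {i}: expected list, got {type(pos).__name__}")
        entries = {}
        for j, item in enumerate(pos):
            if not isinstance(item, dict) or "token_id" not in item or "logprob" not in item:
                raise ValueError(f"position {i}, item {j}: missing token_id/logprob")
            if not isinstance(item["token_id"], int) or not isinstance(item["logprob"], (int, float)):
                raise ValueError(f"position {i}, item {j}: wrong field types")
            rank = item.get("rank")
            if rank is not None and not isinstance(rank, int):
                raise ValueError(f"position {i}, item {j}: rank must be int or None")
            entries[item["token_id"]] = {"logprob": item["logprob"], "rank": rank}
        result.append(entries)
    return result
```

Naming the offending position and item in each `ValueError` makes protocol-boundary failures diagnosable from logs alone.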
---
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
cum_log_probs_repro.py, tensorrt_llm/_torch/pyexecutor/llm_request.py, tensorrt_llm/_torch/pyexecutor/py_executor.py, tensorrt_llm/disaggregated_params.py, tensorrt_llm/executor/result.py, tensorrt_llm/serve/openai_protocol.py
@brb-nv do we have a test for this? If not, do we need to add one?
tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py
Outdated
c263977 to 96b0815
/bot run --disable-fail-fast
96b0815 to 637cc1d
/bot run --disable-fail-fast
PR_Github #36868 [ run ] triggered by Bot. Commit:
PR_Github #36868 [ run ] completed with state
ba3cacf to 3ee8d7d
/bot run --disable-fail-fast
cd18351 to 82613cd
/bot run --disable-fail-fast
PR_Github #36972 [ run ] triggered by Bot. Commit:
…de in disagg Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
82613cd to 2b5d0b3
/bot run --disable-fail-fast
PR_Github #36993 [ run ] triggered by Bot. Commit:
/bot run --disable-fail-fast
PR_Github #37034 [ run ] triggered by Bot. Commit:
PR_Github #37034 [ run ] completed with state
/bot run --disable-fail-fast
/bot run --disable-fail-fast
PR_Github #37091 [ run ] triggered by Bot. Commit:
PR_Github #37092 [ run ] triggered by Bot. Commit:
PR_Github #37091 [ run ] completed with state
PR_Github #37092 [ run ] completed with state
…de in disagg (NVIDIA#11727) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Description
This is a proposed fix for these bugs:
https://nvbugspro.nvidia.com/bug/5926823
https://nvbugspro.nvidia.com/bug/5926799
While both manifest differently, the core problem is that we don't transfer logprobs from prefill->decode in disagg.
Question for reviewers:
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
`/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...`

Provides a user-friendly way for developers to interact with a Jenkins server.

Run `/bot [-h|--help]` to print this help message. See details below for each supported subcommand.

Details

run

`run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug (experimental)]`

Launch build/test pipelines. All previously running jobs will be killed.

- `--reuse-test (optional)pipeline-id` (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- `--disable-reuse-test` (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests run regardless of previous successes.
- `--disable-fail-fast` (OPTIONAL): Disable fail fast on build/tests/infra failures.
- `--skip-test` (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: does NOT update GitHub check status.
- `--stage-list "A10-PyTorch-1, xxx"` (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: does NOT update GitHub check status.
- `--gpu-type "A30, H100_PCIe"` (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: does NOT update GitHub check status.
- `--test-backend "pytorch, cpp"` (OPTIONAL): Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: does NOT update GitHub pipeline status.
- `--only-multi-gpu-test` (OPTIONAL): Only run the multi-GPU tests. Note: does NOT update GitHub check status.
- `--disable-multi-gpu-test` (OPTIONAL): Disable the multi-GPU tests. Note: does NOT update GitHub check status.
- `--add-multi-gpu-test` (OPTIONAL): Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
- `--post-merge` (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- `--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"` (OPTIONAL): Run the ordinary L0 pre-merge pipeline and the specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- `--detailed-log` (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- `--debug` (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purposes. Note: specify exactly one stage in the `stage-list` parameter to access the appropriate container environment. Note: does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/developer-guide/ci-overview.md and the scripts/test_to_stage_mapping.py helper.

kill

`kill`: Kill all running builds associated with the pull request.

skip

`skip --comment COMMENT`: Skip testing for the latest commit on the pull request. `--comment "Reason for skipping build/test"` is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

`reuse-pipeline`: Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Tests