[TRTLLM-11228][feat] Update quickstart for DFlash#13545

Open
ziyixiong-nv wants to merge 1 commit into NVIDIA:main from ziyixiong-nv:dev-fxiong-dflash-task

Conversation

Collaborator

@ziyixiong-nv ziyixiong-nv commented Apr 28, 2026

Summary by CodeRabbit

  • New Features

    • Introduced the DFlash speculative decoding algorithm, which uses target-model hidden states as cross-attention context to enable efficient parallel draft-token prediction and improve inference performance.
  • Documentation

    • Enhanced documentation with DFlash configuration guides, setup examples, and integration instructions for multiple deployment and workflow scenarios.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@ziyixiong-nv requested a review from a team as a code owner April 28, 2026 05:22
@ziyixiong-nv
Collaborator Author

/bot run

@coderabbitai
Contributor

coderabbitai Bot commented Apr 28, 2026

📝 Walkthrough

Walkthrough

These changes introduce support for the DFlash speculative decoding algorithm through documentation and example code. The documentation describes how DFlash uses target-model hidden states as cross-attention context in the draft model. The example code adds configuration support for initializing DFlashDecodingConfig with appropriate parameters.

Changes

DFlash Speculative Decoding Support
  • Files: docs/source/features/speculative-decoding.md, examples/llm-api/quickstart_advanced.py
  • Summary: Added documentation describing the DFlash algorithm (target layer IDs and the cross-attention mechanism), plus example code that initializes DFlashDecodingConfig when spec_decode_algo is set to "DFLASH".

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

  • Description check ⚠️ Warning — The PR description is entirely the template boilerplate with no actual content filled in; all key sections (Description, Test Coverage, Checklist) remain empty or unchecked. Resolution: fill in the Description section explaining what DFlash is and why the quickstart was updated, add Test Coverage details, and complete the PR Checklist items as appropriate.
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

  • Title check ✅ — The title clearly describes the main change, adding DFlash support to the quickstart example, with a valid JIRA ticket and feature type.
  • Linked Issues check ✅ — Skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅ — Skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
examples/llm-api/quickstart_advanced.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add the required NVIDIA copyright header to this modified Python file.

This file is modified but currently has no header at the top.

As per coding guidelines **/*.{h,hpp,cpp,cc,cxx,cu,py}: “All TensorRT-LLM source files must contain an NVIDIA copyright header with the year of latest meaningful modification.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm-api/quickstart_advanced.py` at line 1, Add the required NVIDIA
copyright header at the very top of the modified Python file: insert the
standard NVIDIA copyright block including the year of latest meaningful
modification and the canonical NVIDIA header text required by project
guidelines; ensure it precedes any imports (e.g., before the existing "import
argparse") and uses the same formatting as other project source headers so
linters and license checks will recognize it.
docs/source/features/speculative-decoding.md (1)

223-233: ⚠️ Potential issue | 🟡 Minor

Update backend-support note to avoid contradiction with the new DFlash option.

Line 230 adds DFlash as an available decoding_type, but Line 233 still says PyTorch supports only Eagle3. Please align this note with the newly documented option.

✏️ Suggested doc fix
-> Note: The PyTorch backend supports only `Eagle3`. `decoding_type: Eagle` is accepted as a backward-compatible alias for `Eagle3`, but EAGLE (v1/v2) draft checkpoints are incompatible.
+> Note: The PyTorch backend supports a subset of decoding types, including `Eagle3` and `DFlash`. `decoding_type: Eagle` is accepted as a backward-compatible alias for `Eagle3`, but EAGLE (v1/v2) draft checkpoints are incompatible.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/source/features/speculative-decoding.md` around lines 223 - 233, The
backend-support note is now contradictory because `decoding_type: DFlash` was
added but the note still claims the PyTorch backend supports only `Eagle3`;
update the note to reflect actual PyTorch support by listing which decoding
types PyTorch supports (e.g., `Eagle3` and `DFlash` if supported) and keep the
backward-compatible alias (`decoding_type: Eagle`) remark; edit the sentence
mentioning PyTorch to explicitly enumerate supported decoding types (referencing
`decoding_type`, `Eagle3`, `DFlash`, and the `Eagle` alias) so the doc is
consistent.
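The doc fix above lists DFlash as a value of decoding_type next to Eagle3. A minimal config fragment in the docs' YAML style would look like the following; the key names other than decoding_type are assumptions about the surrounding configuration, not verified against the actual schema.

```yaml
# Hedged sketch of a speculative-decoding config selecting the new type.
# Only decoding_type and the DFlash/Eagle3 values come from the doc change;
# the wrapper key and max_draft_len are illustrative assumptions.
speculative_config:
  decoding_type: DFlash   # alternatives documented for PyTorch: Eagle3
  max_draft_len: 4
```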

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0c4d71e6-ff4d-42dd-91b3-b942e4d97076

📥 Commits

Reviewing files that changed from the base of the PR and between 0b9dfdc and 83ccef0.

📒 Files selected for processing (2)
  • docs/source/features/speculative-decoding.md
  • examples/llm-api/quickstart_advanced.py

@tensorrt-cicd
Collaborator

PR_Github #45864 [ run ] triggered by Bot. Commit: 83ccef0 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45864 [ run ] completed with state FAILURE. Commit: 83ccef0
/LLM/main/L0_MergeRequest_PR pipeline #36041 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@ziyixiong-nv force-pushed the dev-fxiong-dflash-task branch from 83ccef0 to 797531f on April 29, 2026 01:12
@ziyixiong-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #46011 [ run ] triggered by Bot. Commit: 797531f Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46011 [ run ] completed with state FAILURE. Commit: 797531f
/LLM/main/L0_MergeRequest_PR pipeline #36161 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@ziyixiong-nv force-pushed the dev-fxiong-dflash-task branch from 797531f to cd9d65e on April 29, 2026 07:04
@ziyixiong-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #46089 [ run ] triggered by Bot. Commit: cd9d65e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46089 [ run ] completed with state FAILURE. Commit: cd9d65e
/LLM/main/L0_MergeRequest_PR pipeline #36231 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
@ziyixiong-nv force-pushed the dev-fxiong-dflash-task branch from cd9d65e to d452a12 on April 29, 2026 15:31
@ziyixiong-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #46161 [ run ] triggered by Bot. Commit: d452a12 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46161 [ run ] completed with state SUCCESS. Commit: d452a12
/LLM/main/L0_MergeRequest_PR pipeline #36284 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@ziyixiong-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #46231 [ run ] triggered by Bot. Commit: d452a12 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46231 [ run ] completed with state SUCCESS. Commit: d452a12
/LLM/main/L0_MergeRequest_PR pipeline #36343 completed with status: 'SUCCESS'

CI Report

Link to invocation
