Do not let chunked prefill generate decode tokens #3777

Open
tdene wants to merge 2 commits into NVIDIA:main from tdene:tde/fix_chunked_prefill_bug

Conversation

@tdene tdene commented Mar 10, 2026

What does this PR do?

⚠️ For major changes (either in lines of code or in its impact), please make sure to first share a design doc with the team. If you're unsure what's the best way to do so, contact the @mcore-oncall.

Contribution process

Pre-checks

  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code (see the Typing guidelines)
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Code review

Feel free to message or comment the @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.

Step 1: Mark PR as "Ready for Review"

  1. When your PR is ready, click Ready for Review.
  2. An oncall reviewer is auto-assigned and expert reviewers are notified based on your changes.
    • Some PRs may jump straight to step 2. This is determined by .github/CODEOWNERS.

⚠️ Only mark as ready once merge-conflicts are resolved and the CI is passing.
Final Review might get declined if these requirements are not fulfilled.

Step 2: Final Review

For PRs that change megatron/core, once all expert reviewers have approved, the Final Review label is applied automatically and final reviewers are assigned.

For PRs outside megatron/core, this step is skipped.

Step 3: Approved

Once all required reviewers have approved, the Approved label is applied automatically.

Merge

Any member of mcore-engineers will be able to merge your PR.

For MRs into `dev` branch

The proposed review process for the `dev` branch is under active discussion.

MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

copy-pr-bot bot commented Mar 10, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@tdene tdene marked this pull request as ready for review March 10, 2026 17:25
@tdene tdene requested review from a team as code owners March 10, 2026 17:25
@svcnvidia-nemo-ci svcnvidia-nemo-ci requested a review from a team March 10, 2026 17:26
@svcnvidia-nemo-ci svcnvidia-nemo-ci added this to the Core 0.16 milestone Mar 10, 2026

@santhnm2 santhnm2 left a comment

Can we add a unit test for this?
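
A test along these lines might look like the sketch below. It is only an illustration, not the PR's code: `may_emit_decode_token` is a hypothetical helper that does not exist in the repository, and it assumes (following the comparison against `chunked_prefill_request_id` quoted in this thread) that a request whose ID matches that field is still in the middle of its prefill.

```python
def may_emit_decode_token(request_id: int, chunked_prefill_request_id: int) -> bool:
    """Hypothetical predicate, introduced only for this sketch.

    A request that is still being prefilled chunk by chunk has not
    consumed its whole prompt, so it should not produce a decode token.
    """
    is_chunked_prefill = request_id == chunked_prefill_request_id
    return not is_chunked_prefill


def test_chunked_prefill_request_emits_no_decode_token():
    # Request 7 is the one currently being chunk-prefilled.
    assert not may_emit_decode_token(request_id=7, chunked_prefill_request_id=7)


def test_other_requests_may_emit_decode_tokens():
    # Request 8 is unrelated to the chunked prefill and may decode normally.
    assert may_emit_decode_token(request_id=8, chunked_prefill_request_id=7)
```

A real test would presumably drive the actual engine step rather than a standalone predicate; the sketch only pins down the behavior named in the PR title.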

@tdene tdene added the "Expert Review [deprecated]" label Mar 10, 2026
request.prompt_log_probs = []
request.generated_log_probs = []
is_chunked_prefill = request_id == self.context.chunked_prefill_request_id
is_prefill = len(request_log_probs) > 1

I believe we have scheduling code to prevent prefill requests from having a single token, but would be good to double check that there's no edge case here
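
To make the edge case concrete, here is a minimal sketch rather than the PR's code. It assumes the prefill check depends only on the number of log-prob entries, as in the quoted line `is_prefill = len(request_log_probs) > 1`; `looks_like_prefill` and the sample values are hypothetical and exist only for illustration.

```python
from typing import List


def looks_like_prefill(request_log_probs: List[float]) -> bool:
    """Hypothetical stand-in for the quoted check `len(request_log_probs) > 1`."""
    return len(request_log_probs) > 1


# A prefill step that processed several tokens yields several entries.
multi_token_step = [-0.1, -0.7, -1.2]
# A decode step yields exactly one entry.
decode_step = [-0.3]
# Edge case: a prefill step (or prefill chunk) carrying a single token
# has the same length as a decode step, so the length check cannot
# tell the two apart.
single_token_prefill = [-0.5]

print(looks_like_prefill(multi_token_step))      # True
print(looks_like_prefill(decode_step))           # False
print(looks_like_prefill(single_token_prefill))  # False
```

If the scheduler does guarantee that prefill steps always carry more than one token, the last case cannot occur; the sketch only shows why that guarantee matters for a length-based check.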

@svcnvidia-nemo-ci svcnvidia-nemo-ci added the "Final Review" label Mar 10, 2026
@Phlip79 Phlip79 removed the "Expert Review [deprecated]" label Mar 10, 2026
Phlip79 commented Mar 10, 2026

@tdene FYI we don't use the "Expert Review" label anymore

Labels

complexity: low, Final Review

4 participants