[BugFix][Graph Optimization] Fix deterministic mode + CUDAGraph garbled output by making reset_share_inputs CUDAGraph-safe #6746

Open
gongweibao wants to merge 2 commits into PaddlePaddle:develop from gongweibao:errordecode
@gongweibao (Collaborator)

Motivation

When FD_DETERMINISTIC_MODE=1 and use_cudagraph=True are both enabled, model output is garbled. Either feature works correctly in isolation, but the combination fails.

Root cause: reset_share_inputs() is called after CUDAGraph capture in the deterministic mode post-warmup path (gpu_worker.py). Several lines inside reset_share_inputs() create new tensor objects (paddle.to_tensor(...) / paddle.full(...)), which allocate new GPU memory at different addresses. The captured CUDAGraph has already baked in the old addresses, so replay reads and writes stale memory and produces garbage output.
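The failure mode can be illustrated without a GPU. The following is a toy sketch only: FakeGraph is a hypothetical stand-in for a captured CUDA graph, and plain Python lists stand in for Paddle tensors. It is not FastDeploy code; it only shows why rebinding a name to a new buffer after capture leaves the graph reading the old one, while an in-place update does not.

```python
class FakeGraph:
    """Hypothetical stand-in for a captured CUDA graph.

    Like a real captured graph, it remembers the buffer object seen at
    capture time, not the dictionary key that pointed to it.
    """

    def __init__(self, share_inputs, key):
        self._captured = share_inputs[key]  # "address" baked in at capture

    def replay(self):
        # Replay always reads the buffer recorded at capture time.
        return sum(self._captured)


# Case 1: reset by creating a new object (analogous to paddle.full(...)).
share_inputs = {"free_list": [1, 2, 3]}
graph = FakeGraph(share_inputs, "free_list")
share_inputs["free_list"] = [10, 20, 30]  # new object, new "address"
stale_result = graph.replay()  # still reads the old buffer

# Case 2: reset in place (analogous to fill_ / copy_ / slice assignment).
share_inputs = {"free_list": [1, 2, 3]}
graph = FakeGraph(share_inputs, "free_list")
share_inputs["free_list"][:] = [10, 20, 30]  # same object, new contents
fresh_result = graph.replay()  # sees the updated values

print(stale_result, fresh_result)
```

In the first case the graph keeps summing the original values; in the second it sees the reset values because the buffer identity never changed.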

Modifications

fastdeploy/worker/input_batch.py:

  • For the 4 tensor fields (reasoning_allowed_tokens, free_list, free_list_len, rope_emb) that previously created new tensor objects in reset_share_inputs(), add an FD_DETERMINISTIC_MODE guard that uses in-place operations (fill_, copy_, slice assignment) instead, preserving their GPU addresses for CUDAGraph safety.
  • Non-deterministic mode behavior is completely unchanged.
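The guard described above might look roughly like the sketch below. This is an illustration under stated assumptions, not the code in this PR: reset_field and deterministic_mode_enabled are hypothetical helper names, Python lists stand in for Paddle tensors, and how FastDeploy actually reads FD_DETERMINISTIC_MODE is assumed here to be a simple environment lookup.

```python
import os


def deterministic_mode_enabled(env=None):
    """Assumed check: treat FD_DETERMINISTIC_MODE=1 as 'enabled'."""
    env = os.environ if env is None else env
    return env.get("FD_DETERMINISTIC_MODE", "0") == "1"


def reset_field(share_inputs, name, values, deterministic):
    """Reset one buffer, in place when deterministic mode is on.

    Hypothetical helper; lists stand in for tensors.
    """
    existing = share_inputs.get(name)
    if deterministic and existing is not None and len(existing) == len(values):
        # In-place path: same object, so anything that captured it
        # (for example a CUDA graph) keeps pointing at valid data.
        existing[:] = values
    else:
        # Original path: allocate a fresh object, as before the change.
        share_inputs[name] = list(values)
    return share_inputs[name]


inputs = {"free_list": [0, 0, 0]}
original_buffer = inputs["free_list"]

reset_field(inputs, "free_list", [2, 1, 0], deterministic=True)
same_object = inputs["free_list"] is original_buffer

reset_field(inputs, "free_list", [5, 4, 3], deterministic=False)
rebound = inputs["free_list"] is not original_buffer
```

Keeping the original allocation path in the else branch mirrors the statement that non-deterministic behavior is left unchanged.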

fastdeploy/worker/gpu_worker.py:

  • Remove the if not self.model_runner.use_cudagraph skip around reset_share_inputs(). Since reset_share_inputs() is now CUDAGraph-safe under deterministic mode, it can be called unconditionally.

Accuracy Tests

CUDA_VISIBLE_DEVICES=0,1,2,3 pytest tests/e2e/4cards_cases/test_determinism_long.py -v

@paddle-bot
paddle-bot bot commented Mar 9, 2026

Thanks for your contribution!

@codecov-commenter
Codecov Report

❌ Patch coverage is 5.55556% with 17 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@30f9f33). Learn more about missing BASE report.

Files with missing lines            Patch %   Lines
fastdeploy/worker/input_batch.py    5.55%     17 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #6746   +/-   ##
==========================================
  Coverage           ?   71.95%           
==========================================
  Files              ?      392           
  Lines              ?    53848           
  Branches           ?     8463           
==========================================
  Hits               ?    38748           
  Misses             ?    12328           
  Partials           ?     2772           
Flag   Coverage Δ
GPU    71.95% <5.55%> (?)
