
Fix tensor lifetime issue #4228

Open
SandSnip3r wants to merge 1 commit into pytorch:main from SandSnip3r:fix-runtime-buffer-lifetime

Conversation

@SandSnip3r
Contributor

@SandSnip3r SandSnip3r commented May 1, 2026

Description

This change fixes a correctness issue that I and others hit when running the FLUX2 diffusion model: when compiled with either TensorRT or TensorRT-RTX, the model produced garbage images.

The root cause was an incorrect input tensor lifetime. The input tensor's ref count dropped to 0 before the engine ran with enqueueV3(). This specific case was a bit of a perfect storm: an output had the same size and shape as the input, and there was also an fp32->bf16 cast, so when the output tensor was allocated it was handed the address of the just-freed input tensor.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that the relevant reviewers are notified

@meta-cla meta-cla Bot added the cla signed label May 1, 2026
@github-actions github-actions Bot added the component: tests, component: core, and component: runtime labels May 1, 2026
@github-actions github-actions Bot requested a review from narendasan May 1, 2026 00:20

```cpp
auto dims = core::util::toVec(out_shape);
auto type = util::TRTDataTypeToScalarType(compiled_engine->exec_ctx->getEngine().getTensorDataType(name.c_str()));
outputs[pyt_idx] = std::move(at::empty(dims, {at::kCUDA}).to(type).contiguous());
```
Contributor Author


By the way, as a separate cleanup, this line should instead be:

```cpp
outputs[pyt_idx] = at::empty(dims, at::TensorOptions().device(at::kCUDA).dtype(type));
```

This would reduce two allocations and a dtype-conversion kernel to a single allocation.

Collaborator


I think this is the same line Shane identified as well.

Collaborator

@narendasan narendasan left a comment


This looks good to me


@SandSnip3r SandSnip3r force-pushed the fix-runtime-buffer-lifetime branch from 3b6cdb3 to 66c5a42 Compare May 4, 2026 18:39
@SandSnip3r SandSnip3r force-pushed the fix-runtime-buffer-lifetime branch from 66c5a42 to 61b3003 Compare May 5, 2026 20:25
