fix: prevent panic in memory_cleanup on CUDA after linear forward #3932
After a linear layer forward call in autodiff mode, executing AutodiffBackend::memory_cleanup on CUDA panicked with 'The size should match'. The panic occurred because memory_cleanup ran before pending asynchronous operations had completed, leaving size mismatches in CubeCL's memory management.
The fix adds a sync() call before memory_cleanup in CubeBackend, ensuring all pending operations complete before cleanup runs. This resolves the panic on CUDA while having no effect on synchronous backends such as ndarray.
Added a test in burn-cubecl to verify memory_cleanup works after linear operations.
Pull Request Template

Checklist
  - cargo run-checks command has been executed.

Related Issues/PRs
  - Fixes #3927.

Changes
  - Synchronize pending asynchronous operations (sync()) before memory_cleanup in CubeBackend, as described above.

Testing
  - Added a test in burn-cubecl covering memory_cleanup after a linear forward pass in autodiff mode.