amyssnippet commented Oct 27, 2025

After a linear layer forward call in autodiff mode, calling AutodiffBackend::memory_cleanup on CUDA panicked with 'The size should match'. This happened because memory_cleanup ran without first synchronizing pending asynchronous operations, which left CubeCL's memory management with mismatched sizes.

The fix adds a sync() call before memory_cleanup in CubeBackend to ensure all operations are completed before cleanup. This resolves the issue for CUDA backends while having no impact on synchronous backends like ndarray.
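
To illustrate the ordering this relies on, here is a minimal toy sketch. It is not burn or CubeCL code: `ToyDevice`, its fields, and `synced_memory_cleanup` are hypothetical names invented for the example, and the bookkeeping is only an assumed stand-in for whatever state CubeCL actually tracks. It shows cleanup failing while asynchronous work is still queued and succeeding once the queue has been drained.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for an asynchronous device queue and its memory pool.
/// None of these names come from burn or CubeCL.
struct ToyDevice {
    /// Operations submitted but not yet executed (the asynchronous work).
    pending: VecDeque<usize>,
    /// Bytes the pool has accounted for; updated only when an op executes.
    tracked_bytes: usize,
    /// Bytes reserved at submission time.
    reserved_bytes: usize,
}

impl ToyDevice {
    fn new() -> Self {
        ToyDevice {
            pending: VecDeque::new(),
            tracked_bytes: 0,
            reserved_bytes: 0,
        }
    }

    /// Submit work: the reservation is recorded immediately, while the
    /// bookkeeping update is deferred until the operation runs.
    fn submit(&mut self, bytes: usize) {
        self.reserved_bytes += bytes;
        self.pending.push_back(bytes);
    }

    /// Drain the queue so the bookkeeping catches up with the reservations.
    fn sync(&mut self) {
        while let Some(bytes) = self.pending.pop_front() {
            self.tracked_bytes += bytes;
        }
    }

    /// Cleanup that needs consistent bookkeeping. It returns an error where
    /// the real backend was reported to panic, and leaves state untouched.
    fn memory_cleanup(&mut self) -> Result<usize, String> {
        if self.tracked_bytes != self.reserved_bytes {
            return Err("The size should match".to_string());
        }
        let freed = self.reserved_bytes;
        self.tracked_bytes = 0;
        self.reserved_bytes = 0;
        Ok(freed)
    }

    /// The ordering described above: synchronize first, then clean up.
    fn synced_memory_cleanup(&mut self) -> Result<usize, String> {
        self.sync();
        self.memory_cleanup()
    }
}
```

In this toy model, calling `memory_cleanup` right after `submit` returns the error, while `synced_memory_cleanup` drains the queue first and then frees everything. On a backend with no queued work, the extra `sync` has nothing to drain.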

Added a test in burn-cubecl to verify memory_cleanup works after linear operations.

Files changed:

  • crates/burn-cubecl/src/backend.rs: Added sync before memory_cleanup
  • crates/burn-cubecl/src/tests/memory_cleanup.rs: New test file
  • crates/burn-cubecl/src/tests/mod.rs: Added test module and macro

Fixes #3927

Contributor

swfsql commented Oct 28, 2025

Hi, I've tried running with your changes and it still panics for me. The part that panics is block Z, which has a linear layer. Your test refers to block X (although blocks X and Y are pretty much the same). Also, I think you accidentally removed testgen_jit_fusion when copying it to make testgen_memory_cleanup.

Development

Successfully merging this pull request may close these issues.

(cuda) panic on memory_cleanup(): "The size should match"
