
(cuda) panic on memory_cleanup(): "The size should match" #3927

@swfsql

Description


Describe the bug
After a forward call on a Linear layer, calling AutodiffBackend::memory_cleanup on the cuda backend panics with "The size should match".

To Reproduce
Please see this gist.

Expected behavior
All tests pass.

Screenshots
(Not applicable)

Desktop (please complete the following information):

  • OS: Linux 6.x, cuda 12.x.

Smartphone (please complete the following information):
(Not applicable)

Additional context

  • Three blocks, XYZ, are executed.
    • X: a test on a tensor marked with require_grad;
    • Y: a Param tensor test;
    • Z: a Linear model test. I didn't explore much, and I couldn't find a simpler test that reproduces the failure in Z.
  • All three pass on ndarray, but block Z fails on cuda.
    • The blocks probably don't interfere with one another: on cuda, whether they run as XYYZ or XZY, only block Z fails.
  • The gist contains the error message.
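The ordering check described above can be sketched as a small harness that runs each block in a chosen order and records which ones panic. This is a std-only illustration under stated assumptions, not the code from the gist: `block_x`, `block_y`, `block_z`, `BLOCKS`, and `run_in_order` are hypothetical names, and `block_z` simply panics with the reported message as a stand-in for the Linear/cuda test.

```rust
use std::panic;

// Hypothetical stand-ins for the three test blocks; the real ones live in the gist.
fn block_x() {} // stand-in for the require_grad tensor test
fn block_y() {} // stand-in for the Param tensor test
fn block_z() {
    // Stand-in for the Linear model test that fails on cuda.
    panic!("The size should match");
}

// Named blocks, so an execution order such as "XYYZ" can be written as a string.
const BLOCKS: [(char, fn()); 3] = [('X', block_x), ('Y', block_y), ('Z', block_z)];

/// Runs the blocks in the order given by `order`, catching panics,
/// and returns the names of the blocks that panicked (in execution order).
fn run_in_order(order: &str) -> Vec<char> {
    let mut failed = Vec::new();
    for name in order.chars() {
        let (_, block) = BLOCKS
            .iter()
            .find(|(n, _)| *n == name)
            .expect("unknown block name");
        // `catch_unwind` turns a panic into an `Err`, so the remaining blocks still run.
        if panic::catch_unwind(*block).is_err() {
            failed.push(name);
        }
    }
    failed
}
```

For example, `run_in_order("XYYZ")` and `run_in_order("XZY")` both return `vec!['Z']`, mirroring the observation that only block Z fails regardless of order. The default panic hook still prints the panic message to stderr while the harness continues.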

Metadata


Assignees

No one assigned

Labels

bug (Something isn't working)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
