Description
System Info
Linux 20.04
Reproduction
output is a preallocated tensor. The following works as expected:
output = matmul_4bit(a, b)
but the following does not: the elements of output are left unchanged.
matmul_4bit(a, b, out=output)
Expected behavior
matmul_4bit(a, b, out=output)
is expected to write the result into output. This is the preferred form because it avoids an extra memory copy.
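To make the expected contract concrete, here is a minimal sketch in plain Python. Note that matmul_ref and honors_out are hypothetical names made up for illustration. matmul_ref is a nested-list stand-in, not the library's 4-bit kernel, and it only models the out= behavior described above: when a buffer is passed, the result is written into it in place.

```python
def matmul_ref(a, b, out=None):
    """Hypothetical stand-in (not the real kernel) that follows the out= contract."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if out is None:
        # No buffer supplied: allocate a fresh result.
        out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Write every element into the buffer in place.
            out[i][j] = sum(a[i][k] * b[k][j] for k in range(inner))
    return out


def honors_out(fn, a, b, sentinel=-1.0):
    """Return True if fn(a, b, out=buf) overwrites buf with the product."""
    expected = fn(a, b)
    # Pre-fill the buffer with a sentinel so an untouched buffer is detectable.
    buf = [[sentinel] * len(expected[0]) for _ in range(len(expected))]
    fn(a, b, out=buf)
    return buf == expected


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(honors_out(matmul_ref, a, b))
```

A function that accepts out= but ignores it leaves the sentinel values in place, so the same check returns False for it. That is the behavior reported above.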