Description
System Info
Linux
Reproduction
I hit this bug during my own work and verified it with the script below. With bitsandbytes 0.43.3, roughly half of the elements produced by a quantize/dequantize round trip of a non-contiguous tensor differ from those of its contiguous counterpart.
import torch
from bitsandbytes.functional import quantize_4bit, dequantize_4bit

# Create a contiguous tensor
original_tensor = torch.randn(3, 4, 6, device='cuda')

# Create a non-contiguous view by slicing with a step (skipping elements)
non_contiguous_tensor = original_tensor[:, ::2, :]
compressed_non_contiguous_tensor, quant_state_nc = quantize_4bit(non_contiguous_tensor)
uncompressed_non_contiguous_tensor = dequantize_4bit(compressed_non_contiguous_tensor, quant_state_nc)

# Create a contiguous copy of the same elements by calling contiguous()
contiguous_tensor = non_contiguous_tensor.contiguous()
compressed_contiguous_tensor, quant_state_c = quantize_4bit(contiguous_tensor)
uncompressed_contiguous_tensor = dequantize_4bit(compressed_contiguous_tensor, quant_state_c)

# The two round trips are expected to match, but half of the elements differ
num_different_elements = (uncompressed_contiguous_tensor != uncompressed_non_contiguous_tensor).sum().item()
print(f"The number of different elements is: {num_different_elements}")
Expected behavior
Non-contiguous tensors are expected to give the same quantize/dequantize results as contiguous ones, as long as the elements are the same.
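Concretely, I would expect the following invariant to hold for any strided view (this is the same check the script above performs element by element):

restored_from_view = dequantize_4bit(*quantize_4bit(non_contiguous_tensor))
restored_from_copy = dequantize_4bit(*quantize_4bit(non_contiguous_tensor.contiguous()))
assert torch.equal(restored_from_view, restored_from_copy)  # currently fails: ~half the elements differ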