Anybody observed slow training speed for Mamba compared to a Transformer model? #49
Replies: 3 comments
- Yes, I just ran the comparison. The Mamba block is indeed slower than the Transformer under the same input dimension.
- This holds for both model training and inference. I haven't looked into the details of the Mamba block yet, so maybe I missed something in the code...
- I've moved this issue into the "Discussion" section. On my machine, Mamba with parameter expansion has an advantage over the Transformer. I'm not sure whether this is environment/GPU related.
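As a quick way to check whether the gap is environment/GPU related, here is a minimal timing sketch comparing a single Mamba block against a single Transformer encoder layer at the same `d_model`. It assumes the official `mamba_ssm` package and a CUDA GPU; the batch size, sequence length, head count, and iteration counts are illustrative choices, not values from this thread.

```python
import time
import torch
from torch import nn
from mamba_ssm import Mamba  # assumes the official mamba_ssm package is installed

device = "cuda"
batch, seq_len, d_model = 8, 2048, 512  # illustrative sizes only

# One Mamba block and one Transformer encoder layer at the same d_model.
mamba_block = Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2).to(device)
transformer_layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=8, dim_feedforward=4 * d_model, batch_first=True
).to(device)

x = torch.randn(batch, seq_len, d_model, device=device)

@torch.no_grad()
def avg_forward_ms(module, x, iters=50, warmup=5):
    # Warm up first so one-time CUDA kernel launch costs are excluded.
    for _ in range(warmup):
        module(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        module(x)
    torch.cuda.synchronize()
    return (time.time() - start) / iters * 1e3

print(f"Mamba block:       {avg_forward_ms(mamba_block, x):.2f} ms/iter")
print(f"Transformer layer: {avg_forward_ms(transformer_layer, x):.2f} ms/iter")
```

Results from a forward-pass benchmark like this will vary with GPU generation, sequence length, and dimensions, which may explain why different machines see different winners.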