Anybody observed slow training speed for Mamba compared to a Transformer model? #49
Replies: 3 comments
- Yes, I just ran the comparison. The Mamba block is indeed slower than the Transformer under the same input dimension.
- This holds for both model training and inference. I haven't looked into the details of the Mamba block yet, so maybe I missed something in the code...
- I've moved this issue into the "Discussion" section. On my machine, Mamba with parameter expansion has an advantage over the Transformer. I'm not sure whether this is environment/GPU related.
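As a quick way to check whether the gap is environment/GPU related, here is a minimal timing sketch comparing a single Mamba block against a single Transformer encoder layer at the same `d_model`. It assumes the official `mamba_ssm` package and a CUDA GPU; the batch size, sequence length, head count, and iteration counts are illustrative choices, not values from this thread.

```python
import time
import torch
from torch import nn
from mamba_ssm import Mamba  # assumes the official mamba_ssm package is installed

device = "cuda"
batch, seq_len, d_model = 8, 2048, 512  # illustrative sizes only

# One Mamba block and one Transformer encoder layer at the same d_model.
mamba_block = Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2).to(device)
transformer_layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=8, dim_feedforward=4 * d_model, batch_first=True
).to(device)

x = torch.randn(batch, seq_len, d_model, device=device)

@torch.no_grad()
def avg_forward_ms(module, x, iters=50, warmup=5):
    # Warm up first so one-time CUDA kernel launch costs are excluded.
    for _ in range(warmup):
        module(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        module(x)
    torch.cuda.synchronize()
    return (time.time() - start) / iters * 1e3

print(f"Mamba block:       {avg_forward_ms(mamba_block, x):.2f} ms/iter")
print(f"Transformer layer: {avg_forward_ms(transformer_layer, x):.2f} ms/iter")
```

Results from a forward-pass benchmark like this will vary with GPU generation, sequence length, and dimensions, which may explain why different machines see different winners.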