(Lowest of the low priorities)
SSMs have been making the rounds, but people have only cared about them for 'major' tasks (NMT models, speech, LLMs). Since they're essentially a special kind of RNN (closely related to gated recurrent models like LSTMs), and we see better performance from that type of model on our type of tasks, it may be fun to implement an SSM decoder and try it out.
Beyond the theoretical interest, they're supposed to be more memory efficient than transformers, so we could probably run some wicked batch sizes if they're implemented well.
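To make the "RNN-like" and "memory efficient" points concrete, here is a minimal sketch of a plain linear SSM. It is not the S4 or Mamba implementation, just an illustration under stated assumptions: a diagonal state matrix, a single input channel, zero-order-hold discretization, and hypothetical function names (`discretize_zoh`, `ssm_recurrent`, `ssm_convolutional`). The recurrent mode carries only a fixed-size state of N numbers per step, however long the sequence gets. The same model can also be unrolled into a causal convolution, which is what makes parallel training possible. Selective SSMs like Mamba make the discretized parameters input-dependent, which gives up the convolution view.

```python
import math
from typing import List, Sequence


def discretize_zoh(a: Sequence[float], b: Sequence[float], dt: float):
    """Zero-order-hold discretization of a diagonal continuous-time SSM.

    Per state i: A_bar_i = exp(dt * a_i), B_bar_i = (A_bar_i - 1) / a_i * b_i.
    Assumes every a_i is nonzero (negative, for a stable model).
    """
    a_bar = [math.exp(dt * ai) for ai in a]
    b_bar = [(abi - 1.0) / ai * bi for abi, ai, bi in zip(a_bar, a, b)]
    return a_bar, b_bar


def ssm_recurrent(x: Sequence[float], a_bar, b_bar, c, d: float = 0.0) -> List[float]:
    """Run the SSM like an RNN: h_t = A_bar * h_{t-1} + B_bar * x_t, y_t = C.h_t + D * x_t."""
    h = [0.0] * len(a_bar)  # hidden state of size N, independent of sequence length
    ys = []
    for xt in x:
        h = [ai * hi + bi * xt for ai, hi, bi in zip(a_bar, h, b_bar)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)) + d * xt)
    return ys


def ssm_convolutional(x: Sequence[float], a_bar, b_bar, c, d: float = 0.0) -> List[float]:
    """Same model as a causal convolution with kernel K_k = C * A_bar^k * B_bar."""
    length = len(x)
    kernel = [
        sum(ci * (ai ** k) * bi for ci, ai, bi in zip(c, a_bar, b_bar))
        for k in range(length)
    ]
    return [
        sum(kernel[k] * x[t - k] for k in range(t + 1)) + d * x[t]
        for t in range(length)
    ]


# Usage with toy (made-up) parameters: both modes produce the same outputs.
a = [-0.5, -1.0, -2.0]
b = [1.0, 1.0, 1.0]
c = [0.3, -0.2, 0.5]
a_bar, b_bar = discretize_zoh(a, b, dt=0.1)
x = [1.0, 0.0, 2.0, -1.0, 0.5]
y_rec = ssm_recurrent(x, a_bar, b_bar, c, d=0.1)
y_conv = ssm_convolutional(x, a_bar, b_bar, c, d=0.1)
```

The recurrent form is what a decoder would use at generation time, since each new token only touches the N-element state. A transformer, by contrast, keeps a cache that grows with every generated token.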