Hi! Great toolkit for efficient SpeechLM training.
Would you consider supporting FunASR models (Paraformer, SenseVoice, Fun-ASR-Nano) in SlamKit?
Relevant models
- Fun-ASR-Nano: audio encoder + Qwen2.5-0.5B LLM, end-to-end speech LM
- SenseVoice: multi-task speech model (ASR + emotion + audio events), non-autoregressive
- Paraformer: non-autoregressive ASR that uses a CIF-based predictor to estimate output length
Why integrate?
- Pre-trained audio encoders from FunASR could serve as speech encoders for SpeechLM training
- SenseVoice's multi-task architecture is relevant for multi-task SpeechLM research
- Fun-ASR-Nano demonstrates encoder-LLM fusion for speech understanding
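To make the first point concrete, here is a rough sketch of the kind of adapter that could sit between a pre-trained speech encoder and an LLM: stack every `k` encoder frames to shorten the sequence, then project to the LLM embedding size. This is only an illustration in plain Python. The names `stack_frames` and `project` are hypothetical, and it is not taken from SlamKit, FunASR, or the actual Fun-ASR-Nano adapter.

```python
from typing import List

Frame = List[float]


def stack_frames(frames: List[Frame], k: int) -> List[Frame]:
    """Concatenate every k consecutive frames, dropping an incomplete tail.

    Hypothetical helper: shortens the encoder output by a factor of k.
    """
    usable = len(frames) - len(frames) % k
    return [
        [x for frame in frames[i:i + k] for x in frame]
        for i in range(0, usable, k)
    ]


def project(frames: List[Frame], weight: List[List[float]]) -> List[Frame]:
    """Apply a linear map to each frame; weight has shape (in_dim, out_dim).

    Hypothetical helper: stands in for a learned projection layer.
    """
    return [
        [sum(x * w for x, w in zip(frame, col)) for col in zip(*weight)]
        for frame in frames
    ]


# Toy encoder output: 5 frames of dimension 2 (assumed values).
encoder_out = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]

# Stack pairs of frames: 5 frames of dim 2 become 2 frames of dim 4.
stacked = stack_frames(encoder_out, k=2)

# Toy 4x3 projection that keeps the first three features of each stacked frame.
weight = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
llm_inputs = project(stacked, weight)
```

In a real integration the projection would be a trained layer and the frames would come from the FunASR encoder. The point is only that the encoder output needs some length reduction and a dimension match before it can feed the LM.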