Add FunASR/SenseVoice models as supported ASR in SlamKit #22

@LauraGPT

Hi! Great toolkit for efficient SpeechLM training.

Would you consider supporting FunASR models (Paraformer, SenseVoice) in SlamKit?

Relevant models

  • Fun-ASR-Nano: Audio encoder + Qwen2.5-0.5B LLM, end-to-end speech LM
  • SenseVoice: Multi-task speech model (ASR + emotion + audio events), non-autoregressive
  • Paraformer: Non-autoregressive ASR with a CIF-based predictor and a parallel decoder

Why integrate?

  • Pre-trained audio encoders from FunASR could serve as speech encoders for SpeechLM training
  • SenseVoice's multi-task architecture is relevant for multi-task SpeechLM research
  • Fun-ASR-Nano demonstrates encoder-LLM fusion for speech understanding
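
For context, here is a minimal sketch of how a FunASR model is usually loaded and run on its own, which is roughly what an integration would need to wrap. It assumes the `funasr` package's `AutoModel` entry point and the `iic/SenseVoiceSmall` model id; the `<|...|>` tag format handled by `split_rich_text` is my understanding of SenseVoice's raw output, and both helper names are hypothetical. Nothing here reflects SlamKit's actual API.

```python
import re
from typing import Dict

# Assumption: SenseVoice prefixes its transcript with tags such as
# <|en|><|NEUTRAL|><|Speech|><|woitn|> (language, emotion, event, ITN flag).
_TAG = re.compile(r"<\|([^|]*)\|>")


def split_rich_text(raw: str) -> Dict[str, object]:
    """Separate the <|...|> tags from the plain transcript (hypothetical helper)."""
    tags = _TAG.findall(raw)
    text = _TAG.sub("", raw).strip()
    return {"tags": tags, "text": text}


def transcribe(wav_path: str, model_id: str = "iic/SenseVoiceSmall") -> Dict[str, object]:
    """Run a FunASR model on one audio file (hypothetical wrapper)."""
    # Imported lazily so split_rich_text works without FunASR installed.
    from funasr import AutoModel

    model = AutoModel(model=model_id)
    result = model.generate(input=wav_path, language="auto", use_itn=True)
    # generate() returns a list of dicts; the transcript is under "text".
    return split_rich_text(result[0]["text"])
```

Calling `transcribe("sample.wav")` would download the model on first use, so the tag-splitting helper is kept separate and can be checked without any model weights. Exposing the encoder's hidden states, rather than the transcript, would need model-specific code that this sketch does not cover.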

References
