    Repositories list

    • moshi

      Public
      Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
      Python
      Apache License 2.0
      95610k6816 Updated May 16, 2026
    • A TTS that fits in your CPU (and pocket)
      Python
      MIT License
      5094.5k427 Updated May 5, 2026
    • tts_longeval

      Public
      Python
      MIT License
      13000 Updated Apr 29, 2026
    • moshi-rag

      Public
      MoshiRAG is a compact full-duplex speech language model augmented with asynchronous knowledge retrieval to improve factuality without sacrificing real-time inte…
      Rust
      Apache License 2.0
      68700 Updated Apr 28, 2026
    • flashy

      Public
      Framework for writing deep learning training loops. Lightweight, and retaining full freedom to design as you see fit. It handles checkpointing, logging, distri…
      Python
      MIT License
      0600 Updated Apr 24, 2026
    • To bring back voice to those who lost it
      TypeScript
      MIT License
      79161 Updated Apr 20, 2026
    • ovie

      Public
      Official implementation and models for OVIE (One View Is Enough! Monocular Training for In-the-Wild Novel View Generation)
      Jupyter Notebook
      36600 Updated Apr 16, 2026
    • dactory

      Public
      Python
      Apache License 2.0
      55200 Updated Apr 2, 2026
    • unmute

      Public
      Make text LLMs listen and speak
      Python
      MIT License
      2271.3k261 Updated Mar 26, 2026
    • JAX bindings for the FlashAttention 3 kernels
      C++
      BSD 3-Clause "New" or "Revised" License
      12300 Updated Mar 9, 2026
    • casa

      Public
      A vision-language model with an improved cross-attention mechanism for scalable streaming inference
      Python
      MIT License
      32930 Updated Mar 9, 2026
    • yomikomi

      Public
      A small Rust-based data loader
      Rust
      Apache License 2.0
      23711 Updated Feb 20, 2026
    • A real-time and multilingual speech translation model
      Python
      MIT License
      2424820 Updated Feb 13, 2026
    • Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.
      Python
      Apache License 2.0
      3062.9k350 Updated Jan 26, 2026
    • dora

      Public
      Dora is an experiment management framework. It expresses grid searches as pure Python files as part of your repo. It identifies experiments with a unique hash s…
      Python
      MIT License
      0500 Updated Jan 22, 2026
    • sphn

      Public
      Python bindings for symphonia/opus - read various audio formats from Python and write opus files
      Rust
      Apache License 2.0
      97910 Updated Jan 7, 2026
    • Python
      Apache License 2.0
      32800 Updated Jan 5, 2026
    • JAX bindings for the flash-attention3 kernels
      C++
      32200 Updated Jan 2, 2026
    • Animations for the blog "Neural audio codecs: how to get audio into LLMs"
      TypeScript
      0400 Updated Oct 20, 2025
    • Code for the blog "Neural audio codecs: how to get audio into LLMs"
      Python
      MIT License
      10k16700 Updated Oct 20, 2025
    • Python
      Apache License 2.0
      64451141 Updated Oct 3, 2025
    • Swift
      MIT License
      1613720 Updated Jun 26, 2025
    • hibiki

      Public
      Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation—where one waits for the end of the sourc…
      Rust
      Apache License 2.0
      1171.5k91 Updated Apr 15, 2025
    • moshivis

      Public
      Kyutai with an "eye"
      Python
      Apache License 2.0
      3224810 Updated Mar 26, 2025
    • kaudio

      Public
      Rust crate for some audio utilities
      Rust
      Apache License 2.0
      02800 Updated Mar 8, 2025
    • Proof of concept for running moshi/hibiki using webrtc
      Rust
      Apache License 2.0
      22100 Updated Feb 28, 2025
    • JAX bindings for the flash-attention2 kernels
      C++
      01200 Updated Jan 16, 2025
    • ogg-table

      Public
      Ogg-vorbis reader with fast random access
      Rust
      Other
      1800 Updated Aug 29, 2024