Information-Fusion-Lab-Umass/BottleneckIterativeNetwork

Audio-Visual Speech Separation via Bottleneck Iterative Network

Code repo for the 2025 ICML WMLA workshop submission.

Webpage for the project

@misc{zhang2025audiovisualspeechseparationbottleneck,
      title={Audio-Visual Speech Separation via Bottleneck Iterative Network}, 
      author={Sidong Zhang and Shiv Shankar and Trang Nguyen and Andrea Fanelli and Madalina Fiterau},
      year={2025},
      eprint={2507.07270},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2507.07270}, 
}

Models

  • profusion_mbt and prombt are alternative/historical names for our proposed Bottleneck Iterative Network
  • avlit, iia (short for IIA-Net), and rtfs (short for RTFS-Net) are the benchmark models studied in our work

Training

Evaluation

  • sbatch scripts are named eval_<data>_<model>.sh, for example sbatch/eval_lrs3wham_prombt.sh
  • This standalone evaluation script generates the separated audio tracks for each test mixture, rather than only reporting an overall performance score as the evaluation during training does
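
The naming convention above can be sketched in Python as follows. This is only an illustration, not part of the repository: the helper names `eval_script_path` and `sbatch_command` are hypothetical, and the sketch assumes the script names use the short model names from the Models section. Only `sbatch/eval_lrs3wham_prombt.sh` is confirmed by this README; scripts for other data/model pairs may not exist.

```python
from pathlib import Path

# Short model names listed in the Models section. Whether every one of them
# has a matching eval script is an assumption, not something the README states.
MODELS = ("prombt", "avlit", "iia", "rtfs")


def eval_script_path(data: str, model: str, sbatch_dir: str = "sbatch") -> Path:
    """Build the path eval_<data>_<model>.sh inside the sbatch directory (hypothetical helper)."""
    if model not in MODELS:
        raise ValueError(f"unknown model name: {model!r}")
    return Path(sbatch_dir) / f"eval_{data}_{model}.sh"


def sbatch_command(data: str, model: str) -> list[str]:
    """Return the argument list that submits the script to Slurm via `sbatch` (hypothetical helper)."""
    return ["sbatch", eval_script_path(data, model).as_posix()]


if __name__ == "__main__":
    # Prints the command rather than running it, so the sketch works without Slurm.
    print(" ".join(sbatch_command("lrs3wham", "prombt")))
```

On a cluster, the returned list could be passed to `subprocess.run` from the repository root to submit the job.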
