Code repository for the 2025 ICML WMLA workshop submission.
```bibtex
@misc{zhang2025audiovisualspeechseparationbottleneck,
    title={Audio-Visual Speech Separation via Bottleneck Iterative Network},
    author={Sidong Zhang and Shiv Shankar and Trang Nguyen and Andrea Fanelli and Madalina Fiterau},
    year={2025},
    eprint={2507.07270},
    archivePrefix={arXiv},
    primaryClass={cs.SD},
    url={https://arxiv.org/abs/2507.07270},
}
```
`profusion_mbt` and `prombt` are alternative/historical names of our proposed Bottleneck Iterative Network (`BIN`); `avlit`, `iia` (short for IIA-Net), and `rtfs` (short for RTFS-Net) are benchmark models studied in our work.
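For quick reference, the codename-to-model mapping above can be captured in a small lookup table. This dict and the `describe` helper are our own illustration, not part of the repo:

```python
# Maps the codenames that appear in script/file names to the models they
# refer to (names taken from the README; the helper itself is hypothetical).
MODEL_CODENAMES = {
    "profusion_mbt": "Bottleneck Iterative Network (BIN, proposed)",
    "prombt": "Bottleneck Iterative Network (BIN, proposed)",
    "avlit": "AVLIT (benchmark)",
    "iia": "IIA-Net (benchmark)",
    "rtfs": "RTFS-Net (benchmark)",
}

def describe(codename: str) -> str:
    """Return a human-readable description for a model codename."""
    return MODEL_CODENAMES.get(codename, f"unknown codename: {codename!r}")
```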
- sbatch scripts in the `sbatch` folder follow the naming pattern `<data>_<model>_<gpu-type>.sh`; for example, `sbatch/lrs3wham_profusion_mbt_a100.sh` trains the proposed `BIN` model on LRS3 + WHAM data on an A100 GPU via the Python script `scripts/run_lrs3wham_prombt.py`.
- `BIN` implementation can be found here:
- The main training pipeline follows this abstract class:
- Training pipeline for `BIN` on LRS3 is here:
- Training pipeline for `BIN` on NTCD-TIMIT is here:
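The `<data>_<model>_<gpu-type>.sh` naming convention can be decoded mechanically. A minimal sketch (our own helper, not part of the repo), assuming the dataset is the first underscore-separated token and the GPU type is the last, so multi-token codenames like `profusion_mbt` stay intact:

```python
from pathlib import Path

def parse_sbatch_name(path: str) -> dict:
    """Split a training sbatch filename of the form <data>_<model>_<gpu-type>.sh."""
    stem = Path(path).stem                      # drop the .sh suffix
    data, *model_parts, gpu = stem.split("_")   # first token = data, last = gpu
    return {"data": data, "model": "_".join(model_parts), "gpu": gpu}

# e.g. parse_sbatch_name("sbatch/lrs3wham_profusion_mbt_a100.sh")
# -> {"data": "lrs3wham", "model": "profusion_mbt", "gpu": "a100"}
```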
- Evaluation sbatch scripts follow the naming pattern `eval_<data>_<model>.sh`, for example `sbatch/eval_lrs3wham_prombt.sh`.
- This standalone evaluation script generates the separated audio tracks for each test mixture, instead of only reporting an overall performance metric as the evaluation inside the training process does.
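Writing one wav per separated source per test mixture can be sketched as follows. This is a minimal stdlib-only illustration; the `<mixture_id>_spk<i>.wav` layout and the synthetic sine-tone "model output" are our assumptions, not the repo's actual format:

```python
import math
import struct
import wave
from pathlib import Path

def write_wav(path, samples, sr=16000):
    """Write a mono 16-bit PCM wav from an iterable of floats in [-1, 1]."""
    with wave.open(str(path), "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sr)
        f.writeframes(b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples))

def save_separated_tracks(out_dir, mixture_id, tracks, sr=16000):
    """Save one wav per separated source for a given test mixture.

    The <mixture_id>_spk<i>.wav naming is hypothetical; the repo's
    evaluation script may use a different output layout.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, samples in enumerate(tracks):
        p = out / f"{mixture_id}_spk{i}.wav"
        write_wav(p, samples, sr)
        paths.append(p)
    return paths

# Demo: two synthetic sine tones stand in for the model's separated sources.
def tone(hz, sr=16000, n=1600):
    return [0.5 * math.sin(2 * math.pi * hz * t / sr) for t in range(n)]

saved = save_separated_tracks("sep_out", "mix0001", [tone(440), tone(660)])
```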