Local Commentator Muter Prototype

This is a local Python prototype for detecting known voices, such as NFL commentators, and muting them in near real time.

It is a prototype, not a polished product. The focus is:

enroll target voices from audio clips
test detection accuracy against recorded clips
listen to live audio from an input device
compare short speech windows to enrolled voice profiles
mute output when a blocked speaker is active

How it works

The pipeline is:

Capture audio from a local input device
Run voice activity detection to skip silence and low-speech chunks
Generate a speaker embedding for each speech chunk
Compare that embedding to saved speaker profiles
Apply smoothing rules before muting or unmuting

The speaker recognition model uses SpeechBrain's pretrained ECAPA embedding model.
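Steps 4 and 5 of the pipeline come down to comparing vectors. The sketch below is a minimal illustration that assumes each saved profile is a single embedding (a plain list of floats standing in for an ECAPA embedding) and that cosine similarity is the score. `score_profiles` is a hypothetical helper, not the actual `speaker_id.py` API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_profiles(embedding, profiles):
    """Score one speech-chunk embedding against every profile.

    Hypothetical helper: `profiles` maps speaker name -> profile embedding.
    Returns (best_name, best_score, all_scores).
    """
    scores = {name: cosine(embedding, vec) for name, vec in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best], scores


profiles = {"joe_buck": [0.9, 0.1, 0.0], "troy": [0.0, 1.0, 0.2]}
best, score, scores = score_profiles([0.8, 0.2, 0.1], profiles)
print(best, round(score, 3))
```

In a setup like this, a chunk would only count toward muting when the blocked speaker is the best match and its score clears the threshold.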

Recommended environment

Use Python 3.11 or 3.12.

Many audio ML packages are not yet reliable on Python 3.14, so using a virtual environment on 3.11/3.12 will save you time.

Setup

Install Python 3.11 or 3.12
Create a virtual environment
Install dependencies

python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

If pyaudio fails to install on macOS, install PortAudio first:

brew install portaudio
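To catch the wrong interpreter early, a small check like the one below can be run inside the activated virtual environment. It is a standalone sketch, not part of the commentator_muter package.

```python
import sys

# Versions recommended above for the audio ML dependencies.
RECOMMENDED = {(3, 11), (3, 12)}


def python_version_ok(version_info=sys.version_info):
    """Return True if the interpreter's major.minor version is 3.11 or 3.12."""
    return tuple(version_info[:2]) in RECOMMENDED


if not python_version_ok():
    print(
        f"Warning: Python {sys.version_info.major}.{sys.version_info.minor} detected; "
        "3.11 or 3.12 is recommended for this project."
    )
```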

Enroll voices

Create one or more WAV files per commentator you want to detect. Cleaner speech gives better results.

Suggested training data per speaker:

at least 5-10 minutes of mostly clean speech
multiple clips from different games
minimal crowd noise when possible

Then enroll:

python -m commentator_muter.enroll \
  --name joe_buck \
  --audio samples/joe_buck_1.wav samples/joe_buck_2.wav

This creates a profile in profiles/joe_buck.json.
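One common way to turn several clips into a single profile is to average their normalized embeddings. The sketch below assumes that approach and a hypothetical JSON layout (`name`, `embedding`, `num_embeddings`); the real `profiles/<name>.json` format may differ, and in practice the embeddings would come from the ECAPA model rather than being passed in by hand.

```python
import json
import math
from pathlib import Path


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def build_profile(name, embeddings, profile_dir="profiles"):
    """Average per-clip embeddings into one unit-length profile and save it as JSON.

    Hypothetical schema: {"name": ..., "embedding": [...], "num_embeddings": ...}.
    """
    normed = [l2_normalize(e) for e in embeddings]
    dim = len(normed[0])
    mean = [sum(e[i] for e in normed) / len(normed) for i in range(dim)]
    profile = {
        "name": name,
        "embedding": l2_normalize(mean),
        "num_embeddings": len(normed),
    }
    path = Path(profile_dir) / f"{name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(profile, indent=2))
    return path
```

Normalizing before averaging keeps one loud or long clip from dominating the profile.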

Capture training clips from BlackHole

If you want to build a better dataset directly from routed game audio, you can record labeled clips from a live input such as BlackHole 2ch.

List devices first:

python -m commentator_muter.capture_training_audio --list-devices

Then record clips for one label:

python -m commentator_muter.capture_training_audio \
  --input-device 1 \
  --label chris

The recorder will:

keep listening on the selected input device
start a take when you press Enter
stop the take when you press Enter again
show clip duration, RMS, and RNNoise speech probability
let you save, discard, or redo the take

By default, clips are written to:

samples/captured/<label>/

You can point it somewhere else if you want to record directly into a training folder:

python -m commentator_muter.capture_training_audio \
  --input-device 1 \
  --label mike \
  --output-dir samples
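The duration and RMS readouts, and the per-label output folder, can be approximated with the standard library alone. This sketch assumes mono 16-bit PCM at 16 kHz and uses a hypothetical timestamped file name; it does not reproduce the RNNoise speech probability.

```python
import math
import struct
import time
import wave
from pathlib import Path

SAMPLE_RATE = 16_000  # assumption: mono, 16 kHz, 16-bit PCM


def clip_stats(samples, sample_rate=SAMPLE_RATE):
    """Return (duration_seconds, rms) for int16 samples, with RMS scaled to 0..1."""
    if not samples:
        return 0.0, 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768.0
    return len(samples) / sample_rate, rms


def save_take(samples, label, output_dir="samples/captured", sample_rate=SAMPLE_RATE):
    """Write one take to <output_dir>/<label>/<label>_<timestamp>.wav (hypothetical naming)."""
    folder = Path(output_dir) / label
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{label}_{time.strftime('%Y%m%d_%H%M%S')}.wav"
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return path
```

A very low RMS usually means the routed input is silent or the wrong device was selected.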

Bootstrap MUSAN and VoxCeleb

You can also pull in a starter subset of Hugging Face datasets for noise negatives and speaker metadata.

This command downloads:

a starter subset of MUSAN noise wav files
VoxCeleb metadata files
optionally the smaller vox1_test_wav.zip archive if you want some VoxCeleb audio right away

python -m commentator_muter.bootstrap_hf_datasets

By default it writes to:

external_datasets/

If you want a larger MUSAN starter pack:

python -m commentator_muter.bootstrap_hf_datasets --musan-noise-count 250

If you want the optional VoxCeleb test audio archive too:

python -m commentator_muter.bootstrap_hf_datasets --include-vox1-test-audio

Notes:

MUSAN is easy to use as noise/background negatives.
The full VoxCeleb archives are very large, so this bootstrap command stays conservative by default.
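One way to use noise negatives is to score each MUSAN clip against a target profile and check how many would be falsely accepted. The sketch assumes those scores were already computed (for example with cosine similarity) and picks the lowest threshold on a 0.01 grid that keeps the false accept rate under a limit; both function names are hypothetical.

```python
def false_accept_rate(negative_scores, threshold):
    """Fraction of negative clips (noise, other speakers) scoring at or above threshold."""
    if not negative_scores:
        return 0.0
    return sum(s >= threshold for s in negative_scores) / len(negative_scores)


def min_threshold_for_far(negative_scores, max_far=0.01):
    """Smallest threshold on a 0.00..1.00 grid whose false accept rate is <= max_far."""
    for i in range(101):
        threshold = i / 100  # integer stepping avoids float drift
        if false_accept_rate(negative_scores, threshold) <= max_far:
            return threshold
    return 1.0
```

Raising the threshold this way trades missed detections for fewer wrong mutes, so it is worth checking the target's own clips at the chosen value too.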

Run live detection and mute

List devices first if needed:

python -m commentator_muter.run --list-devices

Then start the prototype:

python -m commentator_muter.run \
  --input-device 0 \
  --output-device 1 \
  --block joe_buck \
  --threshold 0.72
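The smoothing rules from the pipeline can be pictured as a small mute controller. This is a hypothetical sketch: it mutes as soon as the blocked speaker's score clears the threshold and unmutes only after a run of non-matching chunks, borrowing the idea behind the `--negative-chunks-to-release` flag from the live detector. `run.py` may apply different rules.

```python
from dataclasses import dataclass


@dataclass
class MuteGate:
    """Hypothetical mute controller with release hysteresis."""

    threshold: float = 0.72
    release_chunks: int = 3
    muted: bool = False
    misses: int = 0  # consecutive non-matching chunks while muted

    def update(self, blocked_score):
        """Feed the blocked speaker's score for one chunk (None means no speech)."""
        if blocked_score is not None and blocked_score >= self.threshold:
            self.muted = True
            self.misses = 0
        elif self.muted:
            self.misses += 1
            if self.misses >= self.release_chunks:
                self.muted = False
                self.misses = 0
        return self.muted

    def apply(self, frame):
        """Silence the frame while muted; pass it through unchanged otherwise."""
        return [0.0] * len(frame) if self.muted else frame
```

Requiring several misses before unmuting keeps a single noisy chunk from briefly letting the commentator back in.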

Test detection with recorded voice data

Before wiring this into another app, you can test the speaker detector by running it on saved clips.

Example:

python -m commentator_muter.detect \
  --audio tests/joe_clip.wav tests/troy_clip.wav \
  --threshold 0.72

For each speech window, the tool prints:

the timestamp inside the clip
the best speaker match
the best score
the top profile scores for comparison

Then it prints a clip-level summary showing:

how many windows cleared the threshold
which speaker dominated the clip
how often each speaker won
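A clip-level summary like this can be derived from the per-window results with a counter. The sketch assumes each window is reduced to a `(best_speaker, best_score)` pair and only counts windows that cleared the threshold as wins; `summarize_clip` is hypothetical, and `detect.py` may count differently.

```python
from collections import Counter


def summarize_clip(windows, threshold=0.72):
    """Summarize per-window (best_speaker, best_score) pairs for one clip."""
    accepted = [(speaker, score) for speaker, score in windows if score >= threshold]
    wins = Counter(speaker for speaker, _ in accepted)
    dominant = wins.most_common(1)[0][0] if wins else None
    return {
        "windows": len(windows),
        "above_threshold": len(accepted),
        "dominant_speaker": dominant,
        "wins": dict(wins),
    }
```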

If you want just summaries:

python -m commentator_muter.detect \
  --audio tests/*.wav \
  --summary-only

If you want machine-readable output:

python -m commentator_muter.detect \
  --audio tests/*.wav \
  --json-output

Test only "Chris or not"

If you only care about one commentator right now, enroll just that voice and run the binary detector.

Example:

python -m commentator_muter.enroll \
  --name chris \
  --audio samples/chris_1.wav samples/chris_2.wav

python -m commentator_muter.detect_binary \
  --target chris \
  --audio tests/chris_clip.wav tests/other_clip.wav \
  --threshold 0.72

For each window, this prints:

a label, either CHRIS or NOT_CHRIS
the similarity score

Then it prints a clip-level summary with:

final decision
max score
average score
accepted windows over total speech windows
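The four summary fields can be computed from the per-window scores as below. The final decision rule here (accept the clip when at least half of its speech windows clear the threshold) is an assumption for illustration, not necessarily what `detect_binary.py` uses.

```python
def binary_clip_summary(scores, target="chris", threshold=0.72, min_accept_ratio=0.5):
    """Clip-level yes/no summary from per-window similarity scores to one target.

    Assumed decision rule: share of accepted windows >= min_accept_ratio.
    """
    if not scores:
        return {"decision": "NO_SPEECH", "max": None, "avg": None, "accepted": "0/0"}
    accepted = sum(s >= threshold for s in scores)
    is_target = accepted / len(scores) >= min_accept_ratio
    return {
        "decision": target.upper() if is_target else f"NOT_{target.upper()}",
        "max": max(scores),
        "avg": sum(scores) / len(scores),
        "accepted": f"{accepted}/{len(scores)}",
    }
```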

Live terminal feedback

You can also run the detector live and have it print whether the active voice sounds like Chris.

List devices:

python -m commentator_muter.live_detect --list-devices

Then monitor a microphone or routed audio input:

python -m commentator_muter.live_detect \
  --target chris \
  --input-device 0

It will print one of:

CHRIS
CHRIS_HOLD
NOT_CHRIS
NO_SPEECH

Delayed Live Playback

If you want the audio output to line up with the detection timing, use the delayed live command.

It:

reads live audio from an input device
buffers it for a configurable delay
plays the delayed audio to an output device
runs detection against that same delayed timeline

Example:

python -m commentator_muter.live_delay \
  --target chris \
  --input-device 1 \
  --output-device 3 \
  --delay-seconds 1.5

This is the first step toward a synced “delay, detect, then act” architecture for muting.

The live detector smooths its decisions in several ways:

it uses a sticky threshold: Chris must cross the normal threshold to enter CHRIS, and for a short time after a strong match the detector uses a slightly lower threshold
the sticky threshold reduces flicker during crowd noise or weaker speech windows
it keeps a short decision window, so overlapping or noisy chunks are judged with recent context instead of one chunk at a time
before scoring, it runs stronger speech enhancement and a target-focused extraction step that keeps the subsegments most likely to sound like Chris
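The sticky threshold and decision window can be sketched as a small state machine. Everything here is an assumption for illustration: parameter names mirror the `--sticky-threshold-drop`, `--sticky-hold-seconds`, `--decision-window-seconds` and `--min-positive-windows` flags, but their exact semantics in `live_detect.py` may differ. CHRIS_HOLD is read here as "the current chunk is below the normal threshold, but recent context still supports Chris".

```python
from collections import deque


class StickyDetector:
    """Hypothetical sticky-threshold smoothing for one target speaker."""

    def __init__(self, threshold=0.72, sticky_threshold_drop=0.05, sticky_hold_seconds=2.0,
                 decision_window_seconds=1.6, min_positive_windows=2):
        self.threshold = threshold
        self.drop = sticky_threshold_drop
        self.hold = sticky_hold_seconds
        self.window = decision_window_seconds
        self.min_positive = min_positive_windows
        self.recent = deque()    # (time, score) pairs inside the decision window
        self.last_strong = None  # time of the last chunk at or above the normal threshold

    def update(self, t, score):
        """Classify the chunk at time t (seconds); score=None means no speech."""
        if score is None:
            return "NO_SPEECH"
        self.recent.append((t, score))
        while t - self.recent[0][0] > self.window:  # drop chunks older than the window
            self.recent.popleft()
        holding = self.last_strong is not None and t - self.last_strong <= self.hold
        active = self.threshold - self.drop if holding else self.threshold
        if score >= self.threshold:
            self.last_strong = t
        positives = sum(s >= active for _, s in self.recent)
        if positives < self.min_positive:
            return "NOT_CHRIS"
        return "CHRIS" if score >= self.threshold else "CHRIS_HOLD"
```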

You can tune it with:

python -m commentator_muter.live_detect \
  --target chris \
  --input-device 0 \
  --extraction-subwindow-seconds 0.35 \
  --extraction-subwindow-stride-seconds 0.175 \
  --extraction-keep-ratio 0.5 \
  --extraction-min-score 0.28 \
  --decision-window-seconds 1.6 \
  --min-positive-windows 2 \
  --strong-positive-score 0.58 \
  --sticky-hold-seconds 2.0 \
  --sticky-threshold-drop 0.05 \
  --negative-chunks-to-release 3
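The four `--extraction-*` flags suggest a sliding-window selection step. The sketch below is one plausible reading, not the actual implementation: slide a subwindow across the chunk, score each slice against the target, keep the best `keep_ratio` share, and drop anything under `min_score`. `score_fn(start, end)` is a hypothetical callback standing in for embedding plus cosine scoring of that slice.

```python
def select_target_subwindows(duration, score_fn, subwindow=0.35, stride=0.175,
                             keep_ratio=0.5, min_score=0.28):
    """Return (start, end, score) spans, in time order, most likely to be the target."""
    if duration < subwindow:
        return []
    # Number of full subwindows that fit; the epsilon guards against float round-off.
    count = int((duration - subwindow) / stride + 1e-9) + 1
    scored = []
    for i in range(count):
        start = i * stride
        end = start + subwindow
        scored.append((start, end, score_fn(start, end)))
    scored.sort(key=lambda span: span[2], reverse=True)
    keep = max(1, round(len(scored) * keep_ratio))
    kept = [span for span in scored[:keep] if span[2] >= min_score]
    return sorted(kept)  # back into time order
```

Using half the subwindow length as the stride means each moment of audio is scored twice, which makes it less likely that a short burst of the target voice falls between windows.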

Notes

Start with one blocked speaker first
Use volume ducking before full mute if the hard cut feels too abrupt
Expect some latency, usually around 1-2 s in prototype form, because of the delayed decision window
Overlapping speech and loud crowd noise are the hardest cases
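Ducking can be implemented as a gain that ramps toward a target instead of jumping, which also avoids clicks on a full mute. A minimal sketch over float samples; `duck` and its `ramp_step` (maximum gain change per sample) are hypothetical, and a target gain of 0.2 ducks while 0.0 mutes.

```python
def duck(frame, current_gain, target_gain, ramp_step=0.01):
    """Scale samples while moving the gain toward target_gain by at most ramp_step per sample.

    Returns (processed_frame, gain_after_frame) so the next frame continues the ramp.
    """
    out = []
    gain = current_gain
    for sample in frame:
        if gain < target_gain:
            gain = min(target_gain, gain + ramp_step)
        elif gain > target_gain:
            gain = max(target_gain, gain - ramp_step)
        out.append(sample * gain)
    return out, gain
```

At 16 kHz, a step of 0.01 fades from full volume to silence in 100 samples (about 6 ms); a smaller step gives a gentler fade.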

Files

commentator_muter/enroll.py: build speaker profiles
commentator_muter/detect.py: test speaker detection against recorded audio
commentator_muter/detect_binary.py: test one speaker as yes-or-no
commentator_muter/live_detect.py: live terminal feedback for one speaker
commentator_muter/live_delay.py: delayed live playback plus synced detection
commentator_muter/run.py: live detection and mute pipeline
commentator_muter/speaker_id.py: embedding and profile matching
commentator_muter/audio.py: stream handling and VAD helpers
commentator_muter/config.py: runtime settings
