jackg22/MuteMe

Local Commentator Muter Prototype

This is a local Python prototype for detecting known voices, such as NFL commentators, and muting them in near real time.

It is a prototype, not a polished product. The focus is:

  • enroll target voices from audio clips
  • test detection accuracy against recorded clips
  • listen to live audio from an input device
  • compare short speech windows to enrolled voice profiles
  • mute output when a blocked speaker is active

How it works

The pipeline is:

  1. Capture audio from a local input device
  2. Run voice activity detection to skip silence and low-speech chunks
  3. Generate a speaker embedding for each speech chunk
  4. Compare that embedding to saved speaker profiles
  5. Apply smoothing rules before muting or unmuting

Speaker recognition uses SpeechBrain's pretrained ECAPA embedding model.
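Step 4 of the pipeline can be illustrated with a minimal sketch. It assumes profiles are plain embedding vectors and that the comparison is cosine similarity, which is a common choice for speaker embeddings but is not confirmed by this README. The function names and the toy 3-dimensional vectors are hypothetical; real embeddings have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(embedding, profiles, threshold=0.72):
    """Score one chunk embedding against every profile.

    `profiles` maps speaker name -> embedding vector and must be non-empty.
    Returns (name, score); name is None when the best score is below threshold.
    """
    scores = {name: cosine_similarity(embedding, vec) for name, vec in profiles.items()}
    name = max(scores, key=scores.get)
    score = scores[name]
    return (name if score >= threshold else None), score


# Hypothetical toy profiles, for illustration only.
profiles = {"joe_buck": [1.0, 0.0, 0.0], "troy": [0.0, 1.0, 0.0]}
print(best_match([0.9, 0.1, 0.0], profiles))
```

A chunk that sits between two profiles scores below the threshold for both, so it is reported as no match rather than being forced onto the nearest speaker.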

Recommended environment

Use Python 3.11 or 3.12.

Many audio ML packages are not yet reliable on Python 3.14, so using a virtual environment on 3.11/3.12 will save you time.

Setup

  1. Install Python 3.11 or 3.12
  2. Create a virtual environment
  3. Install dependencies

python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

If pyaudio fails to install on macOS, install PortAudio first:

brew install portaudio

Enroll voices

Create one WAV file per commentator you want to detect. Cleaner speech gives better results.

Suggested training data per speaker:

  • 5 to 10 minutes or more of mostly clean speech
  • multiple clips from different games
  • minimal crowd noise when possible

Then enroll:

python -m commentator_muter.enroll \
  --name joe_buck \
  --audio samples/joe_buck_1.wav samples/joe_buck_2.wav

This creates a profile in profiles/joe_buck.json.
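As a rough illustration of what enrollment might produce, the sketch below averages per-chunk embeddings into a single centroid and writes it to a JSON file named after the speaker. The JSON layout (`name`, `embedding`, `num_chunks`) and the helper names are assumptions made for this example; the actual format written by `enroll.py` may differ.

```python
import json
import math
import tempfile
from pathlib import Path


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_profile(name, embeddings):
    """Average unit-length chunk embeddings into one unit-length centroid."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    normalized = [l2_normalize(e) for e in embeddings]
    dim = len(normalized[0])
    centroid = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return {"name": name, "embedding": l2_normalize(centroid), "num_chunks": len(embeddings)}


def save_profile(profile, directory):
    """Write the profile to <directory>/<name>.json and return the path."""
    path = Path(directory) / f"{profile['name']}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(profile))
    return path


# Toy 2-dimensional embeddings, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_profile(build_profile("joe_buck", [[2.0, 0.0], [0.0, 3.0]]), tmp)
    print(saved.name)
```

Normalizing each chunk before averaging keeps loud or long chunks from dominating the centroid.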

Capture training clips from BlackHole

If you want to build a better dataset directly from routed game audio, you can record labeled clips from a live input such as BlackHole 2ch.

List devices first:

python -m commentator_muter.capture_training_audio --list-devices

Then record clips for one label:

python -m commentator_muter.capture_training_audio \
  --input-device 1 \
  --label chris

The recorder will:

  • keep listening on the selected input device
  • start a take when you press Enter
  • stop the take when you press Enter again
  • show clip duration, RMS, and RNNoise speech probability
  • let you save, discard, or redo the take
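For reference, the duration and RMS figures shown for each take can be computed as in the sketch below. It assumes mono samples as floats in the range -1.0 to 1.0; the function names are hypothetical, and the RNNoise speech probability is not reproduced here.

```python
import math


def clip_stats(samples, sample_rate):
    """Return (duration_seconds, rms) for a list of mono float samples."""
    if not samples:
        return 0.0, 0.0
    duration = len(samples) / sample_rate
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return duration, rms


def rms_dbfs(rms, floor=1e-10):
    """Convert a linear RMS value to dB relative to full scale."""
    return 20.0 * math.log10(max(rms, floor))


# One second of a full-rate square wave at half amplitude, sampled at 16 kHz.
duration, rms = clip_stats([0.5, -0.5] * 8000, 16000)
print(duration, rms, rms_dbfs(rms))
```

A very low RMS usually means the wrong input device was selected or the routed audio is silent, so it is worth checking before saving a take.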

By default, clips are written to:

samples/captured/<label>/

You can point it somewhere else if you want to record directly into a training folder:

python -m commentator_muter.capture_training_audio \
  --input-device 1 \
  --label mike \
  --output-dir samples

Bootstrap MUSAN and VoxCeleb

You can also pull in a starter subset of Hugging Face datasets for noise negatives and speaker metadata.

This command downloads:

  • a starter subset of MUSAN noise wav files
  • VoxCeleb metadata files
  • optionally the smaller vox1_test_wav.zip archive if you want some VoxCeleb audio right away

python -m commentator_muter.bootstrap_hf_datasets

By default it writes to:

external_datasets/

If you want a larger MUSAN starter pack:

python -m commentator_muter.bootstrap_hf_datasets --musan-noise-count 250

If you want the optional VoxCeleb test audio archive too:

python -m commentator_muter.bootstrap_hf_datasets --include-vox1-test-audio

Notes:

  • MUSAN is easy to use as noise/background negatives.
  • The full VoxCeleb archives are very large, so this bootstrap command stays conservative by default.

Run live detection and mute

List devices first if needed:

python -m commentator_muter.run --list-devices

Then start the prototype:

python -m commentator_muter.run \
  --input-device 0 \
  --output-device 1 \
  --block joe_buck \
  --threshold 0.72

Test detection with recorded voice data

Before wiring this into another app, you can test the speaker detector by running it on saved clips.

Example:

python -m commentator_muter.detect \
  --audio tests/joe_clip.wav tests/troy_clip.wav \
  --threshold 0.72

For each speech window, the tool prints:

  • the timestamp inside the clip
  • the best speaker match
  • the best score
  • the top profile scores for comparison

Then it prints a clip-level summary showing:

  • how many windows cleared the threshold
  • which speaker dominated the clip
  • how often each speaker won
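The clip-level summary can be thought of as a simple aggregation over per-window results. The sketch below assumes each window is a `(best_speaker, best_score)` pair and that only windows at or above the threshold count as wins; both are assumptions for illustration, and the real tool may count differently.

```python
from collections import Counter


def summarize_clip(windows, threshold=0.72):
    """Aggregate per-window (speaker, score) pairs into a clip-level summary."""
    accepted = [(name, score) for name, score in windows if score >= threshold]
    wins = Counter(name for name, _ in accepted)
    dominant = wins.most_common(1)[0][0] if wins else None
    return {
        "total_windows": len(windows),
        "accepted_windows": len(accepted),
        "dominant_speaker": dominant,
        "wins": dict(wins),
    }


# Hypothetical per-window results for one clip.
windows = [("joe_buck", 0.81), ("joe_buck", 0.75), ("troy", 0.74), ("troy", 0.60)]
print(summarize_clip(windows))
```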

If you want just summaries:

python -m commentator_muter.detect \
  --audio tests/*.wav \
  --summary-only

If you want machine-readable output:

python -m commentator_muter.detect \
  --audio tests/*.wav \
  --json-output

Test only "Chris or not"

If you only care about one commentator right now, enroll just that voice and run the binary detector.

Example:

python -m commentator_muter.enroll \
  --name chris \
  --audio samples/chris_1.wav samples/chris_2.wav

python -m commentator_muter.detect_binary \
  --target chris \
  --audio tests/chris_clip.wav tests/other_clip.wav \
  --threshold 0.72

This prints a simple result for each window:

  • CHRIS
  • NOT_CHRIS
  • similarity score

Then it prints a clip-level summary with:

  • final decision
  • max score
  • average score
  • accepted windows over total speech windows
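A minimal sketch of how such a binary summary could be derived from per-window similarity scores is shown below. The decision rule (target wins when at least half of the speech windows are accepted) and the `min_accept_ratio` parameter are assumptions for this example, not the documented behavior of `detect_binary`.

```python
def summarize_binary(scores, target="chris", threshold=0.72, min_accept_ratio=0.5):
    """Summarize per-window similarity scores for a single target speaker."""
    label = target.upper()
    total = len(scores)
    accepted = sum(1 for s in scores if s >= threshold)
    ratio = accepted / total if total else 0.0
    is_target = total > 0 and ratio >= min_accept_ratio
    return {
        "decision": label if is_target else f"NOT_{label}",
        "max_score": max(scores) if scores else None,
        "avg_score": sum(scores) / total if scores else None,
        "accepted": accepted,
        "total": total,
    }


# Hypothetical scores for the speech windows of one clip.
print(summarize_binary([0.80, 0.75, 0.60, 0.90]))
```

Comparing the max and average scores across known-positive and known-negative clips is a practical way to pick a threshold before going live.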

Live terminal feedback

You can also run the detector live and have it print whether the active voice sounds like Chris.

List devices:

python -m commentator_muter.live_detect --list-devices

Then monitor a microphone or routed audio input:

python -m commentator_muter.live_detect \
  --target chris \
  --input-device 0

It will print one of:

  • CHRIS
  • CHRIS_HOLD
  • NOT_CHRIS
  • NO_SPEECH

Delayed live playback

If you want the audio output to line up with the detection timing, use the delayed live command.

It:

  • reads live audio from an input device
  • buffers it for a configurable delay
  • plays the delayed audio to an output device
  • runs detection against that same delayed timeline

Example:

python -m commentator_muter.live_delay \
  --target chris \
  --input-device 1 \
  --output-device 3 \
  --delay-seconds 1.5

This is the first step toward a synced “delay, detect, then act” architecture for muting.
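The buffering idea behind delayed playback can be sketched with a fixed-length queue of audio blocks. This is an illustration under stated assumptions, not the code in `live_delay.py`: the `DelayLine` class, the block size, and the sample rate are all hypothetical.

```python
from collections import deque


class DelayLine:
    """Fixed-delay FIFO of audio blocks, pre-filled with silence."""

    def __init__(self, delay_seconds, sample_rate, block_size):
        # Round the requested delay to a whole number of blocks.
        self.delay_blocks = max(round(delay_seconds * sample_rate / block_size), 0)
        self.block_size = block_size
        self._queue = deque([0.0] * block_size for _ in range(self.delay_blocks))

    def push(self, block):
        """Add the newest input block and return the block that is now due for playback."""
        self._queue.append(block)
        return self._queue.popleft()


# 1.5 s of delay at 16 kHz with 0.5 s blocks gives a 3-block delay.
line = DelayLine(delay_seconds=1.5, sample_rate=16000, block_size=8000)
for i in range(1, 5):
    delayed = line.push([float(i)] * 8000)
    print(i, delayed[0])
```

Because playback lags the input by the buffer length, a detection made on the newest block has time to take effect before that block reaches the output.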

The live detector uses a sticky threshold and a few related smoothing steps:

  • Chris must cross the normal threshold to enter CHRIS
  • for a short time after a strong match, the detector uses a slightly lower threshold, which reduces flicker during crowd noise or weaker speech windows
  • it keeps a short decision window, so overlapping or noisy chunks are judged with recent context instead of one chunk at a time
  • before scoring, it runs stronger speech enhancement and a target-focused extraction step that keeps the subsegments most likely to sound like Chris

You can tune it with:

python -m commentator_muter.live_detect \
  --target chris \
  --input-device 0 \
  --extraction-subwindow-seconds 0.35 \
  --extraction-subwindow-stride-seconds 0.175 \
  --extraction-keep-ratio 0.5 \
  --extraction-min-score 0.28 \
  --decision-window-seconds 1.6 \
  --min-positive-windows 2 \
  --strong-positive-score 0.58 \
  --sticky-hold-seconds 2.0 \
  --sticky-threshold-drop 0.05 \
  --negative-chunks-to-release 3
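The sticky-threshold behavior can be sketched as a small state machine. The parameter names mirror three of the flags above, but the logic is an assumption made for illustration: in particular, when exactly CHRIS_HOLD is emitted and how negative chunks are counted are guesses, and the decision window and extraction step are left out.

```python
class StickyDetector:
    """Hysteresis over per-chunk similarity scores (illustrative sketch only)."""

    def __init__(self, threshold, sticky_hold_seconds=2.0,
                 sticky_threshold_drop=0.05, negative_chunks_to_release=3):
        self.threshold = threshold
        self.sticky_hold_seconds = sticky_hold_seconds
        self.sticky_threshold_drop = sticky_threshold_drop
        self.negative_chunks_to_release = negative_chunks_to_release
        self._active = False
        self._last_match_time = None
        self._negatives = 0

    def update(self, now, score):
        """Return a label for one chunk; score is None when no speech was detected."""
        if score is None:
            return "NO_SPEECH"
        if score >= self.threshold:
            # A full-threshold match (re)starts the hold window.
            self._active = True
            self._last_match_time = now
            self._negatives = 0
            return "CHRIS"
        in_hold = self._active and (now - self._last_match_time) <= self.sticky_hold_seconds
        if in_hold and score >= self.threshold - self.sticky_threshold_drop:
            # Within the hold window the lowered threshold is enough to stay on.
            self._negatives = 0
            return "CHRIS_HOLD"
        if self._active:
            # Require several consecutive misses before releasing.
            self._negatives += 1
            if self._negatives < self.negative_chunks_to_release:
                return "CHRIS_HOLD"
            self._active = False
        return "NOT_CHRIS"


detector = StickyDetector(threshold=0.72)
for t, s in [(0.0, 0.80), (0.5, 0.69), (1.0, 0.50), (1.5, 0.50), (2.0, 0.50), (2.5, None)]:
    print(t, detector.update(t, s))
```

Raising `sticky_hold_seconds` or `negative_chunks_to_release` trades faster unmuting for fewer brief dropouts while the target is still speaking.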

Notes

  • Start with one blocked speaker first
  • Use volume ducking before full mute if the hard cut feels too abrupt
  • Expect some latency: usually around 1-2 seconds in this prototype when the delayed decision window is in use
  • Overlapping speech and loud crowd noise are the hardest cases
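Volume ducking, as suggested in the notes above, can be done by ramping the gain over a short span instead of cutting to zero. The sketch below is a hypothetical helper, not part of this repository; it assumes float samples and a linear ramp.

```python
def duck_block(samples, start_gain, target_gain, ramp_samples):
    """Apply a linear gain ramp from start_gain to target_gain over ramp_samples.

    Returns (processed_samples, end_gain) so the next block can continue
    from wherever this one stopped.
    """
    step = (target_gain - start_gain) / ramp_samples if ramp_samples > 0 else 0.0
    gain = start_gain
    out = []
    for i, sample in enumerate(samples):
        if i < ramp_samples:
            gain = start_gain + step * (i + 1)
        else:
            gain = target_gain
        out.append(sample * gain)
    return out, gain


# Fade a 4-sample block from full volume to silence.
faded, end_gain = duck_block([1.0, 1.0, 1.0, 1.0], start_gain=1.0, target_gain=0.0, ramp_samples=4)
print(faded, end_gain)
```

Setting `target_gain` to a small non-zero value such as 0.2 keeps crowd sound faintly audible while the blocked speaker is active, which may feel less abrupt than a full mute.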

Files

  • commentator_muter/enroll.py: build speaker profiles
  • commentator_muter/detect.py: test speaker detection against recorded audio
  • commentator_muter/detect_binary.py: test one speaker as yes-or-no
  • commentator_muter/live_detect.py: live terminal feedback for one speaker
  • commentator_muter/live_delay.py: delayed live playback plus synced detection
  • commentator_muter/run.py: live detection and mute pipeline
  • commentator_muter/speaker_id.py: embedding and profile matching
  • commentator_muter/audio.py: stream handling and VAD helpers
  • commentator_muter/config.py: runtime settings
