Beamforming-LLM: Semantic Spatial Recall of Missed Conversations

Beamforming-LLM is a research prototype that enables users to semantically recall conversations they may have missed in multi-speaker environments. It combines beamforming-based spatial audio processing with Whisper ASR and retrieval-augmented generation (RAG) using large language models (LLMs). Users can query the system in natural language to understand what was said in overlapping conversations.

🧾 What the System Outputs

The system produces:

  • Contrastive natural language summaries of attended vs. missed conversations
  • 📍 Direction metadata (e.g., left, right, front) indicating spatial origin of speech
  • ⏱️ Timestamps for each conversation segment
  • 🔊 Audio playback snippets for both target and non-target streams

This enables intuitive memory-like re-engagement with multi-party conversations, making it useful for meetings, social settings, and assistive recall applications.
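As a rough illustration of the outputs listed above, a single recalled segment could be represented like this; the field names and values are hypothetical, not the repository's exact schema:

```python
# Hypothetical record for one recalled segment; field names are illustrative only.
missed_segment = {
    "summary": "While you discussed the demo, the group to your right "
               "agreed to move the deadline to Friday.",
    "direction": "right",                             # spatial origin of the speech
    "start": 412.8,                                   # seconds from recording start
    "end": 431.5,
    "audio_snippet": "snippets/right_0412-0431.wav",  # playback clip for this segment
}
```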


🧠 Paper Overview

In real-world social or professional settings—such as dinner tables, meetings, or poster sessions—we often focus on one conversation and miss others happening simultaneously. Beamforming-LLM captures and separates these spatial conversations and enables natural language recall of what was missed.

Key technologies:

  • Beamforming for directional audio separation
  • Whisper ASR for transcription and timestamping
  • LLM-RAG for semantic retrieval and summarization (via GPT-4o-mini)
  • FAISS for fast vector-based retrieval

📁 Notebooks

1. Beamforming.ipynb

Estimates beamforming filters and separates the multi-channel recording into spatially distinct .wav files using the following (a minimal sketch appears after the list):

  • MVDR beamforming
  • Direction of Arrival (DOA) estimation via Pyroomacoustics
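
Below is a minimal sketch of these two steps, assuming a 2-D microphone geometry `mic_locs` (shape 2 × n_mics) and a multichannel recording `signals` (shape n_samples × n_mics). Variable names, parameter values, and the per-bin MVDR formulation are illustrative and may differ from the notebook's implementation:

```python
import numpy as np
import pyroomacoustics as pra

fs, nfft, hop, c = 16000, 512, 256, 343.0

# STFT of the multichannel recording: (n_frames, nfft//2 + 1, n_mics)
X = pra.transform.stft.analysis(signals, nfft, hop)

# DOA estimation with MUSIC (pyroomacoustics expects channels x freq x frames)
doa = pra.doa.MUSIC(mic_locs, fs, nfft, c=c, num_src=2)
doa.locate_sources(X.transpose([2, 1, 0]), freq_range=[300.0, 3500.0])
azimuths = doa.azimuth_recon                      # estimated source azimuths (radians)

# Per-bin MVDR-style weights steered at the first estimated direction.
# The covariance is taken from the full mixture here (MPDR-style); a
# noise-only estimate could be substituted.
theta = azimuths[0]
u = np.array([np.cos(theta), np.sin(theta)])      # unit vector toward the source
tau = mic_locs.T @ u / c                          # relative propagation delays per mic
freqs = np.fft.rfftfreq(nfft, 1.0 / fs)

Y = np.zeros(X.shape[:2], dtype=complex)          # beamformed STFT (n_frames, n_bins)
for k, f in enumerate(freqs):
    d = np.exp(-2j * np.pi * f * tau)             # far-field steering vector
    Xk = X[:, k, :].T                             # (n_mics, n_frames)
    R = Xk @ Xk.conj().T / Xk.shape[1]            # spatial covariance at this bin
    R += 1e-6 * np.eye(len(d))                    # diagonal loading for stability
    Rinv_d = np.linalg.solve(R, d)
    w = Rinv_d / (d.conj() @ Rinv_d)              # w = R^{-1} d / (d^H R^{-1} d)
    Y[:, k] = X[:, k, :] @ w.conj()               # apply w^H x frame by frame

# Back to the time domain; write out one spatial stream as a .wav file
separated = pra.transform.stft.synthesis(Y, nfft, hop)
```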

2. Beamforming-LLM.ipynb

Integrates the full pipeline (a condensed sketch follows the list):

  • Transcribes separated audio with Whisper
  • Chunks and embeds text with sentence transformers
  • Stores embeddings in FAISS
  • Accepts natural language queries
  • Retrieves and filters relevant segments using GPT-4o-mini
  • Summarizes what was missed based on temporal-spatial alignment
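
A condensed sketch of the transcribe → embed → index → retrieve steps is shown below, assuming the beamformed .wav files produced by the first notebook. File names, the chunking granularity (Whisper segments), the embedding model, and the FAISS index type are illustrative choices, not necessarily the notebook's:

```python
import faiss
import whisper
import numpy as np
from sentence_transformers import SentenceTransformer

asr = whisper.load_model("base")
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Transcribe each spatial stream, keeping per-segment timestamps and direction
segments = []
for direction, path in [("left", "stream_left.wav"), ("right", "stream_right.wav")]:
    result = asr.transcribe(path)
    for seg in result["segments"]:
        segments.append({
            "text": seg["text"].strip(),
            "start": seg["start"],
            "end": seg["end"],
            "direction": direction,
        })

# Embed segment texts and store them in a FAISS index
embeddings = embedder.encode([s["text"] for s in segments], normalize_embeddings=True)
index = faiss.IndexFlatIP(embeddings.shape[1])   # inner product = cosine on normalized vectors
index.add(np.asarray(embeddings, dtype="float32"))

# Retrieve the segments most relevant to a natural-language query
query = "What did they say about the project deadline?"
q = embedder.encode([query], normalize_embeddings=True).astype("float32")
scores, ids = index.search(q, k=5)
hits = [segments[i] for i in ids[0]]
# `hits` (text + direction + timestamps) would then be passed to GPT-4o-mini
# to filter the candidates and produce the contrastive summary of what was missed.
```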

3. Evaluation.ipynb

Quantitatively evaluates the effectiveness of beamforming (see the sketch below) using:

  • PESQ (Perceptual Evaluation of Speech Quality)
  • STOI (Short-Time Objective Intelligibility)
  • Comparison of scores before and after beamforming
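
A minimal sketch of how such scores can be computed with the `pesq` and `pystoi` packages, assuming 16 kHz mono signals and an available clean reference; file names are placeholders:

```python
import soundfile as sf
from pesq import pesq
from pystoi import stoi

ref, fs = sf.read("clean_reference.wav")      # clean target speech
mix, _ = sf.read("mixture_channel0.wav")      # raw mixture (before beamforming)
enh, _ = sf.read("beamformed_target.wav")     # beamformed output

# Align lengths before scoring
n = min(len(ref), len(mix), len(enh))
ref, mix, enh = ref[:n], mix[:n], enh[:n]

for name, sig in [("before", mix), ("after", enh)]:
    p = pesq(fs, ref, sig, "wb")              # wideband PESQ (expects 16 kHz)
    s = stoi(ref, sig, fs, extended=False)    # STOI in [0, 1]
    print(f"{name} beamforming: PESQ={p:.2f}  STOI={s:.3f}")
```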

📄 Citation & Preprint

This work is described in the following preprint:

Beamforming-LLM: What, Where and When Did I Miss?
Vishal Choudhari
arXiv preprint arXiv:2509.06221, 2025.

If you use or build on this work, please cite it as:

@article{choudhari2025beamformingllm,
      title={Beamforming-LLM: What, Where and When Did I Miss?}, 
      author={Vishal Choudhari},
      year={2025},
      eprint={2509.06221},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2509.06221}, 
}
