This repository provides tools for forced alignment, segmentation, and transcription of Pangloss XML-annotated audio data, using transformers and pyannote pipelines.
- Transcription: Transcribes audio or segments using a pretrained Wav2Vec2 model and diarization.
- Forced Alignment: Aligns words in Pangloss XML files with corresponding audio segments and writes word-level timestamps back to XML.
- Segmentation: Splits audio into speech segments using Speech Activity Detection (SAD).
- XML Parsing & Audio Chunk Extraction: Extracts sentences and corresponding audio chunks from Pangloss XML.
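The XML-parsing and chunk-extraction step can be sketched with the standard library alone. The element names below (`<S>`, `<AUDIO>`, `<FORM>`) follow Pangloss Collection conventions, but the helper functions are illustrative, not the package's actual API:

```python
# Sketch: extract sentence timestamps from a Pangloss-style XML string and
# cut the matching chunk out of a WAV file. Illustrative only; the real
# package handles this via its own modules.
import io
import wave
import xml.etree.ElementTree as ET

PANGLOSS_XML = """
<TEXT>
  <S id="s1">
    <AUDIO start="0.10" end="0.30"/>
    <FORM>first sentence</FORM>
  </S>
  <S id="s2">
    <AUDIO start="0.30" end="0.50"/>
    <FORM>second sentence</FORM>
  </S>
</TEXT>
"""

def parse_sentences(xml_text):
    """Return (id, start, end, form) tuples for each <S> element."""
    root = ET.fromstring(xml_text)
    out = []
    for s in root.iter("S"):
        audio = s.find("AUDIO")
        form = s.find("FORM")
        out.append((s.get("id"),
                    float(audio.get("start")),
                    float(audio.get("end")),
                    form.text if form is not None else ""))
    return out

def cut_chunk(wav_bytes, start, end):
    """Slice [start, end] seconds out of an in-memory WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        rate = src.getframerate()
        src.setpos(round(start * rate))
        frames = src.readframes(round((end - start) * rate))
        params = src.getparams()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as dst:
        dst.setparams(params)  # header frame count is patched on close
        dst.writeframes(frames)
    return buf.getvalue()

# Half a second of silent 16 kHz mono audio stands in for a real recording.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 8000)
wav_bytes = buf.getvalue()

sentences = parse_sentences(PANGLOSS_XML)
chunk = cut_chunk(wav_bytes, sentences[0][1], sentences[0][2])
```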
To set up the environment and install the package, follow these steps:

1. Run the setup script:

   ```bash
   chmod +x setup.sh
   ./setup.sh
   ```

2. Accept the user conditions for pyannote/segmentation-3.0, pyannote/speaker-diarization-3.1, and pyannote/voice-activity-detection on Hugging Face.

3. Create an access token at hf.co/settings/tokens.

4. Set your Hugging Face token in the environment variable:

   ```powershell
   $env:HF_TOKEN="your_token_here"
   ```

   (on Linux/macOS: `export HF_TOKEN="your_token_here"`), or pass it with the `--token` argument in the CLI commands.

5. Install dependencies in a virtual environment using pixi:

   ```bash
   pixi install
   ```

6. Install the package. This allows you to use it as a CLI tool and import it in Python scripts:

   ```bash
   pixi run python -m ensurepip --upgrade
   pixi run python -m pip install --upgrade pip
   pixi run pip install -e .
   ```

7. Download or prepare WAV files (and Pangloss XML files) and place them in the `data/` directory (recommended).

8. Download or train a Wav2Vec2 model, place it in the `models/` directory (recommended), and pass it via the `--model` argument.
All main features are available as CLI commands thanks to the package structure and `[project.scripts]` entry points. You do not need to run `python ...` directly. Use the following commands from your project root (or with `pixi run ...`):
Uses diarization and ASR to create a TextGrid file, handling multiple speaker tiers and distinguishing between human voice and silence/noise:

```bash
pixi run transcribe --model models/Na_best_model --audio_path data/235213.wav --num_speakers 1
```

- Outputs a transcribed TextGrid file.
```bash
pixi run word_align --pangloss_xml data/235213.xml --wav data/235213.wav --model models/Na_best_model
```

- Outputs an aligned Pangloss XML file with word-level `<AUDIO start="..." end="..."/>` tags.
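For reference, an aligned word in the output XML looks roughly like this (the word form and timestamps are illustrative, not taken from the example file):

```xml
<W>
  <FORM>word</FORM>
  <AUDIO start="1.92" end="2.31"/>
</W>
```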
Simple transcription of short audio files using a pretrained Wav2Vec2 model:

```bash
pixi run simple_predict data/235213.wav --model models/Na_best_model
```

- Outputs `data/235213.txt` containing the transcription.
- Does not handle multiple speakers.
For more details, see the docstrings in each module or run:
```bash
pixi run transcribe --help
pixi run word_align --help
pixi run simple_predict --help
```

- All package code is in `src/nlp_pangloss/`.
- CLI commands are defined in the `[project.scripts]` section of `pyproject.toml`.
- For advanced usage or development, you can still run scripts directly, but this is not necessary for typical use:

  ```bash
  pixi run python src/nlp_pangloss/segment_and_transcribe.py ...
  ```