Code and training recipe for PhoneticXeus, a multilingual phone recognition model using self-conditioned CTC on the XEUS speech encoder.
```bash
git clone git@github.com:changelinglab/PhoneticXeus.git
cd PhoneticXeus

# install (auto-detects x86_64 vs aarch64)
make install

# activate environment (once per session)
source .venv/bin/activate
```

Set these environment variables before training or inference:
```bash
export IPAPACK_DATA_ROOT=/path/to/ipapack/data             # root directory for Kaldi-style data
export PHONEMIZER_ESPEAK_LIBRARY=/path/to/libespeak-ng.so  # needed for wav2vec2-phoneme models
export ESPEAK_DATA_PATH=/path/to/espeak-ng-data
```

The pre-trained PhoneticXeus checkpoint is available on HuggingFace: `changelinglab/PhoneticXeus`
```python
import torch
import torchaudio
from huggingface_hub import hf_hub_download

from src.model.xeusphoneme.builders import build_xeus_pr_inference

# download the pre-trained checkpoint
ckpt_path = hf_hub_download("changelinglab/PhoneticXeus", "checkpoint-22000.ckpt")

# build the inference pipeline
inference = build_xeus_pr_inference(
    work_dir="exp/cache/xeus",
    checkpoint=ckpt_path,
    vocab_file="src/model/xeusphoneme/resources/ipa_vocab.json",
    hf_repo="espnet/xeus",
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# load audio and resample to 16 kHz if needed
waveform, sr = torchaudio.load("audio.wav")
if sr != 16000:
    waveform = torchaudio.functional.resample(waveform, sr, 16000)

results = inference(waveform.squeeze(0))
print(results[0]["processed_transcript"])
```

Training and evaluation datasets use Kaldi-style `wav.scp` / `text` files. Dataset paths are configured in:
- Training data: `configs/data/ipapack_index.yaml` -- defines train/dev splits
- Evaluation data: `configs/data/prism_pr_evalsets.yaml` -- defines eval datasets (DoReCo, GMU Accent, TIMIT, Buckeye, VoxAngeles, TUSOM, FLEURS, etc.)
All paths are relative to IPAPACK_DATA_ROOT. Prepare data with the IPAPack pipeline, then point the env var to the output directory.
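As a rough illustration of the Kaldi-style layout the configs expect, the sketch below parses a `wav.scp` / `text` pair and joins them by utterance ID. The utterance IDs, audio paths, and transcripts are made up for illustration; they are not from the IPAPack data.

```python
# Hypothetical Kaldi-style data directory (illustrative paths/IDs):
#   $IPAPACK_DATA_ROOT/train/wav.scp  -- "<utt_id> <path-to-audio>"
#   $IPAPACK_DATA_ROOT/train/text     -- "<utt_id> <transcript>"

def parse_kaldi_file(lines):
    """Parse '<utt_id> <rest-of-line>' entries into a dict keyed by utterance ID."""
    entries = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        utt_id, rest = line.split(maxsplit=1)
        entries[utt_id] = rest
    return entries

wav_scp = parse_kaldi_file([
    "utt0001 /data/audio/utt0001.wav",
    "utt0002 /data/audio/utt0002.wav",
])
text = parse_kaldi_file([
    "utt0001 h ɛ l oʊ",
    "utt0002 w ɜ˞ l d",
])

# pair audio paths with transcripts via the shared utterance ID
pairs = {k: (wav_scp[k], text[k]) for k in wav_scp if k in text}
print(pairs["utt0001"])  # ('/data/audio/utt0001.wav', 'h ɛ l oʊ')
```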
Pre-trained model weights are downloaded automatically from HuggingFace (e.g., espnet/xeus, espnet/powsm) on first use.
```bash
# single GPU
python src/main.py experiment=train/ipapack_xeuspr trainer=gpu

# multi-GPU (DDP)
python src/main.py experiment=train/ipapack_xeuspr trainer=ddp

# SLURM
sbatch scripts/daixpr.batch experiment=train/ipapack_xeuspr run_folder=my_run
```

Override any parameter from the command line:

```bash
python src/main.py experiment=train/ipapack_xeuspr \
    trainer.max_steps=50000 data.batch_size=32 model.optimizer.lr=3e-5
```

Available training configs are in `configs/experiment/train/`.
Run inference on any evaluation dataset:
```bash
# single dataset
python src/main.py experiment=inference/powsmpreval data.dataset_name=doreco

# distributed (SLURM array)
sbatch --array=0-3 scripts/daixpr_inference.batch \
    experiment=inference/powsmpreval data.dataset_name=doreco
```

Results are written as JSONL shards: `<out_file>.<task_id>.jsonl`.
Available inference configs are in configs/experiment/inference/.
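If you want to inspect the sharded output yourself rather than feed it to the metrics script, the shards can be merged with a glob pattern. A minimal sketch, assuming each JSONL record carries a `utt_id` field (as used by the evaluation flags below); the path is illustrative:

```python
import glob
import json

def load_shards(pattern):
    """Merge JSONL shards matching a glob pattern into one dict keyed by utt_id."""
    merged = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                merged[record["utt_id"]] = record
    return merged

# e.g. predictions = load_shards("exp/runs/my_run/transcription.*.jsonl")
```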
Evaluate predictions from distributed inference shards using a glob pattern:
```bash
# evaluate all shards at once
python -m src.metrics.phone_recognition \
    --prediction_file "exp/runs/my_run/transcription.*.jsonl" \
    --output_file results.csv \
    --evaluation_name my_model \
    --gt_field target --key_field utt_id

# or a single file (JSON or JSONL)
python -m src.metrics.phone_recognition \
    --prediction_file exp/runs/my_run/transcription.0.jsonl \
    --output_file results.csv \
    --evaluation_name my_model \
    --gt_field target --key_field utt_id
```

Metrics: PER (Phone Error Rate), PFER (Phone Feature Error Rate), FED (Feature Edit Distance), and SUB/INS/DEL rates.
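As a point of reference for what PER measures, it is the Levenshtein distance between the predicted and reference phone sequences (counting substitutions, insertions, and deletions) normalized by reference length. The sketch below is an illustrative re-implementation, not the code in `src.metrics`:

```python
def phone_error_rate(ref, hyp):
    """Levenshtein distance between phone sequences / reference length."""
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# one substitution (ɛ → ə) in a 4-phone reference
print(phone_error_rate(["h", "ɛ", "l", "oʊ"], ["h", "ə", "l", "oʊ"]))  # 0.25
```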
- Running Inference -- distributed inference guide
- Contributing Guide -- project structure and workflow