Commit 0aac00d

Add citation info to README
1 parent d55fccb commit 0aac00d

1 file changed

Lines changed: 29 additions & 12 deletions

README.md

@@ -6,8 +6,9 @@
 
 * **GPU accelerated forced alignment**. Uses [Pytorch's forced alignment API](https://docs.pytorch.org/audio/main/tutorials/ctc_forced_alignment_api_tutorial.html) with a GPU based implementation of the Viterbi algorithm. Enables fast and memory-efficient forced alignment of long audio segments ([Pratap et al., 2024](https://jmlr.org/papers/volume25/23-1318/23-1318.pdf#page=8)).
 * **Flexible text normalization for improved alignment quality**. Users can supply custom regex-based text normalization functions to preprocess transcripts before alignment. A mapping from the original text to the normalized text is maintained internally. All of the applied normalizations and transformations are consequently **non-destructive and reversible after alignment**.
-* **Batch processing support for emission extraction**. `easyaligner` supports batched inference for wav2vec2-based models, keeping track of non-padded logits when doing alignment.
-* **Modular pipeline design**. The library has separate, independent, pipelines for VAD, emission extraction, and forced alignment. Users can run everything end-to-end, or run the separate stages individually.
+* **Batch processing support for emission extraction**. `easyaligner` supports batched inference for wav2vec2-based models, keeping track of non-padded logits when doing alignment.
+
+Check out the [documentation](https://kb-labb.github.io/easyaligner/) for more details and tutorials!
 
 ## Installation
 
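
Note on the text normalization bullet in this hunk: the README says a mapping from the original text to the normalized text is kept so that normalization stays reversible. The sketch below illustrates that general idea only. It is not `easyaligner`'s implementation, and the names `normalize_with_mapping` and `original_span` are hypothetical helpers invented for this note. The assumed normalization rules (lowercase, drop punctuation, collapse whitespace) are also just an example.

```python
def normalize_with_mapping(text):
    """Normalize text and record, for every kept character, its index in the original.

    Hypothetical illustration: lowercases, drops punctuation, collapses whitespace.
    """
    normalized_chars = []
    mapping = []  # mapping[i] is the index in `text` that produced normalized character i
    pending_space = None  # index of the first whitespace character since the last kept one

    for index, char in enumerate(text):
        if char.isspace():
            # Remember at most one separator, and never emit a leading one
            if normalized_chars and pending_space is None:
                pending_space = index
        elif char.isalnum() or char == "'":
            if pending_space is not None:
                normalized_chars.append(" ")
                mapping.append(pending_space)
                pending_space = None
            # str.lower() can return more than one character for some code points
            for lowered in char.lower():
                normalized_chars.append(lowered)
                mapping.append(index)
        # Any other character (punctuation) is dropped from the normalized text

    return "".join(normalized_chars), mapping


def original_span(mapping, start, end):
    """Translate a [start, end) span in the normalized text back to the original text."""
    if start >= end:
        raise ValueError("span must be non-empty")
    return mapping[start], mapping[end - 1] + 1


text = "It was the BEST of times, it was the worst of times!"
normalized, mapping = normalize_with_mapping(text)

# Locate a word in the normalized text, then recover it from the original
norm_start = normalized.index("best")
start, end = original_span(mapping, norm_start, norm_start + len("best"))
print(text[start:end])  # BEST
```

Because every normalized character points back to an original index, timestamps attached to normalized spans can be reported against the untouched transcript.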

@@ -47,13 +48,21 @@ from easyaligner.pipelines import pipeline
 from easyaligner.text import text_normalizer
 from easyaligner.vad.pyannote import load_vad_model
 
+filepath_pattern = "tale-of-two-cities_align-en/taleoftwocities_01_dickens_64kb_align.mp3"
+
+# Download mp3 from Hugging Face Hub
 snapshot_download(
     "Lauler/easytranscriber_tutorials",
     repo_type="dataset",
     local_dir="data/tutorials",
-    allow_patterns="tale-of-two-cities_align-en/*",
+    allow_patterns=filepath_pattern,
 )
 
+# File(s) to align
+filepath = Path("data/tutorials") / filepath_pattern
+audio_dir = filepath.parent
+audio_files = [filepath.name]
+
 text = """
 It was the best of times, it was the worst of times, it was the age of
 wisdom, it was the age of foolishness, it was the epoch of belief, it
@@ -69,26 +78,21 @@ evil, in the superlative degree of comparison only.
 text = text.strip()
 
 # The alignments will be organized according to how the text is tokenized
-tokenizer = load_tokenizer(language="english") # sentence tokenizer
-span_list = list(tokenizer.span_tokenize(text)) # start, end character indices for each sentence
+tokenizer = load_tokenizer(language="english") # sentence tokenizer
+span_list = list(tokenizer.span_tokenize(text)) # start, end character indices for each sentence
 speeches = [[SpeechSegment(speech_id=0, text=text, text_spans=span_list, start=None, end=None)]]
 
 # Load models and run pipeline
 model_vad = load_vad_model()
-model = (
-    AutoModelForCTC.from_pretrained("facebook/wav2vec2-base-960h").to("cuda").half()
-)
+model = AutoModelForCTC.from_pretrained("facebook/wav2vec2-base-960h").to("cuda").half()
 processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
 
-# File(s) to align
-audio_files = [file.name for file in Path("data/tutorials/tale-of-two-cities_align-en").glob("*")]
-
 pipeline(
     vad_model=model_vad,
     emissions_model=model,
     processor=processor,
     audio_paths=audio_files,
-    audio_dir="data/tutorials/tale-of-two-cities_align-en",
+    audio_dir=audio_dir,
     speeches=speeches,
     alignment_strategy="speech",
     text_normalizer_fn=text_normalizer,
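
Note on the path handling in the two hunks above: the commit replaces a directory glob with a single `filepath_pattern` from which both `audio_dir` and `audio_files` are derived. The standard-library sketch below shows what `Path.parent` and `Path.name` yield for that value. It only manipulates path strings, so no download is needed and the file does not have to exist. That `pipeline` resolves each entry of `audio_paths` relative to `audio_dir` is an assumption based on how the two arguments are used in this diff.

```python
from pathlib import Path

# Same values as in the README example
filepath_pattern = "tale-of-two-cities_align-en/taleoftwocities_01_dickens_64kb_align.mp3"
filepath = Path("data/tutorials") / filepath_pattern

audio_dir = filepath.parent    # directory that contains the audio file
audio_files = [filepath.name]  # bare file name, without any directory part

# Joining the two pieces again reproduces the full path
rejoined = audio_dir / audio_files[0]

print(audio_dir.as_posix())
print(audio_files)
print(rejoined == filepath)
```

Compared with the removed `glob("*")` version, this lists exactly the one requested file rather than everything that happens to be in the directory.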
@@ -127,3 +131,16 @@ The `output/emissions` directory will, in addition to the JSON files, also conta
 
 All intermediate files can safely be deleted, assuming there is no need to re-run the pipeline from a specific intermediate stage.
 
+## Citation
+
+If you use `easyaligner` in your research, consider citing the following blog post:
+
+```
+@online{rekathati2026,
+  author = {Rekathati, Faton},
+  title = {Easyaligner: {Forced} Alignment of Text and Audio, Made Easy},
+  date = {2026-04-08},
+  url = {https://kb-labb.github.io/posts/2026-04-08-easyaligner/},
+  langid = {en}
+}
+```
