Commit d33b92a

Merge pull request #132 from readbeyond/devel
aeneas v1.7.0
2 parents 809c2ce + a01fb9b commit d33b92a

337 files changed, +16436 -6321 lines changed


.gitignore

+1-1
@@ -10,10 +10,10 @@ bak
 build
 dist
 docs/build
-venvs
 tmp
 
 # service scripts
+zzz
 zzz_*.py
 zzz_*.sh
 zzz_long_tests

MANIFEST.in

+1
@@ -7,6 +7,7 @@ recursive-include aeneas/cwave *
 recursive-include aeneas/extra *
 prune aeneas/extra/ctw_speect
 recursive-include aeneas/res *
+recursive-include aeneas/syncmap *
 recursive-include aeneas/tools/res *
 recursive-include aeneas/ttswrappers *
 include aeneas_check_setup.py

README.md

+61-23
@@ -2,8 +2,8 @@
 
 **aeneas** is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment).
 
-* Version: 1.6.0.1
-* Date: 2016-09-30
+* Version: 1.7.0
+* Date: 2016-12-07
 * Developed by: [ReadBeyond](http://www.readbeyond.it/)
 * Lead Developer: [Alberto Pettarin](http://www.albertopettarin.it/)
 * License: the GNU Affero General Public License Version 3 (AGPL v3)
@@ -45,12 +45,14 @@ To eat the world's due, by the grave and thee. => [00:00:48.080, 00:00:53
 
 ![Waveform with aligned labels, detail](wiki/align.png)
 
-This synchronization map can be output to file in several formats:
-EAF for research purposes,
-SMIL for EPUB 3,
-SBV/SRT/SUB/TTML/VTT for closed captioning,
-JSON for Web usage,
-or raw AUD/CSV/SSV/TSV/TXT/XML for further processing.
+This synchronization map can be output to file
+in several formats, depending on its application:
+
+* research: Audacity (AUD), ELAN (EAF), TextGrid;
+* digital publishing: SMIL for EPUB 3;
+* closed captioning: SubRip (SRT), SubViewer (SBV/SUB), TTML, WebVTT (VTT);
+* Web: JSON;
+* further processing: CSV, SSV, TSV, TXT, XML.
 
 
 ## System Requirements, Supported Platforms and Installation
@@ -68,12 +70,13 @@ or raw AUD/CSV/SSV/TSV/TXT/XML for further processing.
 ### Supported Platforms
 
 **aeneas** has been developed and tested on **Debian 64bit**,
-which is the **only supported OS** at the moment.
+with **Python 2.7** and **Python 3.5**,
+which are the **only supported platforms** at the moment.
 Nevertheless, **aeneas** has been confirmed to work on
-other Linux distributions, OS X, and Windows.
+other Linux distributions, Mac OS X, and Windows.
 See the
 [PLATFORMS file](https://github.com/readbeyond/aeneas/blob/master/wiki/PLATFORMS.md)
-for the details.
+for details.
 
 If installing **aeneas** natively on your OS proves difficult,
 you are strongly encouraged to use
@@ -97,15 +100,15 @@ for detailed, step-by-step installation procedures for different operating syste
 
 The generic OS-independent procedure is simple:
 
-1. Install
+1. **Install**
 [Python](https://python.org/) (2.7.x preferred),
 [FFmpeg](https://www.ffmpeg.org/), and
 [eSpeak](http://espeak.sourceforge.net/)
 
-2. Make sure the following executables can be called from your shell:
+2. Make sure the following **executables** can be called from your **shell**:
 `espeak`, `ffmpeg`, `ffprobe`, `pip`, and `python`
 
-3. First install `numpy` with `pip` and then `aeneas`:
+3. First install `numpy` with `pip` and then `aeneas` (this order is important):
 
 ```bash
 pip install numpy
@@ -216,6 +219,8 @@ which explains how to use the built-in command line tools.
 [HOWITWORKS](https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md)
 * Development history:
 [HISTORY](https://github.com/readbeyond/aeneas/blob/master/wiki/HISTORY.md)
+* Testing:
+[TESTING](https://github.com/readbeyond/aeneas/blob/master/wiki/TESTING.md)
 * Benchmark suite:
 [https://readbeyond.github.io/aeneas-benchmark/](https://readbeyond.github.io/aeneas-benchmark/)
 
@@ -227,32 +232,61 @@ which explains how to use the built-in command line tools.
 * Text extraction from XML (e.g., XHTML) files using `id` and `class` attributes
 * Arbitrary text fragment granularity (single word, subphrase, phrase, paragraph, etc.)
 * Input audio file formats: all those readable by `ffmpeg`
-* Output sync map formats: AUD, CSV, EAF, JSON, SMIL, SRT, SSV, SUB, TSV, TTML, TXT, VTT, XML
-* Confirmed working on 37 languages: ARA, BUL, CAT, CYM, CES, DAN, DEU, ELL, ENG, EPO, EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, JPN, LAT, LAV, LIT, NLD, NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE, TUR, UKR
+* Output sync map formats: AUD, CSV, EAF, JSON, SMIL, SRT, SSV, SUB, TEXTGRID, TSV, TTML, TXT, VTT, XML
+* Confirmed working on 38 languages: AFR, ARA, BUL, CAT, CYM, CES, DAN, DEU, ELL, ENG, EPO, EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, JPN, LAT, LAV, LIT, NLD, NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE, TUR, UKR
 * MFCC and DTW computed via Python C extensions to reduce the processing time
-* Several built-in TTS engine wrappers: eSpeak (default), eSpeak-ng, Festival, Nuance TTS API
+* Several built-in TTS engine wrappers: AWS Polly TTS API, eSpeak (default), eSpeak-ng, Festival, Nuance TTS API
 * Default TTS (eSpeak) called via a Python C extension for fast audio synthesis
 * Possibility of running a custom, user-provided TTS engine Python wrapper (e.g., included example for speect)
 * Batch processing of multiple audio/text pairs
 * Download audio from a YouTube video
 * In multilevel mode, recursive alignment from paragraph to sentence to word level
-* In multilevel mode, time resolution and/or TTS engine can be specified for each level independently
+* In multilevel mode, MFCC resolution, MFCC masking, DTW margin, and TTS engine can be specified for each level independently
 * Robust against misspelled/mispronounced words, local rearrangements of words, background noise/sporadic spikes
 * Adjustable splitting times, including a max character/second constraint for CC applications
 * Automated detection of audio head/tail
 * Output an HTML file for fine tuning the sync map manually (`finetuneas` project)
 * Execution parameters tunable at runtime
-* Code suitable for Web app deployment (e.g., on-demand cloud computing)
-* Extensive test suite including 800+ unit/integration/performance tests, that run and must pass before each release
+* Code suitable for Web app deployment (e.g., on-demand cloud computing instances)
+* Extensive test suite including 1,200+ unit/integration/performance tests, that run and must pass before each release
 
 
 ## Limitations and Missing Features
 
 * Audio should match the text: large portions of spurious text or audio might produce a wrong sync map
 * Audio is assumed to be spoken: not suitable for song captioning, YMMV for CC applications
-* No protection against memory trashing if you feed extremely long audio files (>1.5h per single audio file)
+* No protection against memory swapping: be sure your amount of RAM is adequate for the maximum duration of a single audio file (e.g., 4 GB RAM => max 2h audio; 16 GB RAM => max 10h audio)
 * [Open issues](https://github.com/readbeyond/aeneas/issues)
 
+### A Note on Word-Level Alignment
+
+A significant number of users runs **aeneas** to align audio and text
+at word-level (i.e., each fragment is a word).
+Although **aeneas** was not designed with word-level alignment in mind
+and the results might be inferior to
+[ASR-based forced aligners](https://github.com/pettarin/forced-alignment-tools)
+for languages with good ASR models,
+**aeneas** offers some options to improve
+the quality of the alignment at word-level:
+
+* multilevel text (since v1.5.1),
+* MFCC nonspeech masking (since v1.7.0, disabled by default),
+* use better TTS engines, like Festival or AWS/Nuance TTS API (since v1.5.0).
+
+If you use the ``aeneas.tools.execute_task`` command line tool,
+you can add ``--presets-word`` switch to enable MFCC nonspeech masking, for example:
+
+```bash
+$ python -m aeneas.tools.execute_task --example-words --presets-word
+$ python -m aeneas.tools.execute_task --example-words-multilevel --presets-word
+```
+
+If you use **aeneas** as a library, just set the appropriate
+``RuntimeConfiguration`` parameters.
+Please see the
+[command line tutorial](http://www.readbeyond.it/aeneas/docs/clitutorial.html)
+for details.
+
 
 ## License
 
@@ -282,6 +316,8 @@ No copy rights were harmed in the making of this project.
 
 * **April 2016**: the Fruch Foundation kindly sponsored the development and documentation of v1.5.0
 
+* **December 2016**: the [Centro Internazionale Del Libro Parlato "Adriano Sernagiotto"](http://www.libroparlato.org/) (Feltre, Italy) partially sponsored the development of v1.7.0
+
 ### Supporting
 
 Would you like supporting the development of **aeneas**?
@@ -291,8 +327,7 @@ I accept sponsorships to
 * fix bugs,
 * add new features,
 * improve the quality and the performance of the code,
-* port the code to other languages/platforms,
-* support of third party installations, and
+* port the code to other languages/platforms, and
 * improve the documentation.
 
 Feel free to
@@ -341,6 +376,9 @@ packaged the installers for Mac OS X and Windows.
 **Firat Ozdemir** contributed the `finetuneas`
 HTML/JS code for fine tuning sync maps in the browser.
 
+**Willem van der Walt** contributed the code snippet
+to output a sync map in TextGrid format.
+
 All the mighty
 [GitHub contributors](https://github.com/readbeyond/aeneas/graphs/contributors),
 and the members of the
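Step 2 of the installation procedure in the README diff above requires `espeak`, `ffmpeg`, `ffprobe`, `pip`, and `python` to be callable from the shell, before installing `numpy` and then `aeneas`. The following is a minimal sketch of such a pre-install check using only the standard library. `check_executables` and `missing_executables` are hypothetical helpers written for illustration, not part of aeneas (the package ships its own `aeneas_check_setup.py`, listed in MANIFEST.in above). The sketch assumes Python 3.3 or later, because `shutil.which` does not exist in Python 2.7.

```python
import shutil

# Executables that the README says must be callable from the shell.
REQUIRED_EXECUTABLES = ("espeak", "ffmpeg", "ffprobe", "pip", "python")


def check_executables(names=REQUIRED_EXECUTABLES):
    """Map each executable name to its resolved path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


def missing_executables(names=REQUIRED_EXECUTABLES):
    """Return the names that could not be found on PATH, preserving the input order."""
    found = check_executables(names)
    return [name for name in names if found[name] is None]


if __name__ == "__main__":
    missing = missing_executables()
    if missing:
        print("Missing executables: " + ", ".join(missing))
    else:
        # Per the README, numpy must be installed before aeneas.
        print("All required executables found: install numpy first, then aeneas.")
```

Only the presence of each executable on `PATH` is checked here; it does not verify versions or whether eSpeak has the voices a given language needs.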

README.rst

+78-31
@@ -4,8 +4,8 @@ aeneas
 **aeneas** is a Python/C library and a set of tools to automagically
 synchronize audio and text (aka forced alignment).
 
-- Version: 1.6.0.1
-- Date: 2016-09-30
+- Version: 1.7.0
+- Date: 2016-12-07
 - Developed by: `ReadBeyond <http://www.readbeyond.it/>`__
 - Lead Developer: `Alberto Pettarin <http://www.albertopettarin.it/>`__
 - License: the GNU Affero General Public License Version 3 (AGPL v3)
@@ -58,10 +58,15 @@ interval in the audio file:
 
 Waveform with aligned labels, detail
 
-This synchronization map can be output to file in several formats: EAF
-for research purposes, SMIL for EPUB 3, SBV/SRT/SUB/TTML/VTT for closed
-captioning, JSON for Web usage, or raw AUD/CSV/SSV/TSV/TXT/XML for
-further processing.
+This synchronization map can be output to file in several formats,
+depending on its application:
+
+- research: Audacity (AUD), ELAN (EAF), TextGrid;
+- digital publishing: SMIL for EPUB 3;
+- closed captioning: SubRip (SRT), SubViewer (SBV/SUB), TTML, WebVTT
+(VTT);
+- Web: JSON;
+- further processing: CSV, SSV, TSV, TXT, XML.
 
 System Requirements, Supported Platforms and Installation
 ---------------------------------------------------------
@@ -82,12 +87,13 @@ System Requirements
 Supported Platforms
 ~~~~~~~~~~~~~~~~~~~
 
-**aeneas** has been developed and tested on **Debian 64bit**, which is
-the **only supported OS** at the moment. Nevertheless, **aeneas** has
-been confirmed to work on other Linux distributions, OS X, and Windows.
-See the `PLATFORMS
+**aeneas** has been developed and tested on **Debian 64bit**, with
+**Python 2.7** and **Python 3.5**, which are the **only supported
+platforms** at the moment. Nevertheless, **aeneas** has been confirmed
+to work on other Linux distributions, Mac OS X, and Windows. See the
+`PLATFORMS
 file <https://github.com/readbeyond/aeneas/blob/master/wiki/PLATFORMS.md>`__
-for the details.
+for details.
 
 If installing **aeneas** natively on your OS proves difficult, you are
 strongly encouraged to use
@@ -110,14 +116,16 @@ operating systems.
 
 The generic OS-independent procedure is simple:
 
-1. Install `Python <https://python.org/>`__ (2.7.x preferred),
+1. **Install** `Python <https://python.org/>`__ (2.7.x preferred),
 `FFmpeg <https://www.ffmpeg.org/>`__, and
 `eSpeak <http://espeak.sourceforge.net/>`__
 
-2. Make sure the following executables can be called from your shell:
-``espeak``, ``ffmpeg``, ``ffprobe``, ``pip``, and ``python``
+2. Make sure the following **executables** can be called from your
+**shell**: ``espeak``, ``ffmpeg``, ``ffprobe``, ``pip``, and
+``python``
 
-3. First install ``numpy`` with ``pip`` and then ``aeneas``:
+3. First install ``numpy`` with ``pip`` and then ``aeneas`` (this order
+is important):
 
 .. code:: bash
 
@@ -219,6 +227,8 @@ Documentation and Support
 `HOWITWORKS <https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md>`__
 - Development history:
 `HISTORY <https://github.com/readbeyond/aeneas/blob/master/wiki/HISTORY.md>`__
+- Testing:
+`TESTING <https://github.com/readbeyond/aeneas/blob/master/wiki/TESTING.md>`__
 - Benchmark suite: https://readbeyond.github.io/aeneas-benchmark/
 
 Supported Features
@@ -234,15 +244,15 @@ Supported Features
 paragraph, etc.)
 - Input audio file formats: all those readable by ``ffmpeg``
 - Output sync map formats: AUD, CSV, EAF, JSON, SMIL, SRT, SSV, SUB,
-TSV, TTML, TXT, VTT, XML
-- Confirmed working on 37 languages: ARA, BUL, CAT, CYM, CES, DAN, DEU,
-ELL, ENG, EPO, EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, JPN,
-LAT, LAV, LIT, NLD, NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE,
-TUR, UKR
+TEXTGRID, TSV, TTML, TXT, VTT, XML
+- Confirmed working on 38 languages: AFR, ARA, BUL, CAT, CYM, CES, DAN,
+DEU, ELL, ENG, EPO, EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA,
+JPN, LAT, LAV, LIT, NLD, NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA,
+SWE, TUR, UKR
 - MFCC and DTW computed via Python C extensions to reduce the
 processing time
-- Several built-in TTS engine wrappers: eSpeak (default), eSpeak-ng,
-Festival, Nuance TTS API
+- Several built-in TTS engine wrappers: AWS Polly TTS API, eSpeak
+(default), eSpeak-ng, Festival, Nuance TTS API
 - Default TTS (eSpeak) called via a Python C extension for fast audio
 synthesis
 - Possibility of running a custom, user-provided TTS engine Python
@@ -251,8 +261,8 @@ Supported Features
 - Download audio from a YouTube video
 - In multilevel mode, recursive alignment from paragraph to sentence to
 word level
-- In multilevel mode, time resolution and/or TTS engine can be
-specified for each level independently
+- In multilevel mode, MFCC resolution, MFCC masking, DTW margin, and
+TTS engine can be specified for each level independently
 - Robust against misspelled/mispronounced words, local rearrangements
 of words, background noise/sporadic spikes
 - Adjustable splitting times, including a max character/second
@@ -261,9 +271,9 @@ Supported Features
 - Output an HTML file for fine tuning the sync map manually
 (``finetuneas`` project)
 - Execution parameters tunable at runtime
-- Code suitable for Web app deployment (e.g., on-demand cloud
-computing)
-- Extensive test suite including 800+ unit/integration/performance
+- Code suitable for Web app deployment (e.g., on-demand cloud computing
+instances)
+- Extensive test suite including 1,200+ unit/integration/performance
 tests, that run and must pass before each release
 
 Limitations and Missing Features
@@ -273,10 +283,41 @@ Limitations and Missing Features
 might produce a wrong sync map
 - Audio is assumed to be spoken: not suitable for song captioning, YMMV
 for CC applications
-- No protection against memory trashing if you feed extremely long
-audio files (>1.5h per single audio file)
+- No protection against memory swapping: be sure your amount of RAM is
+adequate for the maximum duration of a single audio file (e.g., 4 GB
+RAM => max 2h audio; 16 GB RAM => max 10h audio)
 - `Open issues <https://github.com/readbeyond/aeneas/issues>`__
 
+A Note on Word-Level Alignment
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A significant number of users runs **aeneas** to align audio and text at
+word-level (i.e., each fragment is a word). Although **aeneas** was not
+designed with word-level alignment in mind and the results might be
+inferior to `ASR-based forced
+aligners <https://github.com/pettarin/forced-alignment-tools>`__ for
+languages with good ASR models, **aeneas** offers some options to
+improve the quality of the alignment at word-level:
+
+- multilevel text (since v1.5.1),
+- MFCC nonspeech masking (since v1.7.0, disabled by default),
+- use better TTS engines, like Festival or AWS/Nuance TTS API (since
+v1.5.0).
+
+If you use the ``aeneas.tools.execute_task`` command line tool, you can
+add ``--presets-word`` switch to enable MFCC nonspeech masking, for
+example:
+
+.. code:: bash
+
+$ python -m aeneas.tools.execute_task --example-words --presets-word
+$ python -m aeneas.tools.execute_task --example-words-multilevel --presets-word
+
+If you use **aeneas** as a library, just set the appropriate
+``RuntimeConfiguration`` parameters. Please see the `command line
+tutorial <http://www.readbeyond.it/aeneas/docs/clitutorial.html>`__ for
+details.
+
 License
 -------
 
@@ -316,6 +357,10 @@ Sponsors
 - **April 2016**: the Fruch Foundation kindly sponsored the development
 and documentation of v1.5.0
 
+- **December 2016**: the `Centro Internazionale Del Libro Parlato
+"Adriano Sernagiotto" <http://www.libroparlato.org/>`__ (Feltre,
+Italy) partially sponsored the development of v1.7.0
+
 Supporting
 ~~~~~~~~~~
 
@@ -326,8 +371,7 @@ I accept sponsorships to
 - fix bugs,
 - add new features,
 - improve the quality and the performance of the code,
-- port the code to other languages/platforms,
-- support of third party installations, and
+- port the code to other languages/platforms, and
 - improve the documentation.
 
 Feel free to `get in touch <mailto:[email protected]>`__.
@@ -371,6 +415,9 @@ the installers for Mac OS X and Windows.
 **Firat Ozdemir** contributed the ``finetuneas`` HTML/JS code for fine
 tuning sync maps in the browser.
 
+**Willem van der Walt** contributed the code snippet to output a sync
+map in TextGrid format.
+
 All the mighty `GitHub
 contributors <https://github.com/readbeyond/aeneas/graphs/contributors>`__,
 and the members of the `Google
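The new "A Note on Word-Level Alignment" section recommends the `--presets-word` switch of `aeneas.tools.execute_task` for word-level work. The following is a minimal sketch of preparing word-level input and assembling (not running) such a command, under stated assumptions: a plain-text input with one word per line is taken to yield one fragment per word; the tool is assumed to accept the positional arguments audio file, text file, configuration string, output file, followed by options; and the configuration keys `task_language`, `is_text_type`, and `os_task_file_format` are assumed to follow the aeneas configuration-string convention. Check these against the command line tutorial before relying on them. Both helper functions are hypothetical, and the lowercase format names are an assumption derived from the format list in the diff above.

```python
import re
import sys

# Sync map formats named in the README diff above (the features list plus SBV,
# which appears only in the "closed captioning" entry). Lowercase values are an
# assumption about what the configuration string expects.
SYNC_MAP_FORMATS = {
    "aud", "csv", "eaf", "json", "sbv", "smil", "srt", "ssv", "sub",
    "textgrid", "tsv", "ttml", "txt", "vtt", "xml",
}


def words_to_plain_text(text):
    """Return the text with one whitespace-separated word per line."""
    words = re.findall(r"\S+", text)
    return "\n".join(words) + "\n"


def build_execute_task_command(audio_path, text_path, output_path,
                               language="eng", output_format="json",
                               presets_word=True, python=sys.executable):
    """Assemble (but do not run) an execute_task command as a list of arguments."""
    if output_format not in SYNC_MAP_FORMATS:
        raise ValueError("unknown sync map format: " + output_format)
    # Assumed configuration string: key=value pairs separated by "|".
    config = "|".join([
        "task_language=" + language,
        "is_text_type=plain",
        "os_task_file_format=" + output_format,
    ])
    cmd = [python, "-m", "aeneas.tools.execute_task",
           audio_path, text_path, config, output_path]
    if presets_word:
        # Per the README note, this switch enables MFCC nonspeech masking.
        cmd.append("--presets-word")
    return cmd
```

Returning an argument list instead of a shell string avoids quoting problems with the `|` separators; the list could then be passed to `subprocess.run(cmd, check=True)` once the text file has been written with `words_to_plain_text`.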

VERSION

+1-1
@@ -1 +1 @@
-1.6.0
+1.7.0
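The word-level note added in this commit ties each option to the release that introduced it: multilevel text since v1.5.1, better TTS engines since v1.5.0, and MFCC nonspeech masking since v1.7.0, the version this VERSION file now carries. The following is a minimal sketch of gating those options on a version string such as the contents of `VERSION`. The option names are hypothetical labels, not aeneas identifiers. The parser assumes purely numeric, dot-separated versions like `1.7.0` or `1.6.0.1`.

```python
# Minimum versions, as stated in "A Note on Word-Level Alignment".
# The dictionary keys are illustrative labels, not aeneas identifiers.
WORD_LEVEL_OPTIONS = {
    "better_tts_engines": "1.5.0",
    "multilevel_text": "1.5.1",
    "mfcc_nonspeech_masking": "1.7.0",
}


def parse_version(version):
    """Turn '1.6.0.1' into (1, 6, 0, 1) so versions compare as integer tuples."""
    return tuple(int(part) for part in version.strip().split("."))


def available_word_level_options(version):
    """Return, sorted by name, the options whose minimum version is <= version."""
    current = parse_version(version)
    return sorted(
        name for name, minimum in WORD_LEVEL_OPTIONS.items()
        if current >= parse_version(minimum)
    )
```

Comparing integer tuples avoids the string-comparison pitfall where `"1.10.0" < "1.7.0"`. For example, `available_word_level_options("1.6.0")` omits `mfcc_nonspeech_masking`, while `"1.7.0"` includes all three options.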
