  **aeneas** is a Python/C library and a set of tools to automagically
  synchronize audio and text (aka forced alignment).

- - Version: 1.4.1
- - Date: 2016-02-13
+ - Version: 1.5.0
+ - Date: 2016-04-02
  - Developed by: `ReadBeyond <http://www.readbeyond.it/>`__
  - Lead Developer: `Alberto Pettarin <http://www.albertopettarin.it/>`__
  - License: the GNU Affero General Public License Version 3 (AGPL v3)

  - Quick Links: `Home <http://www.readbeyond.it/aeneas/>`__ -
  `GitHub <https://github.com/readbeyond/aeneas/>`__ -
- `PyPI <https://pypi.python.org/pypi/aeneas/>`__ - `API
- Docs <http://www.readbeyond.it/aeneas/docs/>`__ - `Mailing
+ `PyPI <https://pypi.python.org/pypi/aeneas/>`__ -
+ `Docs <http://www.readbeyond.it/aeneas/docs/>`__ -
+ `Tutorial <http://www.readbeyond.it/aeneas/docs/clitutorial.html>`__
+ - `Mailing
  List <https://groups.google.com/d/forum/aeneas-forced-alignment>`__ -
  `Web App <http://aeneasweb.org>`__

@@ -34,25 +36,31 @@ interval in the audio file:

  ::

- 1 => [00:00:00.000, 00:00:02.680]
- From fairest creatures we desire increase, => [00:00:02.680, 00:00:05.480]
- That thereby beauty's rose might never die, => [00:00:05.480, 00:00:08.640]
- But as the riper should by time decease, => [00:00:08.640, 00:00:11.960]
- His tender heir might bear his memory: => [00:00:11.960, 00:00:15.280]
- But thou contracted to thine own bright eyes, => [00:00:15.280, 00:00:18.520]
- Feed'st thy light's flame with self-substantial fuel, => [00:00:18.520, 00:00:22.760]
- Making a famine where abundance lies, => [00:00:22.760, 00:00:25.720]
- Thy self thy foe, to thy sweet self too cruel: => [00:00:25.720, 00:00:31.240]
- Thou that art now the world's fresh ornament, => [00:00:31.240, 00:00:34.280]
- And only herald to the gaudy spring, => [00:00:34.280, 00:00:36.960]
- Within thine own bud buriest thy content, => [00:00:36.960, 00:00:40.640]
- And tender churl mak'st waste in niggarding: => [00:00:40.640, 00:00:43.600]
- Pity the world, or else this glutton be, => [00:00:43.600, 00:00:48.000]
- To eat the world's due, by the grave and thee. => [00:00:48.000, 00:00:53.280]
-
- This synchronization map can be output to file in several formats: SMIL
- for EPUB 3, SBV/SRT/SUB/TTML/VTT for closed captioning, JSON/RBSE for
- Web usage, or raw CSV/SSV/TSV/TXT/XML for further processing.
+ 1 => [00:00:00.000, 00:00:02.640]
+ From fairest creatures we desire increase, => [00:00:02.640, 00:00:05.880]
+ That thereby beauty's rose might never die, => [00:00:05.880, 00:00:09.240]
+ But as the riper should by time decease, => [00:00:09.240, 00:00:11.920]
+ His tender heir might bear his memory: => [00:00:11.920, 00:00:15.280]
+ But thou contracted to thine own bright eyes, => [00:00:15.280, 00:00:18.800]
+ Feed'st thy light's flame with self-substantial fuel, => [00:00:18.800, 00:00:22.760]
+ Making a famine where abundance lies, => [00:00:22.760, 00:00:25.680]
+ Thy self thy foe, to thy sweet self too cruel: => [00:00:25.680, 00:00:31.240]
+ Thou that art now the world's fresh ornament, => [00:00:31.240, 00:00:34.400]
+ And only herald to the gaudy spring, => [00:00:34.400, 00:00:36.920]
+ Within thine own bud buriest thy content, => [00:00:36.920, 00:00:40.640]
+ And tender churl mak'st waste in niggarding: => [00:00:40.640, 00:00:43.640]
+ Pity the world, or else this glutton be, => [00:00:43.640, 00:00:48.080]
+ To eat the world's due, by the grave and thee. => [00:00:48.080, 00:00:53.240]
+
+ .. figure:: wiki/align.png
+ :alt: Waveform with aligned labels, detail
+
+ Waveform with aligned labels, detail
+
+ This synchronization map can be output to file in several formats: EAF
+ for research purposes, SMIL for EPUB 3, SBV/SRT/SUB/TTML/VTT for closed
+ captioning, JSON for Web usage, or raw AUD/CSV/SSV/TSV/TXT/XML for
+ further processing.
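
The listing above pairs each text fragment with a ``[begin, end]`` interval. As a minimal sketch, assuming the plain ``text => [HH:MM:SS.mmm, HH:MM:SS.mmm]`` layout shown in this README (which is a display form, not one of the actual output file formats), the following Python snippet converts such lines into seconds. The names ``LINE_RE``, ``to_seconds``, and ``parse_listing`` are hypothetical helpers and are not part of the **aeneas** API.

```python
import re

# Matches lines such as "Some text => [00:00:02.640, 00:00:05.880]".
# Assumption: this is the display layout used in the README listing.
LINE_RE = re.compile(
    r"^(?P<text>.*?)\s*=>\s*\[(?P<begin>[0-9:.]+),\s*(?P<end>[0-9:.]+)\]\s*$"
)


def to_seconds(stamp):
    """Convert an HH:MM:SS.mmm time stamp into seconds (float)."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_listing(lines):
    """Return (text, begin_seconds, end_seconds) tuples; skip other lines."""
    fragments = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            fragments.append((
                match.group("text"),
                to_seconds(match.group("begin")),
                to_seconds(match.group("end")),
            ))
    return fragments


listing = [
    "1 => [00:00:00.000, 00:00:02.640]",
    "From fairest creatures we desire increase, => [00:00:02.640, 00:00:05.880]",
]
for text, begin, end in parse_listing(listing):
    print("%-45s %7.3f -> %7.3f" % (text, begin, end))
```

Lines that do not match the pattern are silently skipped, so the helper can be fed a whole console dump without pre-filtering.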

  System Requirements, Supported Platforms and Installation
  ---------------------------------------------------------
@@ -66,20 +74,17 @@ System Requirements
  3. `FFmpeg <https://www.ffmpeg.org/>`__
  4. `eSpeak <http://espeak.sourceforge.net/>`__
  5. Python modules ``BeautifulSoup4``, ``lxml``, and ``numpy``
- 6. Python C headers to compile the Python C extensions (Optional but
+ 6. Python C headers to compile the Python C extensions (optional but
  strongly recommended)
- 7. A shell supporting UTF-8 (Optional but strongly recommended)
- 8. Python module ``pafy`` (Optional, only required if you want to
- download audio from YouTube)
+ 7. A shell supporting UTF-8 (optional but strongly recommended)

  Supported Platforms
  ~~~~~~~~~~~~~~~~~~~

  **aeneas** has been developed and tested on **Debian 64bit**, which is
- the **only supported OS** at the moment.
-
- However, **aeneas** has been confirmed to work on other Linux
- distributions, OS X, and Windows. See the `PLATFORMS
+ the **only supported OS** at the moment. Nevertheless, **aeneas** has
+ been confirmed to work on other Linux distributions, OS X, and Windows.
+ See the `PLATFORMS
  file <https://github.com/readbeyond/aeneas/blob/master/wiki/PLATFORMS.md>`__
  for the details.

@@ -115,37 +120,45 @@ for detailed, step-by-step procedures for Linux, OS X, and Windows.
  Usage
  -----

- 1. To check that you installed ``aeneas`` correctly, run:
+ 1. To **check** whether you installed **aeneas** correctly, run:

  ``bash python -m aeneas.diagnostics``

- 2. Run ``execute_task`` or ``execute_job`` with ``-h`` (resp.,
- ``--help``) to get a short (resp., long) usage message:
+ 2. Run without arguments to get the **usage message**:

  .. code:: bash

- python -m aeneas.tools.execute_task -h
- python -m aeneas.tools.execute_job -h
+ python -m aeneas.tools.execute_task
+ python -m aeneas.tools.execute_job
+
+ You can also get a list of **live examples** that you can immediately
+ run on your machine thanks to the included files:

- The above commands also print a list of live usage examples that you
- can immediately run on your machine, thanks to the included example
- files.
+ .. code:: bash

- 3. To compute a synchronization map ``map.json`` for a pair
+ python -m aeneas.tools.execute_task --examples
+ python -m aeneas.tools.execute_task --examples-all
+
+ 3. To **compute a synchronization map** ``map.json`` for a pair
  (``audio.mp3``, ``text.txt`` in
- ```plain`` <http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.PLAIN>`__
+ `plain <http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.PLAIN>`__
  text format), you can run:

  .. code:: bash

  python -m aeneas.tools.execute_task \
  audio.mp3 \
  text.txt \
- "task_language=en|os_task_file_format=json|is_text_type=plain" \
+ "task_language=eng|os_task_file_format=json|is_text_type=plain" \
  map.json

- To compute a synchronization map ``map.smil`` for a pair (``audio.mp3``,
- ```page.xhtml`` <http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.UNPARSED>`__
+ (The command has been split into lines with ``\`` for visual clarity; in
+ production you can have the entire command on a single line and/or you
+ can use shell variables.)
+
+ To **compute a synchronization map** ``map.smil`` for a pair
+ (``audio.mp3``,
+ `page.xhtml <http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.UNPARSED>`__
  containing fragments marked by ``id`` attributes like ``f001``), you can
  run:

@@ -155,80 +168,89 @@ run:
  python -m aeneas.tools.execute_task \
  audio.mp3 \
  page.xhtml \
- "task_language=en|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \
+ "task_language=eng|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \
  map.smil
  ```

- The third parameter (the *configuration string*) can specify several
- other parameters/options. See the
+ As you can see, the third argument (the *configuration string*)
+ specifies the parameters controlling the I/O formats and the processing
+ options for the task. Consult the
  `documentation <http://www.readbeyond.it/aeneas/docs/>`__ for details.

- 4. If you have several tasks to process, you can create a job container
- and a configuration file, to process them all at once:
+ 4. If you have several tasks to process, you can create a **job
+ container** to batch process them:

  .. code:: bash

  python -m aeneas.tools.execute_job job.zip output_directory

  File ``job.zip`` should contain a ``config.txt`` or ``config.xml``
  configuration file, providing **aeneas** with all the information needed
- to parse the input assets and format the output sync map files. See the
- `documentation <http://www.readbeyond.it/aeneas/docs/>`__ for details.
+ to parse the input assets and format the output sync map files. Consult
+ the `documentation <http://www.readbeyond.it/aeneas/docs/>`__ for
+ details.

- The `documentation <http://www.readbeyond.it/aeneas/docs/>`__ provides
- an introduction to the concepts of
- ```task`` <http://www.readbeyond.it/aeneas/docs/#tasks>`__ and
- ```job`` <http://www.readbeyond.it/aeneas/docs/#job>`__, and it lists of
- all the options and tools available in the library.
+ The `documentation <http://www.readbeyond.it/aeneas/docs/>`__ contains a
+ highly suggested
+ `tutorial <http://www.readbeyond.it/aeneas/docs/clitutorial.html>`__
+ which explains how to use the built-in command line tools.
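
The configuration string used in the commands above is a list of ``key=value`` pairs joined by ``|``. As a rough sketch of how a script might assemble it, the snippet below builds the string and the argument list for ``aeneas.tools.execute_task`` from Python. The helpers ``build_config_string`` and ``build_command`` are hypothetical names introduced here for illustration; only the keys and the module path come from the examples above, and the command is built but not executed.

```python
import sys


def build_config_string(options):
    """Join (key, value) pairs as key=value, separated by '|', in order."""
    return "|".join("%s=%s" % (key, value) for key, value in options)


def build_command(audio, text, options, output):
    """Return the argument list for the execute_task tool (not run here)."""
    return [
        sys.executable, "-m", "aeneas.tools.execute_task",
        audio, text, build_config_string(options), output,
    ]


# Keys and values taken from the plain-text example in this README.
options = [
    ("task_language", "eng"),
    ("os_task_file_format", "json"),
    ("is_text_type", "plain"),
]
command = build_command("audio.mp3", "text.txt", options, "map.json")
print(command[5])
# prints: task_language=eng|os_task_file_format=json|is_text_type=plain
```

Passing the command as a list (for example to ``subprocess.run(command, check=True)`` on a machine where **aeneas** is installed) avoids shell quoting issues with the ``|`` characters in the configuration string.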

  Documentation and Support
  -------------------------

- Documentation: http://www.readbeyond.it/aeneas/docs/
-
- High level description of how aeneas works:
- `HOWITWORKS <https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md>`__
-
- Tutorial: `A Practical Introduction To The aeneas
- Package <http://www.albertopettarin.it/blog/2015/05/21/a-practical-introduction-to-the-aeneas-package.html>`__
-
- Mailing list: https://groups.google.com/d/forum/aeneas-forced-alignment
-
- Changelog: http://www.readbeyond.it/aeneas/docs/changelog.html
-
- Development history:
- `HISTORY <https://github.com/readbeyond/aeneas/blob/master/wiki/HISTORY.md>`__
+ - Documentation: http://www.readbeyond.it/aeneas/docs/
+ - Command line tools tutorial:
+ http://www.readbeyond.it/aeneas/docs/clitutorial.html
+ - Library tutorial:
+ http://www.readbeyond.it/aeneas/docs/libtutorial.html
+ - Old, verbose tutorial: `A Practical Introduction To The aeneas
+ Package <http://www.albertopettarin.it/blog/2015/05/21/a-practical-introduction-to-the-aeneas-package.html>`__
+ - Mailing list:
+ https://groups.google.com/d/forum/aeneas-forced-alignment
+ - Changelog: http://www.readbeyond.it/aeneas/docs/changelog.html
+ - High level description of how **aeneas** works:
+ `HOWITWORKS <https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md>`__
+ - Development history:
+ `HISTORY <https://github.com/readbeyond/aeneas/blob/master/wiki/HISTORY.md>`__

  Supported Features
  ------------------

- - Input text files in plain, parsed, subtitles, or unparsed format
+ - Input text files in ``parsed``, ``plain``, ``subtitles``, or
+ ``unparsed`` (XML) format
+ - Multilevel input text files in ``mplain`` and ``munparsed`` (XML)
+ format
  - Text extraction from XML (e.g., XHTML) files using ``id`` and
  ``class`` attributes
  - Arbitrary text fragment granularity (single word, subphrase, phrase,
  paragraph, etc.)
- - Input audio file formats: all those supported by ``ffmpeg``
- - Possibility of downloading the audio file from a YouTube video
- - Batch processing
- - Output sync map formats: CSV, JSON, RBSE, SMIL, SSV, TSV, TTML, TXT,
- VTT, XML
- - Tested languages: BG, CA, CY, CS, DA, DE, EL, EN, EO, ES, ET, FA, FI,
- FR, GA, GRC, HR, HU, IS, IT, LA, LT, LV, NL, NO, RO, RU, PL, PT, SK,
- SR, SV, SW, TR, UK
+ - Input audio file formats: all those readable by ``ffmpeg``
+ - Output sync map formats: AUD, CSV, EAF, JSON, SMIL, SRT, SSV, SUB,
+ TSV, TTML, TXT, VTT, XML
+ - Tested languages: ARA, BUL, CAT, CYM, CES, DAN, DEU, ELL, ENG, EPO,
+ EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, LAT, LAV, LIT, NLD,
+ NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE, TUR, UKR
+ - MFCC and DTW computed via Python C extensions to reduce the
+ processing time
+ - On Linux, eSpeak called via a Python C extension for faster audio
+ synthesis
+ - Batch processing of multiple audio/text pairs
+ - Several built-in TTS engine wrappers: eSpeak (default, FLOSS),
+ Festival (FLOSS), Nuance TTS API (commercial)
+ - Use custom TTS engine wrappers besides the built-in ones
+ - Download audio from a YouTube video
+ - In multilevel mode, recursive alignment from paragraph to sentence to
+ word level
  - Robust against misspelled/mispronounced words, local rearrangements
  of words, background noise/sporadic spikes
- - Code suitable for a Web app deployment (e.g., on-demand AWS
- instances)
  - Adjustable splitting times, including a max character/second
  constraint for CC applications
  - Automated detection of audio head/tail
- - MFCC and DTW computed via Python C extensions to reduce the
- processing time
- - On Linux, ``espeak`` called via a Python C extension for faster audio
- synthesis
- - Output an HTML file (from ``finetuneas`` project) for fine tuning the
- sync map manually
+ - Output an HTML file for fine tuning the sync map manually
+ (``finetuneas`` project)
  - Execution parameters tunable at runtime
+ - Code suitable for Web app deployment (e.g., on-demand cloud
+ computing)
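
The feature list mentions a max character/second constraint for closed captioning. To illustrate what such a constraint measures, here is a small sketch that computes the reading rate of each fragment and flags those above a threshold. This is not how **aeneas** adjusts fragment boundaries internally; the function names and the threshold of 12 characters per second are assumptions chosen only for the example.

```python
def chars_per_second(text, begin, end):
    """Reading rate of a fragment: characters divided by duration in seconds."""
    duration = end - begin
    if duration <= 0:
        raise ValueError("fragment must have a positive duration")
    return len(text) / duration


def too_fast(fragments, max_rate):
    """Return the texts of the fragments whose rate exceeds max_rate."""
    return [
        text
        for text, begin, end in fragments
        if chars_per_second(text, begin, end) > max_rate
    ]


# (text, begin, end) in seconds, taken from the sample listing in this README.
fragments = [
    ("1", 0.000, 2.640),
    ("From fairest creatures we desire increase,", 2.640, 5.880),
]

# Hypothetical threshold: 12 characters per second.
for text in too_fast(fragments, 12.0):
    print("too fast for captions:", text)
```

A captioning workflow could use a check like this to decide which fragments need longer display intervals or should be split into shorter lines.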

  Limitations and Missing Features
  --------------------------------
@@ -238,8 +260,6 @@ Limitations and Missing Features
  - Audio is assumed to be spoken: not suitable/YMMV for song captioning
  - No protection against memory trashing if you feed extremely long
  audio files
- - On Mac OS X and Windows, audio synthesis might be slow if you have
- thousands of text fragments
  - `Open issues <https://github.com/readbeyond/aeneas/issues>`__

  License
@@ -252,7 +272,7 @@ details.

  Licenses for third party code and files included in **aeneas** can be
  found in the
- `licenses/ <https://github.com/readbeyond/aeneas/blob/master/licenses/README.md>`__
+ `licenses <https://github.com/readbeyond/aeneas/blob/master/licenses/README.md>`__
  directory.

  No copy rights were harmed in the making of this project.
@@ -278,6 +298,9 @@ Sponsors
  - **October 2015**: an anonymous donation sponsored the development of
  the "YouTube downloader" option (v1.3.0)

+ - **April 2016**: the Fruch Foundation kindly sponsored the development
+ and documentation of v1.5.0
+
  Supporting
  ~~~~~~~~~~

@@ -337,6 +360,9 @@ asynchronous usage.
  **Chris Hubbard** prepared the files for packaging aeneas as a
  Debian/Ubuntu ``.deb``.

+ **Firat Ozdemir** contributed the ``finetuneas`` HTML/JS code for fine
+ tuning sync maps in the browser.
+
  All the mighty `GitHub
  contributors <https://github.com/readbeyond/aeneas/graphs/contributors>`__,
  and the members of the `Google