Commit dac70a2

docs: fix multiple typos in ASR bucketing documentation (#15376)
1 parent 4e73ff0 commit dac70a2

1 file changed

Lines changed: 5 additions & 5 deletions

docs/source/asr/datasets.rst

@@ -263,7 +263,7 @@ Semi Sorted Batching
 
 Sorting samples by duration and spliting them into batches speeds up training, but can degrade the quality of the model. To avoid quality degradation and maintain some randomness in the partitioning process, we add pseudo noise to the sample length when sorting.
 
-It may result into training speeedup of more than 40 percent with the same quality. To enable and use semi sorted batching add some lines in config.
+It may result into training speedup of more than 40 percent with the same quality. To enable and use semi sorted batching add some lines in config.
 
 .. code::
 
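The semi sorted batching idea described in this hunk (sort samples by a noisy duration, then cut the sorted list into batches) can be sketched in Python as follows. This is a minimal illustration, not NeMo's implementation: the function name `semi_sorted_batches`, the uniform relative noise controlled by `noise_frac`, and the final shuffle of whole batches are all assumptions made for the example.

```python
import random
from typing import List, Sequence


def semi_sorted_batches(durations: Sequence[float], batch_size: int,
                        noise_frac: float = 0.1, seed: int = 0) -> List[List[int]]:
    """Hypothetical helper: group sample indices into batches of similar duration."""
    rng = random.Random(seed)
    # Perturb each duration by up to +/- noise_frac of its own value (assumed noise model).
    noisy = [d * (1.0 + rng.uniform(-noise_frac, noise_frac)) for d in durations]
    # Sort sample indices by the noisy duration so neighbours have similar lengths.
    order = sorted(range(len(durations)), key=lambda i: noisy[i])
    # Cut the sorted list into consecutive batches of at most batch_size samples.
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    # Shuffle whole batches so training does not always run short-to-long (assumption).
    rng.shuffle(batches)
    return batches
```

With `noise_frac=0.0` the sketch degenerates to plain sorted batching; a larger `noise_frac` lets samples of similar length land in different batches from one seed to the next, which is the randomness the paragraph refers to.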
@@ -288,8 +288,8 @@ For more details about this algorithm, see the `paper <https://www.isca-archive.
 Bucketing Datasets
 ---------------------
 
-Splitting the training samples into buckets with different lengths and sampling from the same bucket for each batch would increase the computation efficicncy.
-It may result into training speeedup of more than 2X. To enable and use the bucketing feature, you need to create the bucketing version of the dataset by using `conversion script here <https://github.com/NVIDIA/NeMo/tree/stable/scripts/speech_recognition/convert_to_tarred_audio_dataset.py>`_.
+Splitting the training samples into buckets with different lengths and sampling from the same bucket for each batch would increase the computation efficiency.
+It may result into training speedup of more than 2X. To enable and use the bucketing feature, you need to create the bucketing version of the dataset by using `conversion script here <https://github.com/NVIDIA/NeMo/tree/stable/scripts/speech_recognition/convert_to_tarred_audio_dataset.py>`_.
 You may use --buckets_num to specify the number of buckets (Recommend to use 4 to 8 buckets). It creates multiple tarred datasets, one per bucket, based on the audio durations. The range of [min_duration, max_duration) is split into equal sized buckets.
 
 To enable the bucketing feature in the dataset section of the config files, you need to pass the multiple tarred datasets as a list of lists.
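The equal-width split of `[min_duration, max_duration)` controlled by `--buckets_num` can be illustrated with a small Python sketch. The helper `assign_buckets` is hypothetical: it only groups sample indices by bucket, whereas the conversion script writes one tarred dataset per bucket. Skipping samples that fall outside the range is an assumption of this sketch.

```python
from typing import Dict, List, Sequence


def assign_buckets(durations: Sequence[float], min_duration: float,
                   max_duration: float, buckets_num: int) -> Dict[int, List[int]]:
    """Hypothetical helper: map each bucket id to the indices of samples in its duration range."""
    if buckets_num <= 0 or max_duration <= min_duration:
        raise ValueError("need buckets_num > 0 and max_duration > min_duration")
    # Every bucket covers the same span of durations.
    width = (max_duration - min_duration) / buckets_num
    buckets: Dict[int, List[int]] = {b: [] for b in range(buckets_num)}
    for idx, dur in enumerate(durations):
        # Half-open range: min_duration is included, max_duration is not (assumed to be skipped).
        if not (min_duration <= dur < max_duration):
            continue
        b = int((dur - min_duration) // width)
        # Guard against floating-point rounding pushing a sample past the last bucket.
        buckets[min(b, buckets_num - 1)].append(idx)
    return buckets
```

For instance, with `min_duration=0.0`, `max_duration=10.0` and 4 buckets, each bucket spans 2.5 seconds, so a 4.9 s clip lands in bucket 1 and a 5.0 s clip in bucket 2.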
@@ -323,7 +323,7 @@ When bucketing_batch_size is not set, train_ds.batch_size is going to be used fo
 
 bucketing_batch_size can be set as an integer or a list of integers to explicitly specify the batch size for each bucket.
 if bucketing_batch_size is set to be an integer, then linear scaling is being used to scale-up the batch sizes for batches with shorted audio size. For example, setting train_ds.bucketing_batch_size=8 for 4 buckets would use these sizes [32,24,16,8] for different buckets.
-When bucketing_batch_size is set, traind_ds.batch_size need to be set to 1.
+When bucketing_batch_size is set, train_ds.batch_size need to be set to 1.
 
 Training an ASR model on audios sorted based on length may affect the accuracy of the model. We introduced some strategies to mitigate it.
 We support three types of bucketing strategies:
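The linear scaling rule for an integer `bucketing_batch_size` can be checked with a few lines of Python. The function `bucket_batch_sizes` is hypothetical and assumes bucket 0 holds the shortest audio, which matches the documented example of `[32,24,16,8]` for `bucketing_batch_size=8` with 4 buckets.

```python
from typing import List, Union


def bucket_batch_sizes(bucketing_batch_size: Union[int, List[int]], num_buckets: int) -> List[int]:
    """Hypothetical helper: per-bucket batch sizes, shortest-audio bucket first (assumed ordering)."""
    if isinstance(bucketing_batch_size, int):
        # Linear scaling: bucket i gets (num_buckets - i) times the base size,
        # so the longest-audio bucket uses the base size itself.
        return [bucketing_batch_size * (num_buckets - i) for i in range(num_buckets)]
    # An explicit list is used as-is, but it must give one size per bucket.
    if len(bucketing_batch_size) != num_buckets:
        raise ValueError("bucketing_batch_size list must have one entry per bucket")
    return list(bucketing_batch_size)
```

For example, `bucket_batch_sizes(8, 4)` returns `[32, 24, 16, 8]`.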
@@ -332,7 +332,7 @@ We support three types of bucketing strategies:
 * synced_randomized (default): each epoch would have a different order of buckets. Order of the buckets is shuffled every epoch.
 * fully_randomized: similar to synced_randomized but each GPU has its own random order. So GPUs would not be synced.
 
-Tha parameter train_ds.bucketing_strategy can be set to specify one of these strategies. The recommended strategy is synced_randomized which gives the highest training speedup.
+The parameter train_ds.bucketing_strategy can be set to specify one of these strategies. The recommended strategy is synced_randomized which gives the highest training speedup.
 The fully_randomized strategy would have lower speedup than synced_randomized but may give better accuracy.
 
 Bucketing may improve the training speed more than 2x but may affect the final accuracy of the model slightly. Training for more epochs and using 'synced_randomized' strategy help to fill this gap.
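The difference between the two strategies listed in this hunk comes down to how the per-epoch shuffle of the bucket order is seeded. In the hypothetical sketch below (`bucket_order` is not a NeMo function), `synced_randomized` seeds the shuffle from the epoch alone, so every GPU visits the buckets in the same order, while `fully_randomized` also mixes in the GPU rank. Seeding through string keys is an assumption chosen to keep the example deterministic; the third strategy is not covered.

```python
import random
from typing import List


def bucket_order(num_buckets: int, epoch: int, rank: int,
                 strategy: str = "synced_randomized", base_seed: int = 0) -> List[int]:
    """Hypothetical helper: order in which one GPU visits the buckets during one epoch."""
    if strategy == "synced_randomized":
        # Seed depends only on the epoch: identical order on every GPU, new order each epoch.
        key = f"{base_seed}-{epoch}"
    elif strategy == "fully_randomized":
        # Seed also depends on the rank: each GPU gets its own order.
        key = f"{base_seed}-{epoch}-{rank}"
    else:
        raise ValueError(f"unsupported strategy in this sketch: {strategy}")
    order = list(range(num_buckets))
    random.Random(key).shuffle(order)
    return order
```

Keeping all GPUs on the same bucket presumably keeps per-step workloads similar across ranks, which would be consistent with synced_randomized giving the highest speedup.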
