update remote master by yichao-mt · Pull Request #4 · yichao-mt/lhotse

yichao-mt · 2025-02-09T11:27:27Z

No description provided.

* Concurrent reads in dynamic bucketing for faster start time.
* Don't exceed the buffer_size; eliminate some race conditions
* Missing flag
* use a proper queue for concurrency
* disable concurrency by default
* Add a test for the concurrent implementation

* init commit
* added dependencies for unit_tests
* fixed compatibility for python 3.8
* fixed base_url
* fixed metadata_url
* Update spatial_librispeech.py
* Update spatial_librispeech.py
* minor fixes
* multi-threaded 🪢
* Update spatial_librispeech.py
* finalize the recipe
* minor updates
* fixed missing import cmd

…BCSAE) (#1395)

* initial commit
* transcript fixes
* added SBCSAE download
* Updates sbcsae to properly process mono_channel audio and adds speaker origin as geolocations for speakers
* Fixes a few 0-width segments by adding 0.02 s of padding
* small fix
* Add alignment export option. Exports aligned supervisions along with the original supervisions with or without changing the text after manual inspections and corrections.
* update to cli flags and docs
* added sbcsae to docs and fixed python compatibility
* more python3.8 fixes

---------

Co-authored-by: Matthew Wiesner <wiesner@jhu.edu>
Co-authored-by: Dominik Klement <klement.dominik86@gmail.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>

* Implement conversion from CutSet to HuggingFace dataset. So far, this covers conversion of a CutSet containing MonoCut and single-source audio to a HuggingFace dataset.
* Refactor
* Add docs to set.py

---------

Co-authored-by: Piotr Żelasko <petezor@gmail.com>

* Adds radio data recipe
* Makes some small formatting changes
* Fixing black and isort formatting
* Fixes disable_ffmpeg_torchaudio_info to use contextmanager
* Removes what appears to be an unnecessary set_ffmpeg_torchaudio_info_enabled call. The recipe runs fine without it.

* Adds fleurs recipe
* Black formatting
* Removes useless num_jobs argument in the download cli, and ran isort and black again on *recipes/fleurs.py
* Removes what appears to be an unnecessary set_ffmpeg_torchaudio_info call
* isort and black fix
* Fixes remaining black issues due to trailing space in recipes/__init__.py
* Adds FLEURS entry in docs/corpus.rst

- Implements AISBatchLoader class to load all data referenced by a CutSet (recordings, features, arrays, images) in one Get-Batch API call.
- Reduces network overhead by fetching all objects in bulk instead of individually.
- Offloads archive extraction and object fetching to the AIStore cluster.
- Updates manifests to point to in-memory data representations.
- Add tutorial notebook for AISBatchLoader.

Signed-off-by: Abhishek Gaikwad <gaikwadabhishek1997@gmail.com>

- Add comprehensive test suite with mocked AIStore client
- Fix batch result consumption bug by tracking URL-enabled manifests
- Add TemporalArray support with proper inner array handling

Signed-off-by: Abhishek Gaikwad <gaikwadabhishek1997@gmail.com>

- Introduced environment variables for configuring AIStore batch loading: AIS_ENDPOINT and USE_AIS_GET_BATCH.
- Implemented logic to enable batch loading from AIStore, improving efficiency by fetching audio data in bulk.

This enhancement allows for more efficient audio data handling when using Lhotse with AIStore.

Signed-off-by: Abhishek Gaikwad <gaikwadabhishek1997@gmail.com>

…form (#1527)

Avoids a bug that appeared when using OnTheFlyFeatures with PerturbVolume on 4 GPUs and 6 workers per GPU:

```
File "/mnt/matylda5/iveselyk/ASR_TOOLKITS/K2_SHERPA_PYTORCH24_CUDA121/K2_CONDA_ENVIRONMENT/lib/python3.11/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'module' object
```

`PerturbVolume.random` should not have the module `random` assigned to it: https://github.com/lhotse-speech/lhotse/blob/a509b4ad9e3c997c08b6f0d41086a14109d0ac81/lhotse/dataset/cut_transforms/perturb_volume.py#L31

Assigning a `random.Random()` object instead was fine, and the error disappeared.
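The failure mode described above can be reproduced outside Lhotse. The following is a minimal sketch, not the actual `PerturbVolume` code: `types.SimpleNamespace` is used as a hypothetical stand-in for a transform object, and `can_pickle` is an illustrative helper. It shows that an object holding the `random` module as an attribute cannot be pickled, which is what worker processes need to do, while one holding a `random.Random()` instance can.

```python
import pickle
import random
from types import SimpleNamespace


def can_pickle(obj) -> bool:
    """Return True if `obj` survives pickle.dumps, False if pickling is rejected."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


# Hypothetical stand-ins for a transform object (not the real PerturbVolume).
# Holding the module itself is what triggers "cannot pickle 'module' object".
with_module = SimpleNamespace(random=random)

# Holding a generator instance is fine: random.Random objects are picklable.
with_generator = SimpleNamespace(random=random.Random())

print(can_pickle(with_module))
print(can_pickle(with_generator))
```

A per-object `random.Random()` also keeps its own state, so the generator travels with the object when it is sent to a worker.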

…in AISBatchLoader (#1542)

- Add backward compatibility for older AIStore SDK versions (Colocation, ArchiveConfig)
- Move all aistore imports to method level (remove top-level imports)
- Replace module-level logging with logger instance for better configuration
- Fix return type annotation for _collect_manifest_urls (None -> bool)
- Add safe attribute access with getattr() in error messages
- Simplify ValueError handling with early return

Tests:
- Add version compatibility tests for Colocation fallback

This improves SDK version compatibility and code robustness while maintaining full backward compatibility with older AIStore deployments.

Signed-off-by: Abhishek Gaikwad <gaikwadabhishek1997@gmail.com>

* Add HuggingFace audio and GDrive pseudo-label downloads
* Add tar extraction caching and lazy 16kHz resampling
* Add data validation to drop 0-duration segments and word alignments
* Register `oto_speech` commands in Lhotse CLI
* Add `prepare_oto_speech.sh` script for end-to-end cutset generation

- catch StopIteration during batch result iteration and fall back to individual GET requests instead of crashing the DataLoader worker
- use batch_stream_failed flag to skip dead iterator for remaining objects
- reuse existing _get_object_from_moss_in() retry path for recovery
- update test to verify fallback behavior instead of expecting crash

Signed-off-by: Abhishek Gaikwad <gaikwadabhishek1997@gmail.com>
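The fallback pattern in the bullets above can be sketched in plain Python. This is an illustration under assumptions, not the AISBatchLoader implementation: `fetch_with_fallback`, `batch_stream`, and `get_one` are hypothetical names, and `get_one` merely stands in for whatever per-object retry path is used. Only the `batch_stream_failed` flag name is taken from the commit message.

```python
from typing import Callable, Dict, Iterator, List


def fetch_with_fallback(
    keys: List[str],
    batch_stream: Iterator[bytes],
    get_one: Callable[[str], bytes],
) -> Dict[str, bytes]:
    """Read one result per key from a batch stream, falling back to single GETs.

    If the stream ends early (StopIteration), the flag is set so the dead
    iterator is never touched again and remaining keys are fetched one by one.
    """
    results: Dict[str, bytes] = {}
    batch_stream_failed = False
    for key in keys:
        if not batch_stream_failed:
            try:
                results[key] = next(batch_stream)
                continue
            except StopIteration:
                # The stream is exhausted: switch to the fallback path for good.
                batch_stream_failed = True
        results[key] = get_one(key)
    return results


# Usage: the stream yields only two of the four expected objects.
fallback_calls: List[str] = []


def fake_get(key: str) -> bytes:
    fallback_calls.append(key)
    return f"single:{key}".encode()


out = fetch_with_fallback(["k1", "k2", "k3", "k4"], iter([b"a", b"b"]), fake_get)
print(out)
print(fallback_calls)
```

Catching `StopIteration` explicitly keeps a short batch response from surfacing as an unhandled exception in the worker.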

- filter out supervisions with duration <= 0 before building IntervalTree in index_supervisions(), preventing ValueError on null intervals
- zero-duration supervisions can occur when cut_into_windows() produces a supervision that falls exactly on a window boundary
- without this fix, the IntervalTree crash silently kills the Lhotse producer thread, starving the data pipeline and causing NCCL timeouts in distributed training

Signed-off-by: Abhishek Gaikwad <gaikwadabhishek1997@gmail.com>
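The filtering step described above amounts to dropping any interval whose duration is not strictly positive before indexing. Here is a minimal sketch, assuming a hypothetical `Segment` tuple in place of Lhotse's supervision objects and a hypothetical helper `drop_null_intervals`; it does not use the interval tree library itself.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """Hypothetical stand-in for a supervision: start time and duration in seconds."""

    start: float
    duration: float

    @property
    def end(self) -> float:
        return self.start + self.duration


def drop_null_intervals(segments: List[Segment]) -> List[Segment]:
    """Keep only segments with duration > 0, i.e. intervals where start < end."""
    return [s for s in segments if s.duration > 0]


# Usage: the second segment sits exactly on a boundary and has zero duration.
segments = [Segment(0.0, 1.5), Segment(1.5, 0.0), Segment(2.0, 0.75)]
kept = drop_null_intervals(segments)
print(kept)
```

Filtering up front means the index is only ever built from intervals with a positive length.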

* Chunking functionality
* name change
* Works with batches
* Removed Grouping class, handled in NeMo
* Tests for overlapping cuts
* Tests updates
* isort changes

---------

Signed-off-by: Nune <ntadevosyan@nvidia.com>

pzelasko and others added 30 commits July 15, 2024 12:43

Fix MixedCut transforms serialization (#1370)

0a4aed4

* Fix MixedCut transforms serialization
* fix

Support for pre-determined batch sizes in DynamicBucketingSampler (#1372)

c286f28

augmentation/torchaudio: add Phone effect (mulaw, lpc10 codecs) (#1348)

18436e9

* augmentation/torchaudio: add Phone effect (mulaw, lpc10 codecs)
* restore_orig_sr option

---------

Co-authored-by: Piotr Żelasko <petezor@gmail.com>

bump dev version to 1.26.0

6a17721

Add EARS recipe (#1375)

fa8cbfe

* Add EARS recipe
* Add download and fix cli for the EARS dataset
* Fix formatting for EARS recipe

Refactor bucket selection for customization (#1377)

21b102c

* Refactor bucket selection to allow customization
* Extend the API further
* Prune imports

Bump dev version to 1.27.0

2b75622

Cap the 'trng' random seeds to 2**31 avoiding numpy error (#1379)

bcd1e22

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

CutSet.prefetch() for background cuts loading during iteration (#1380)

748cd50

Include a copyright NOTICE listing major copyright holders (#1381)

bf37599

* Include a copyright NOTICE
* Include a copyright NOTICE

Added has_custom to MixedCut (#1383)

9bea2db

Signed-off-by: Ante Jukić <ajukic@nvidia.com>

[Recipe] Wenetspeech4tts (#1384)

e78add5

* add wenetspeech4tts recipe
* fix wenetspeech4tts recipe
* fix wenetspeech4tts recipe float
* fix wenetspeech4tts recipe typo
* fix wenetspeech4tts recipe typo
* add wenetspeech4tts doc

Fix to fixed batch size bucketing and audio loading network connectio… (#1387)

170046f

* Fix to fixed batch size bucketing and audio loading network connection resets
* Fix tests and add more 'paranoia' tests

Bump dev version to 1.28.0

4ca97dc

[spgispeech] Fix durations object is null issue (#1390)

bc2c0a2

[spgispeech] Fix durations are null issue

Fix backend to None while ffmpeg is unavailable. (#1392)

a31a532

Fix ksponspeech recipe (#1394)

82b313f

* fix ksponspeech.py
* fix black

Fix cli for ksponspeech (#1393)

c8ba6d0

fix ksponspeech.py

Add the Emilia corpus (#1404)

41269ff

* Add the Emilia corpus.
* Return cutset instead
* fix style issues

[fix] fisher_english recipe (#1410)

8b6d6f5

downgrading sphinx version from 7.2.6 to 7.1.2 (#1409)

aff1188

Co-authored-by: npovey <you@example.com>

Add workflow: annotate DNSMOS P.835 (#1406)

9648516

* add workflow: dnsmos
* add cli for dnsmos workflow
* fix and test
* fix

---------

Co-authored-by: Your Name <you@example.com>

Update lhotse.py (#1414)

3ab3917

Remove the deprecated usage.

Make torchaudio an optional dependency (#1382)

54bb42f

* Make torchaudio an optional dependency
* Remove torchaudio from some CI tests

gaikwadabhishek and others added 30 commits October 30, 2025 15:44

Lhotse version 1.32.0

2094489

Fix Lhotse import on Windows

8500bde

Bump patch version to 1.32.1

19cebc9

NSF grant acknowledgment in README.md (#1539)

434e935

Update README.md

Fix CutSampler initialization for newer PyTorch versions (#1543)

e0a36fc

* Fix CutSampler initialization for newer PyTorch versions
* Update unit tests for newer python and pytorch versions
* Unfreeze some test package versions
* Remove torchscriptability checks for feature extractors

Fix cuts conversion to hf datasets (#1546)

aed1263

Fix invalid escape sequence warnings in iwslt22_ta (#1540)

9d7630e

extend SimpleCutSampler to work better with CutConcatenate (#1520)

f2d2411

Notsofar ihm recipe (#1551)

5e563d9

added support for notsofar ihm prep

Remove hardcoded timeout from AIStore client (#1549)

393b908

- Rely on AIS_CONNECT_TIMEOUT and AIS_READ_TIMEOUT env vars for timeout config
- Add link to AIStore SDK environment variables docs

Signed-off-by: Abhishek Gaikwad <gaikwadabhishek1997@gmail.com>

Support loading multiple non-overlapping custom recordings in MixedCut (#1553)

6b32957

Make lilcom optional and fix failing unit tests on MacOS (#1555)

26fa15f

* Fix test fixtures and backend gating
* Make lilcom optional and default to numpy storage
* Clean up stale xfail markers
* Improve storage backend discoverability

Respect LHOTSE_IO_BACKEND for reading AudioSource(type='url'); update docs (#1557)

f5781fc

* Use open_best for AudioSource URLs
* Add CLI for listing IO backends

Fix cached manifest reading in some recipes (#1560)

b6173fc

Add torchcodec support (#1562)

d289860

* Add torchcodec support
* fix ci torchcodec version
* bump min torch version for torchcodec

Add AudioSamples(mono_downmix=True) to handle mixed single/multi channel batches gracefully (#1563)

c72136c

* Add AudioSamples(mono_downmix=True) to handle mixed single/multi channel batches gracefully
* Update defaults to be non-breaking for multi-channel audio

Fix for AIStore client 1.23 (#1565)

8feed52

fix numpy broadcast dtype issue in loudness normalization (#1561)

e4da495

Add CutSet.mix(..., tag="noise"), `MixTrack.{is_snr_reference,mute}` bools, and `MixedCut.unmix(tag=...)` (#1559)

d0710cb

Add tagged unmix compatibility and hidden SNR refs

Bump version to 1.33.0 for release

d72ceb0

fix: skip AIS batch.get() on empty batch to silence spurious warnings (#1569)

6b45efe

yichao-mt wants to merge 124 commits into yichao-mt:master from lhotse-speech:master

20 participants
