update remote master #4

Open
yichao-mt wants to merge 124 commits into yichao-mt:master from lhotse-speech:master

Conversation

@yichao-mt

Owner

No description provided.

pzelasko and others added 30 commits July 15, 2024 12:43
* Fix MixedCut transforms serialization

* fix
* augmentation/torchaudio: add Phone effect (mulaw, lpc10 codecs)

* restore_orig_sr option

---------

Co-authored-by: Piotr Żelasko <petezor@gmail.com>
* Add EARS recipe

* Add download and fix cli for the EARS dataset

* Fix formatting for EARS recipe
* Concurrent reads in dynamic bucketing for faster start time.

* Don't exceed the buffer_size; eliminate some race conditions

* Missing flag

* use a proper queue for concurrency

* disable concurrency by default

* Add a test for the concurrent implementation
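The commit titles above mention concurrent reads, a queue, and a `buffer_size` limit. A minimal sketch of that general pattern, using only the standard library, might look like the following. The function name `prefetch` and its signature are hypothetical and are not taken from the lhotse code base.

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the stream


def prefetch(source, buffer_size):
    """Yield items from `source` while a background thread reads ahead.

    The bounded queue blocks the producer once `buffer_size` entries are
    waiting, so the read-ahead never grows past the buffer.
    """
    q = queue.Queue(maxsize=buffer_size)

    def producer():
        for item in source:
            q.put(item)  # blocks while the queue is full
        q.put(_SENTINEL)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is _SENTINEL:
            return
        yield item


items = list(prefetch(iter(range(10)), buffer_size=3))
```

A single producer and a FIFO queue keep the original order. This sketch does not forward exceptions raised inside the producer thread, which a real implementation would need to handle.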
* Refactor bucket selection to allow customization

* Extend the API further

* Prune imports
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Include a copyright NOTICE

* Include a copyright NOTICE
Signed-off-by: Ante Jukić <ajukic@nvidia.com>
* add wenetspeech4tts recipe

* fix wenetspeech4tts recipe

* fix wenetspeech4tts recipe float

* fix wenetspeech4tts recipe typo

* fix wenetspeech4tts recipe typo

* add wenetspeech4tts doc
* init commit

* added dependencies for unit_tests

* fixed compatibility for python 3.8

* fixed base_url

* fixed metadata_url

* Update spatial_librispeech.py

* Update spatial_librispeech.py

* minor fixes

* multi-threaded 🪢

* Update spatial_librispeech.py

* finalize the recipe

* minor updates

* fixed missing import cmd
#1387)

* Fix to fixed batch size bucketing and audio loading network connection resets

* Fix tests and add more 'paranoia' tests
[spgispeech] Fix durations are null issue
* fix ksponspeech.py

* fix black
…BCSAE) (#1395)

* initial commit

* transcript fixes

* added SBCSAE download

* Updates sbcsae to properly process mono_channel audio and adds speaker origin as geolocations for speakers

* Fixes a few 0-width segments by adding 0.02 s of padding

* small fix

* Add alignment export option

Exports aligned supervisions along with the original supervisions with or without changing the text after manual inspections and corrections.

* update to cli flags and docs

* added sbcsae to docs and fixed python compatibility

* more python3.8 fixes

---------

Co-authored-by: Matthew Wiesner <wiesner@jhu.edu>
Co-authored-by: Dominik Klement <klement.dominik86@gmail.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
* Implement conversion from CutSet to HuggingFace dataset

So far, conversion from CutSet containing MonoCut and single-source audio to HuggingFace dataset.

* Refactor

* Add docs to set.py

---------

Co-authored-by: Piotr Żelasko <petezor@gmail.com>
* Adds radio data recipe

* Makes some small formatting changes

* Fixing black and isort formatting

* Fixes disable_ffmpeg_torchaudio_info to use contextmanager

* Removes what appears to be an unnecessary set_ffmpeg_torchaudio_info_enabled call. The recipe runs fine without it.
* Adds fleurs recipe

* Black formatting

* Removes useless num_jobs argument in the download cli, and ran isort and black again on *recipes/fleurs.py

* Removes what appears to be an unnecessary set_ffmpeg_torchaudio_info call

* isort and black fix

* Fixes remaining black issues due to trailing space in recipes/__init__.py

* Adds FLEURS entry in docs/corpus.rst
* Add the Emilia corpus.

* Return cutset instead

* fix style issues
Co-authored-by: npovey <you@example.com>
* add workflow: dnsmos

* add cli for dnsmos workflow

* fix and test

* fix

---------

Co-authored-by: Your Name <you@example.com>
Remove the deprecated usage.
* Make torchaudio an optional dependency

* Remove torchaudio from some CI tests
gaikwadabhishek and others added 30 commits October 30, 2025 15:44
- Implements AISBatchLoader class to load all data referenced by a CutSet
  (recordings, features, arrays, images) in one Get-Batch API call.
- Reduces network overhead by fetching all objects in bulk instead of individually.
- Offloads archive extraction and object fetching to the AIStore cluster.
- Updates manifests to point to in-memory data representations.
- Add tutorial notebook for AISBatchLoader.

Signed-off-by: Abhishek Gaikwad <gaikwadabhishek1997@gmail.com>
- Add comprehensive test suite with mocked AIStore client
- Fix batch result consumption bug by tracking URL-enabled manifests
- Add TemporalArray support with proper inner array handling

Signed-off-by: Abhishek Gaikwad <gaikwadabhishek1997@gmail.com>
- Introduced environment variables for configuring AIStore batch loading: AIS_ENDPOINT and USE_AIS_GET_BATCH.
- Implemented logic to enable batch loading from AIStore, improving efficiency by fetching audio data in bulk.

This enhancement allows for more efficient audio data handling when using Lhotse with AIStore.

Signed-off-by: Abhishek Gaikwad <gaikwadabhishek1997@gmail.com>
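As a rough illustration of how the two environment variables named above could gate batch loading, here is a small sketch. Only the variable names `AIS_ENDPOINT` and `USE_AIS_GET_BATCH` come from the commit message; the helper `ais_batch_settings`, the accepted flag spellings, and the rule that an endpoint must be set are assumptions.

```python
import os


def ais_batch_settings(env=os.environ):
    """Return (endpoint, use_batch) read from the environment mapping."""
    endpoint = env.get("AIS_ENDPOINT")
    flag = env.get("USE_AIS_GET_BATCH", "").strip().lower()
    # Assumption: these spellings count as "enabled".
    requested = flag in {"1", "true", "yes"}
    # Assumption: batch loading is only usable when an endpoint is configured.
    return endpoint, requested and endpoint is not None


endpoint, use_batch = ais_batch_settings(
    {"AIS_ENDPOINT": "http://localhost:51080", "USE_AIS_GET_BATCH": "true"}
)
```

Passing the mapping as a parameter keeps the helper easy to test without mutating the real process environment.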
…form (#1527)

Avoids a bug that appears when using OnTheFlyFeatures with PerturbVolume on 4 GPUs with 6 workers per GPU:

```
File "/mnt/matylda5/iveselyk/ASR_TOOLKITS/K2_SHERPA_PYTORCH24_CUDA121/K2_CONDA_ENVIRONMENT/lib/python3.11/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'module' object
```

`PerturbVolume.random` should not have the module `random` assigned to it:
https://github.com/lhotse-speech/lhotse/blob/a509b4ad9e3c997c08b6f0d41086a14109d0ac81/lhotse/dataset/cut_transforms/perturb_volume.py#L31

Assigning a `random.Random()` object instead works, and the error disappears.
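The failure can be reproduced in isolation with the standard library. In the sketch below, the `SimpleNamespace` objects are hypothetical stand-ins for a transform's attributes; they are not the actual `PerturbVolume` class.

```python
import pickle
import random
from types import SimpleNamespace


def is_picklable(obj):
    """Return True if `obj` survives pickle.dumps, as a worker handoff requires."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


# Hypothetical stand-ins for a transform that stores its randomness source.
holds_module = SimpleNamespace(p=0.5, random=random)              # the module itself
holds_generator = SimpleNamespace(p=0.5, random=random.Random())  # a generator instance

print(is_picklable(holds_module))     # module objects cannot be pickled
print(is_picklable(holds_generator))  # Random instances can
```

Whether a transform actually gets pickled depends on the multiprocessing start method and on how the DataLoader workers are set up, which is consistent with the error showing up only in some configurations.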
* Fix CutSampler initialization for newer PyTorch versions

* Update unit tests for newer python and pytorch versions

* Unfreeze some test package versions

* Remove torchscriptability checks for feature extractors
added support for notsofar ihm prep
…in AISBatchLoader (#1542)

- Add backward compatibility for older AIStore SDK versions (Colocation, ArchiveConfig)
- Move all aistore imports to method level (remove top-level imports)
- Replace module-level logging with logger instance for better configuration
- Fix return type annotation for _collect_manifest_urls (None -> bool)
- Add safe attribute access with getattr() in error messages
- Simplify ValueError handling with early return

Tests:
- Add version compatibility tests for Colocation fallback

This improves SDK version compatibility and code robustness while maintaining
full backward compatibility with older AIStore deployments.

Signed-off-by: Abhishek Gaikwad <gaikwadabhishek1997@gmail.com>
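The combination of method-level imports and `getattr()` fallbacks described above follows a common pattern for tolerating older SDK versions. A generic sketch is shown here; the helper `load_optional` is hypothetical, and the demonstration uses the standard-library `json` module rather than guessing at AIStore SDK module paths.

```python
import importlib


def load_optional(module_name, attr_name, default=None):
    """Import lazily and return `module.attr_name`, or `default` if unavailable.

    Importing inside the function keeps the dependency off the module's
    top level, so merely importing the caller never requires the SDK.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return default
    # An older release may lack the attribute; fall back instead of raising.
    return getattr(module, attr_name, default)


dumps = load_optional("json", "dumps")
missing = load_optional("json", "no_such_name", default="fallback")
absent = load_optional("module_that_does_not_exist", "anything")
```

Callers can then branch on whether the returned value is the default and pick a code path that matches the installed version.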
- Rely on AIS_CONNECT_TIMEOUT and AIS_READ_TIMEOUT env vars for timeout config
- Add link to AIStore SDK environment variables docs

Signed-off-by: Abhishek Gaikwad <gaikwadabhishek1997@gmail.com>
* Add HuggingFace audio and GDrive pseudo-label downloads
* Add tar extraction caching and lazy 16kHz resampling
* Add data validation to drop 0-duration segments and word alignments
* Register `oto_speech` commands in Lhotse CLI
* Add `prepare_oto_speech.sh` script for end-to-end cutset generation
* Fix test fixtures and backend gating

* Make lilcom optional and default to numpy storage

* Clean up stale xfail markers

* Improve storage backend discoverability
…date docs (#1557)

* Use open_best for AudioSource URLs

* Add CLI for listing IO backends
* Add torchcodec support

* fix ci torchcodec version

* bump min torch version for torchcodec
…annel batches gracefully (#1563)

* Add AudioSamples(mono_downmix=True) to handle mixed single/multi channel batches gracefully

* Update defaults to be non-breaking for multi-channel audio
- catch StopIteration during batch result iteration and fall back
  to individual GET requests instead of crashing the DataLoader worker
- use batch_stream_failed flag to skip dead iterator for remaining objects
- reuse existing _get_object_from_moss_in() retry path for recovery
- update test to verify fallback behavior instead of expecting crash

Signed-off-by: Abhishek Gaikwad <gaikwadabhishek1997@gmail.com>
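The fallback described in this commit message can be sketched as follows. The flag name `batch_stream_failed` comes from the message; `load_objects`, `batch_iter`, and `fetch_one` are hypothetical stand-ins for the loader, the batch response stream, and the per-object GET path.

```python
def load_objects(keys, batch_iter, fetch_one):
    """Pair each key with data from `batch_iter`, falling back to `fetch_one`."""
    results = {}
    batch_stream_failed = False
    for key in keys:
        if not batch_stream_failed:
            try:
                results[key] = next(batch_iter)
                continue
            except StopIteration:
                # The stream ended early: stop touching the dead iterator
                # and serve this and all remaining keys individually.
                batch_stream_failed = True
        results[key] = fetch_one(key)
    return results


loaded = load_objects(
    keys=["a", "b", "c"],
    batch_iter=iter([b"A"]),  # the stream yields one object, then ends
    fetch_one=lambda key: key.upper().encode(),
)
```

Catching `StopIteration` explicitly matters when `next()` is called by hand: letting it escape from inside a generator, for example, is converted to a `RuntimeError` under PEP 479, which is harder to trace back to a truncated stream.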
- filter out supervisions with duration <= 0 before building IntervalTree
  in index_supervisions(), preventing ValueError on null intervals
- zero-duration supervisions can occur when cut_into_windows() produces
  a supervision that falls exactly on a window boundary
- without this fix, the IntervalTree crash silently kills the Lhotse
  producer thread, starving the data pipeline and causing NCCL timeouts
  in distributed training

Signed-off-by: Abhishek Gaikwad <gaikwadabhishek1997@gmail.com>
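The filtering step described above amounts to dropping any segment whose duration is not strictly positive before it reaches the interval index. A minimal sketch, with a hypothetical `Segment` type standing in for a supervision and a hypothetical helper `indexable`:

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """Hypothetical stand-in for a supervision segment."""
    start: float
    duration: float


def indexable(segments: List[Segment]) -> List[Segment]:
    """Keep only segments that span a non-empty interval."""
    return [s for s in segments if s.duration > 0]


segments = [
    Segment(start=0.0, duration=2.5),
    Segment(start=5.0, duration=0.0),  # falls exactly on a window boundary
    Segment(start=5.0, duration=1.0),
]
kept = indexable(segments)
```

Filtering before index construction keeps the failure from ever being raised, rather than catching it afterwards in a background thread where it is easy to miss.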
* Chunking functionality

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* name change

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Works with batches

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Removed Grouping class, handled in NeMo

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Tests for overlapping cuts

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Tests updates

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* isort changes

Signed-off-by: Nune <ntadevosyan@nvidia.com>

---------

Signed-off-by: Nune <ntadevosyan@nvidia.com>
…` bools, and `MixedCut.unmix(tag=...)` (#1559)

Add tagged unmix compatibility and hidden SNR refs