Modifications to handle force alignment, EOS placement, and OTF early interruption by ankitapasad · Pull Request #4 · kevinhu-nv/NeMo2

ankitapasad · 2026-02-05T00:20:42Z

What does this PR do ?

This PR adds the following features:

Turns off force alignment during validation. This (i) avoids wasting resources and (ii) avoids the NCCL timeout errors that force alignment during validation often caused.
Agent EOS placement:

Replaces agent_supervision.end with a fixed offset relative to the user's speech. Specifically, the Agent's EOS is placed at user_turn_supervision.start + eos_offset_frames;
This decouples agent EOS from agent duration modeling. The EOS now serves as a dedicated barge-in trigger that signals exactly when the agent must yield to the user.
This resolves issues with our current training data mix where agent_supervision.end is inconsistent or prematurely timestamped before the user begins speaking. By anchoring EOS placement to the user's start, we provide the model with a stable, causal conditioning signal.
Note: eos_offset_frames is currently hard-coded to 8 in the build_token_channel arguments, but it could be made a parameter passed from the config.
Controlled by data.fix_eos_placements (default: True).
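
As a rough illustration of the EOS placement rule above, here is a minimal sketch. It is not the code from this PR: the helper name `eos_frame`, the 80 ms frame duration (inferred from "8 tokens = 640 ms"), the rounding, and the clamping to the sequence length are all assumptions.

```python
from typing import Optional

FRAME_SEC = 0.08  # assumption: 80 ms per frame, so 8 frames = 640 ms


def eos_frame(
    user_turn_start_sec: float,
    eos_offset_frames: int = 8,
    frame_sec: float = FRAME_SEC,
    num_frames: Optional[int] = None,
) -> int:
    """Hypothetical helper: frame index at which the agent EOS is placed.

    The EOS is anchored to the start of the next user turn plus a fixed
    offset, instead of to agent_supervision.end.
    """
    # Convert the user turn start (seconds) to a frame index (rounding is an assumption).
    start_frame = int(round(user_turn_start_sec / frame_sec))
    frame = start_frame + eos_offset_frames
    # Assumption: keep the EOS inside the token channel if its length is known.
    if num_frames is not None:
        frame = min(frame, num_frames - 1)
    return frame
```

For example, a user turn starting at 2.0 s maps to frame 25, so `eos_frame(2.0)` returns 33 regardless of where agent_supervision.end was timestamped.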

Add on-the-fly early interruption, built on top of d369b07

Early interruption is defined as the user interrupting the agent during the <text> token generation phase rather than the <pad> token generation phase;
Controlled by data.early_interruption_prob (default: 0.0, i.e., turned off) and data.early_interruption_overlap_tokens (default: 8, i.e., 640 ms, consistent with the EOS placement behavior);
For randomly chosen conversations (based on data.early_interruption_prob), an agent turn is selected at random to be early-interrupted. The user channel and agent channel are appropriately advanced and the conversation duration is truncated.
The fraction of samples transformed by early interruption is logged to wandb as early_interruption_successful_ratio;
Early interruption is turned off for validation data;
An option to turn off early interruption for specific datasets is provided. For example, datasets that already have user interruptions are not accurately handled by this on-the-fly logic. This can be done by passing tags.otf_interruption=false in the data yaml.
Debugging-friendly features that save audio and Audacity-format BOS and EOS labels. These are active only when self.model_cfg.get("debug", False) is True, and can be removed later.
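
The selection logic described in the list above could look roughly like the sketch below. This is an illustration under stated assumptions, not the implementation in this PR: `AgentTurn`, `pick_early_interruption`, and the way the interruption point is sampled inside the <text> phase are hypothetical, and the actual shifting of the user and agent channels and truncation of the conversation are not shown.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AgentTurn:
    """Hypothetical, simplified view of one agent turn."""
    start_frame: int      # first frame of the agent turn
    num_text_tokens: int  # frames spent emitting <text> tokens, before <pad>


def pick_early_interruption(
    turns: List[AgentTurn],
    prob: float,
    overlap_tokens: int = 8,
    training: bool = True,
    otf_interruption: bool = True,
    rng: Optional[random.Random] = None,
) -> Optional[Tuple[int, int, int]]:
    """Return (turn_index, user_start_frame, agent_eos_frame), or None if skipped."""
    rng = rng or random
    # Disabled for validation data and for datasets tagged otf_interruption=false.
    if not training or not otf_interruption or not turns:
        return None
    # Only a fraction `prob` of conversations is transformed.
    if rng.random() >= prob:
        return None
    idx = rng.randrange(len(turns))
    turn = turns[idx]
    if turn.num_text_tokens < 2:
        return None  # no room to interrupt inside the <text> phase
    # Assumption: the user barges in at a uniformly sampled frame inside the <text> phase.
    user_start = turn.start_frame + rng.randrange(1, turn.num_text_tokens)
    # The agent keeps going for `overlap_tokens` frames before yielding.
    return idx, user_start, user_start + overlap_tokens
```

A ratio such as early_interruption_successful_ratio would then be the number of non-None results divided by the number of samples seen.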

Text number normalization from Edresson@468e30f

Controlled by model.use_numbers_norm, default True.

Updated noise augmentation from acf5b3c

Collection: speechlm2

Signed-off-by: Ankita Pasad <apasad@nvidia.com>


github-actions · 2026-02-05T03:50:52Z

beep boop 🤖: 🚨 The following files must be fixed before merge!

Your code was analyzed with PyLint. The following annotations have been identified:

************* Module nemo.collections.speechlm2.parts.augmentation
nemo/collections/speechlm2/parts/augmentation.py:24:0: E0401: Unable to import 'librosa' (import-error)
nemo/collections/speechlm2/parts/augmentation.py:25:0: E0401: Unable to import 'numpy' (import-error)
nemo/collections/speechlm2/parts/augmentation.py:26:0: E0401: Unable to import 'soundfile' (import-error)
nemo/collections/speechlm2/parts/augmentation.py:27:0: E0401: Unable to import 'torch' (import-error)
nemo/collections/speechlm2/parts/augmentation.py:28:0: E0401: Unable to import 'scipy.signal' (import-error)
nemo/collections/speechlm2/parts/augmentation.py:48:4: R0913: Too many arguments (10/5) (too-many-arguments)
nemo/collections/speechlm2/parts/augmentation.py:48:4: R0917: Too many positional arguments (10/5) (too-many-positional-arguments)
nemo/collections/speechlm2/parts/augmentation.py:48:4: R0914: Too many local variables (23/15) (too-many-locals)
nemo/collections/speechlm2/parts/augmentation.py:65:26: R1721: Unnecessary use of a comprehension, use list(glob.glob(os.path.join(noise_folder, '*.wav'))) instead. (unnecessary-comprehension)
nemo/collections/speechlm2/parts/augmentation.py:104:33: W0640: Cell variable get_scale_factor defined in loop (cell-var-from-loop)
nemo/collections/speechlm2/parts/augmentation.py:104:62: W0640: Cell variable i defined in loop (cell-var-from-loop)
nemo/collections/speechlm2/parts/augmentation.py:142:4: R0914: Too many local variables (20/15) (too-many-locals)
nemo/collections/speechlm2/parts/augmentation.py:153:27: R1721: Unnecessary use of a comprehension, use list(glob.glob(os.path.join(roomir_folder, '*.wav'))) instead. (unnecessary-comprehension)
nemo/collections/speechlm2/parts/augmentation.py:181:23: W0718: Catching too general exception Exception (broad-exception-caught)
nemo/collections/speechlm2/parts/augmentation.py:208:23: W0718: Catching too general exception Exception (broad-exception-caught)
nemo/collections/speechlm2/parts/augmentation.py:142:4: R0912: Too many branches (15/12) (too-many-branches)
nemo/collections/speechlm2/parts/augmentation.py:226:4: R0914: Too many local variables (20/15) (too-many-locals)
nemo/collections/speechlm2/parts/augmentation.py:237:26: R1721: Unnecessary use of a comprehension, use list(glob.glob(os.path.join(micir_folder, '*.wav'))) instead. (unnecessary-comprehension)
nemo/collections/speechlm2/parts/augmentation.py:265:23: W0718: Catching too general exception Exception (broad-exception-caught)
nemo/collections/speechlm2/parts/augmentation.py:292:23: W0718: Catching too general exception Exception (broad-exception-caught)
nemo/collections/speechlm2/parts/augmentation.py:226:4: R0912: Too many branches (15/12) (too-many-branches)
nemo/collections/speechlm2/parts/augmentation.py:335:19: W0718: Catching too general exception Exception (broad-exception-caught)
nemo/collections/speechlm2/parts/augmentation.py:322:12: W0612: Unused variable 'codec_name' (unused-variable)
nemo/collections/speechlm2/parts/augmentation.py:22:0: W0611: Unused Tuple imported from typing (unused-import)
************* Module s2s_duplex_stt_infer
examples/speechlm2/s2s_duplex_stt_infer.py:1:0: C0114: Missing module docstring (missing-module-docstring)
examples/speechlm2/s2s_duplex_stt_infer.py:16:0: E0401: Unable to import 'torch' (import-error)
examples/speechlm2/s2s_duplex_stt_infer.py:17:0: E0401: Unable to import 'lightning.pytorch' (import-error)
examples/speechlm2/s2s_duplex_stt_infer.py:18:0: E0401: Unable to import 'omegaconf' (import-error)
examples/speechlm2/s2s_duplex_stt_infer.py:29:0: C0116: Missing function or method docstring (missing-function-docstring)
examples/speechlm2/s2s_duplex_stt_infer.py:60:4: E1120: No value for argument 'cfg' in function call (no-value-for-parameter)
************* Module s2s_duplex_stt_train
examples/speechlm2/s2s_duplex_stt_train.py:1:0: C0114: Missing module docstring (missing-module-docstring)
examples/speechlm2/s2s_duplex_stt_train.py:17:0: E0401: Unable to import 'torch' (import-error)
examples/speechlm2/s2s_duplex_stt_train.py:18:0: E0401: Unable to import 'lightning.pytorch' (import-error)
examples/speechlm2/s2s_duplex_stt_train.py:19:0: E0401: Unable to import 'lightning.pytorch.callbacks' (import-error)
examples/speechlm2/s2s_duplex_stt_train.py:20:0: E0401: Unable to import 'omegaconf' (import-error)
examples/speechlm2/s2s_duplex_stt_train.py:38:0: C0116: Missing function or method docstring (missing-function-docstring)
examples/speechlm2/s2s_duplex_stt_train.py:85:4: E1120: No value for argument 'cfg' in function call (no-value-for-parameter)
************* Module nemo.collections.speechlm2.data.datamodule
nemo/collections/speechlm2/data/datamodule.py:1:0: C0114: Missing module docstring (missing-module-docstring)
nemo/collections/speechlm2/data/datamodule.py:14:0: E0401: Unable to import 'torch' (import-error)
nemo/collections/speechlm2/data/datamodule.py:15:0: E0401: Unable to import 'lightning' (import-error)
nemo/collections/speechlm2/data/datamodule.py:16:0: E0401: Unable to import 'lightning.pytorch.utilities' (import-error)
nemo/collections/speechlm2/data/datamodule.py:17:0: E0401: Unable to import 'omegaconf' (import-error)
nemo/collections/speechlm2/data/datamodule.py:79:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/data/datamodule.py:90:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/data/datamodule.py:96:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/data/datamodule.py:102:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/data/datamodule.py:157:12: R1705: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it (no-else-return)
nemo/collections/speechlm2/data/datamodule.py:170:12: R1705: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it (no-else-return)
************* Module nemo.collections.speechlm2.data.s2s_dataset
nemo/collections/speechlm2/data/s2s_dataset.py:576:0: C0301: Line too long (121/119) (line-too-long)
nemo/collections/speechlm2/data/s2s_dataset.py:580:0: C0301: Line too long (136/119) (line-too-long)
nemo/collections/speechlm2/data/s2s_dataset.py:640:0: C0301: Line too long (132/119) (line-too-long)
nemo/collections/speechlm2/data/s2s_dataset.py:951:0: C0301: Line too long (137/119) (line-too-long)
nemo/collections/speechlm2/data/s2s_dataset.py:1071:0: C0301: Line too long (182/119) (line-too-long)
nemo/collections/speechlm2/data/s2s_dataset.py:1081:0: C0301: Line too long (167/119) (line-too-long)
nemo/collections/speechlm2/data/s2s_dataset.py:1114:0: C0301: Line too long (151/119) (line-too-long)
nemo/collections/speechlm2/data/s2s_dataset.py:1135:0: C0301: Line too long (137/119) (line-too-long)
nemo/collections/speechlm2/data/s2s_dataset.py:1:0: C0302: Too many lines in module (1285/1000) (too-many-lines)
nemo/collections/speechlm2/data/s2s_dataset.py:1:0: C0114: Missing module docstring (missing-module-docstring)
nemo/collections/speechlm2/data/s2s_dataset.py:17:0: E0401: Unable to import 'inflect' (import-error)
nemo/collections/speechlm2/data/s2s_dataset.py:18:0: E0401: Unable to import 'torch' (import-error)
nemo/collections/speechlm2/data/s2s_dataset.py:19:0: E0401: Unable to import 'torch.utils.data' (import-error)
nemo/collections/speechlm2/data/s2s_dataset.py:20:0: E0401: Unable to import 'torchaudio' (import-error)
nemo/collections/speechlm2/data/s2s_dataset.py:21:0: E0401: Unable to import 'lhotse' (import-error)
nemo/collections/speechlm2/data/s2s_dataset.py:22:0: E0401: Unable to import 'lhotse.cut' (import-error)
nemo/collections/speechlm2/data/s2s_dataset.py:23:0: E0401: Unable to import 'lhotse.dataset.collation' (import-error)
nemo/collections/speechlm2/data/s2s_dataset.py:24:0: E0401: Unable to import 'lhotse.utils' (import-error)
nemo/collections/speechlm2/data/s2s_dataset.py:96:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/data/s2s_dataset.py:106:11: W0718: Catching too general exception Exception (broad-exception-caught)
nemo/collections/speechlm2/data/s2s_dataset.py:110:0: R0902: Too many instance attributes (24/7) (too-many-instance-attributes)
nemo/collections/speechlm2/data/s2s_dataset.py:202:4: R0913: Too many arguments (13/5) (too-many-arguments)
nemo/collections/speechlm2/data/s2s_dataset.py:202:4: R0917: Too many positional arguments (13/5) (too-many-positional-arguments)
nemo/collections/speechlm2/data/s2s_dataset.py:270:8: W0612: Unused variable 'device' (unused-variable)
nemo/collections/speechlm2/data/s2s_dataset.py:296:4: R0913: Too many arguments (9/5) (too-many-arguments)
nemo/collections/speechlm2/data/s2s_dataset.py:296:4: R0917: Too many positional arguments (9/5) (too-many-positional-arguments)
nemo/collections/speechlm2/data/s2s_dataset.py:296:4: R0914: Too many local variables (18/15) (too-many-locals)
nemo/collections/speechlm2/data/s2s_dataset.py:304:8: C0415: Import outside toplevel (os) (import-outside-toplevel)
nemo/collections/speechlm2/data/s2s_dataset.py:311:13: W1514: Using open without explicitly specifying an encoding (unspecified-encoding)
nemo/collections/speechlm2/data/s2s_dataset.py:331:4: R0913: Too many arguments (11/5) (too-many-arguments)
nemo/collections/speechlm2/data/s2s_dataset.py:331:4: R0917: Too many positional arguments (11/5) (too-many-positional-arguments)
nemo/collections/speechlm2/data/s2s_dataset.py:331:4: R0914: Too many local variables (60/15) (too-many-locals)
nemo/collections/speechlm2/data/s2s_dataset.py:364:12: W0621: Redefining name 'torchaudio' from outer scope (line 20) (redefined-outer-name)
nemo/collections/speechlm2/data/s2s_dataset.py:362:12: C0415: Import outside toplevel (os) (import-outside-toplevel)
nemo/collections/speechlm2/data/s2s_dataset.py:364:12: W0404: Reimport 'torchaudio' (imported line 20) (reimported)
nemo/collections/speechlm2/data/s2s_dataset.py:364:12: C0415: Import outside toplevel (torchaudio) (import-outside-toplevel)
nemo/collections/speechlm2/data/s2s_dataset.py:364:12: E0401: Unable to import 'torchaudio' (import-error)
nemo/collections/speechlm2/data/s2s_dataset.py:331:4: R0912: Too many branches (20/12) (too-many-branches)
nemo/collections/speechlm2/data/s2s_dataset.py:331:4: R0915: Too many statements (98/50) (too-many-statements)
nemo/collections/speechlm2/data/s2s_dataset.py:560:4: R0914: Too many local variables (40/15) (too-many-locals)
nemo/collections/speechlm2/data/s2s_dataset.py:696:19: W0718: Catching too general exception Exception (broad-exception-caught)
nemo/collections/speechlm2/data/s2s_dataset.py:585:8: R1702: Too many nested blocks (6/5) (too-many-nested-blocks)
nemo/collections/speechlm2/data/s2s_dataset.py:560:4: R0912: Too many branches (28/12) (too-many-branches)
nemo/collections/speechlm2/data/s2s_dataset.py:560:4: R0915: Too many statements (84/50) (too-many-statements)
nemo/collections/speechlm2/data/s2s_dataset.py:696:12: W0612: Unused variable 'e' (unused-variable)
nemo/collections/speechlm2/data/s2s_dataset.py:791:4: R0914: Too many local variables (34/15) (too-many-locals)
nemo/collections/speechlm2/data/s2s_dataset.py:793:8: C0415: Import outside toplevel (io.BytesIO) (import-outside-toplevel)
nemo/collections/speechlm2/data/s2s_dataset.py:795:8: C0415: Import outside toplevel (numpy) (import-outside-toplevel)
nemo/collections/speechlm2/data/s2s_dataset.py:795:8: E0401: Unable to import 'numpy' (import-error)
nemo/collections/speechlm2/data/s2s_dataset.py:796:8: C0415: Import outside toplevel (soundfile) (import-outside-toplevel)
nemo/collections/speechlm2/data/s2s_dataset.py:796:8: E0401: Unable to import 'soundfile' (import-error)
nemo/collections/speechlm2/data/s2s_dataset.py:797:8: E0401: Unable to import 'lhotse' (import-error)
nemo/collections/speechlm2/data/s2s_dataset.py:797:8: C0415: Import outside toplevel (lhotse.AudioSource) (import-outside-toplevel)
nemo/collections/speechlm2/data/s2s_dataset.py:836:15: R1714: Consider merging these comparisons with 'in' by using 'i not in (first_agent_idx, last_user_idx)'. Use a set instead if elements are hashable. (consider-using-in)
nemo/collections/speechlm2/data/s2s_dataset.py:791:4: R0912: Too many branches (16/12) (too-many-branches)
nemo/collections/speechlm2/data/s2s_dataset.py:791:4: R0915: Too many statements (60/50) (too-many-statements)
nemo/collections/speechlm2/data/s2s_dataset.py:844:8: W0612: Unused variable 'new_duration' (unused-variable)
nemo/collections/speechlm2/data/s2s_dataset.py:110:0: R0903: Too few public methods (1/2) (too-few-public-methods)
nemo/collections/speechlm2/data/s2s_dataset.py:937:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/data/s2s_dataset.py:970:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/data/s2s_dataset.py:970:0: R0913: Too many arguments (15/5) (too-many-arguments)
nemo/collections/speechlm2/data/s2s_dataset.py:970:0: R0917: Too many positional arguments (15/5) (too-many-positional-arguments)
nemo/collections/speechlm2/data/s2s_dataset.py:970:0: R0914: Too many local variables (18/15) (too-many-locals)
nemo/collections/speechlm2/data/s2s_dataset.py:1044:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/data/s2s_dataset.py:1044:0: R0913: Too many arguments (15/5) (too-many-arguments)
nemo/collections/speechlm2/data/s2s_dataset.py:1044:0: R0917: Too many positional arguments (15/5) (too-many-positional-arguments)
nemo/collections/speechlm2/data/s2s_dataset.py:1044:0: R0914: Too many local variables (29/15) (too-many-locals)
nemo/collections/speechlm2/data/s2s_dataset.py:1103:15: R1716: Simplify chained comparison between the operands (chained-comparison)
nemo/collections/speechlm2/data/s2s_dataset.py:1044:0: R0912: Too many branches (16/12) (too-many-branches)
nemo/collections/speechlm2/data/s2s_dataset.py:1147:15: C0103: Argument name "_TIMESTAMP_PATTERN" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/data/s2s_dataset.py:1147:60: C0103: Argument name "_SPACE_PATTERN" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/data/s2s_dataset.py:1163:4: C0103: Argument name "_TIMESTAMP_PATTERN_STR" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/data/s2s_dataset.py:1173:8: C0103: Variable name "_TIMESTAMP_PATTERN" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/data/s2s_dataset.py:1184:4: C0103: Argument name "_TIMESTAMP_PATTERN_STR" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/data/s2s_dataset.py:1205:66: C0103: Argument name "_TIMESTAMP_PATTERN_STR" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/data/s2s_dataset.py:1224:0: R0913: Too many arguments (8/5) (too-many-arguments)
nemo/collections/speechlm2/data/s2s_dataset.py:1224:0: R0917: Too many positional arguments (8/5) (too-many-positional-arguments)
nemo/collections/speechlm2/data/s2s_dataset.py:1224:0: R0914: Too many local variables (19/15) (too-many-locals)
************* Module nemo.collections.speechlm2.models.duplex_stt_model
nemo/collections/speechlm2/models/duplex_stt_model.py:1:0: C0302: Too many lines in module (1404/1000) (too-many-lines)
nemo/collections/speechlm2/models/duplex_stt_model.py:1:0: C0114: Missing module docstring (missing-module-docstring)
nemo/collections/speechlm2/models/duplex_stt_model.py:19:0: E0401: Unable to import 'torch' (import-error)
nemo/collections/speechlm2/models/duplex_stt_model.py:20:0: E0401: Unable to import 'torch.distributed' (import-error)
nemo/collections/speechlm2/models/duplex_stt_model.py:21:0: E0401: Unable to import 'torch.nn.functional' (import-error)
nemo/collections/speechlm2/models/duplex_stt_model.py:22:0: E0401: Unable to import 'torchaudio' (import-error)
nemo/collections/speechlm2/models/duplex_stt_model.py:23:0: E0401: Unable to import 'lightning' (import-error)
nemo/collections/speechlm2/models/duplex_stt_model.py:24:0: E0401: Unable to import 'omegaconf' (import-error)
nemo/collections/speechlm2/models/duplex_stt_model.py:25:0: E0401: Unable to import 'peft' (import-error)
nemo/collections/speechlm2/models/duplex_stt_model.py:26:0: E0401: Unable to import 'torch' (import-error)
nemo/collections/speechlm2/models/duplex_stt_model.py:27:0: E0401: Unable to import 'torch.distributed.fsdp' (import-error)
nemo/collections/speechlm2/models/duplex_stt_model.py:28:0: E0401: Unable to import 'torch.distributed.tensor' (import-error)
nemo/collections/speechlm2/models/duplex_stt_model.py:29:0: E0401: Unable to import 'torch.distributed.tensor.parallel' (import-error)
nemo/collections/speechlm2/models/duplex_stt_model.py:37:0: E0401: Unable to import 'transformers' (import-error)
nemo/collections/speechlm2/models/duplex_stt_model.py:63:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/speechlm2/models/duplex_stt_model.py:63:0: R0902: Too many instance attributes (27/7) (too-many-instance-attributes)
nemo/collections/speechlm2/models/duplex_stt_model.py:142:16: C0415: Import outside toplevel (gc) (import-outside-toplevel)
nemo/collections/speechlm2/models/duplex_stt_model.py:144:16: E0401: Unable to import 'safetensors' (import-error)
nemo/collections/speechlm2/models/duplex_stt_model.py:144:16: C0415: Import outside toplevel (safetensors.safe_open) (import-outside-toplevel)
nemo/collections/speechlm2/models/duplex_stt_model.py:64:4: R0912: Too many branches (15/12) (too-many-branches)
nemo/collections/speechlm2/models/duplex_stt_model.py:64:4: R0915: Too many statements (81/50) (too-many-statements)
nemo/collections/speechlm2/models/duplex_stt_model.py:195:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/models/duplex_stt_model.py:199:20: W0212: Access to a protected member _unpack_nemo_file of a client class (protected-access)
nemo/collections/speechlm2/models/duplex_stt_model.py:216:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/models/duplex_stt_model.py:220:20: W0212: Access to a protected member _unpack_nemo_file of a client class (protected-access)
nemo/collections/speechlm2/models/duplex_stt_model.py:240:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/models/duplex_stt_model.py:244:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/models/duplex_stt_model.py:265:4: R0913: Too many arguments (8/5) (too-many-arguments)
nemo/collections/speechlm2/models/duplex_stt_model.py:265:4: R0917: Too many positional arguments (8/5) (too-many-positional-arguments)
nemo/collections/speechlm2/models/duplex_stt_model.py:265:4: R0914: Too many local variables (17/15) (too-many-locals)
nemo/collections/speechlm2/models/duplex_stt_model.py:294:8: C0103: Variable name "B" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:294:11: C0103: Variable name "T" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:269:8: W0613: Unused argument 'input_audio_tokens' (unused-argument)
nemo/collections/speechlm2/models/duplex_stt_model.py:270:8: W0613: Unused argument 'seq_mask' (unused-argument)
nemo/collections/speechlm2/models/duplex_stt_model.py:271:8: W0613: Unused argument 'target_text_tokens' (unused-argument)
nemo/collections/speechlm2/models/duplex_stt_model.py:272:8: W0613: Unused argument 'modality_adapter_emb' (unused-argument)
nemo/collections/speechlm2/models/duplex_stt_model.py:273:8: W0613: Unused argument 'speaker_encoder_emb' (unused-argument)
nemo/collections/speechlm2/models/duplex_stt_model.py:294:8: W0612: Unused variable 'B' (unused-variable)
nemo/collections/speechlm2/models/duplex_stt_model.py:294:11: W0612: Unused variable 'T' (unused-variable)
nemo/collections/speechlm2/models/duplex_stt_model.py:325:15: R1714: Consider merging these comparisons with 'in' by using 'formatter not in ('s2s_duplex_overlap_as_s2s_duplex', 'nemo_tarred_to_duplex')'. Use a set instead if elements are hashable. (consider-using-in)
nemo/collections/speechlm2/models/duplex_stt_model.py:341:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/models/duplex_stt_model.py:341:4: R0914: Too many local variables (49/15) (too-many-locals)
nemo/collections/speechlm2/models/duplex_stt_model.py:433:12: C0103: Variable name "B" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:433:31: C0103: Variable name "H" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:434:12: C0103: Variable name "T_src" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:435:12: C0103: Variable name "T_tgt" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:447:16: C0103: Variable name "T_src_tok" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:341:4: R0912: Too many branches (24/12) (too-many-branches)
nemo/collections/speechlm2/models/duplex_stt_model.py:341:4: R0915: Too many statements (92/50) (too-many-statements)
nemo/collections/speechlm2/models/duplex_stt_model.py:575:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/models/duplex_stt_model.py:575:4: R0914: Too many local variables (25/15) (too-many-locals)
nemo/collections/speechlm2/models/duplex_stt_model.py:637:16: C0103: Variable name "B" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:637:19: C0103: Variable name "T" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:575:4: R0912: Too many branches (13/12) (too-many-branches)
nemo/collections/speechlm2/models/duplex_stt_model.py:575:41: W0613: Unused argument 'batch_idx' (unused-argument)
nemo/collections/speechlm2/models/duplex_stt_model.py:695:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/models/duplex_stt_model.py:698:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/models/duplex_stt_model.py:714:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/models/duplex_stt_model.py:749:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/models/duplex_stt_model.py:794:16: C0415: Import outside toplevel (re) (import-outside-toplevel)
nemo/collections/speechlm2/models/duplex_stt_model.py:749:43: W0613: Unused argument 'batch_idx' (unused-argument)
nemo/collections/speechlm2/models/duplex_stt_model.py:804:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/models/duplex_stt_model.py:807:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/models/duplex_stt_model.py:810:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/models/duplex_stt_model.py:813:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/models/duplex_stt_model.py:816:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/models/duplex_stt_model.py:816:40: W0613: Unused argument 'batch_idx' (unused-argument)
nemo/collections/speechlm2/models/duplex_stt_model.py:816:56: W0613: Unused argument 'dataloader_idx' (unused-argument)
nemo/collections/speechlm2/models/duplex_stt_model.py:897:16: W0640: Cell variable batch_turns defined in loop (cell-var-from-loop)
nemo/collections/speechlm2/models/duplex_stt_model.py:944:4: R0913: Too many arguments (7/5) (too-many-arguments)
nemo/collections/speechlm2/models/duplex_stt_model.py:944:4: R0917: Too many positional arguments (7/5) (too-many-positional-arguments)
nemo/collections/speechlm2/models/duplex_stt_model.py:944:4: R0914: Too many local variables (34/15) (too-many-locals)
nemo/collections/speechlm2/models/duplex_stt_model.py:974:8: C0103: Variable name "B" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:974:11: C0103: Variable name "T_local" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:974:20: C0103: Variable name "H" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:978:12: C0103: Variable name "B_prompt" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:978:38: C0103: Variable name "H_prompt" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:999:12: C0103: Variable name "T_local" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:1001:8: C0103: Variable name "B" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:1001:11: C0103: Variable name "T_local" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:1001:20: C0103: Variable name "H" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:1004:12: C0103: Variable name "T_tensor" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:1006:12: C0103: Variable name "T" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:1012:12: C0103: Variable name "T" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:944:4: R0912: Too many branches (22/12) (too-many-branches)
nemo/collections/speechlm2/models/duplex_stt_model.py:944:4: R0915: Too many statements (72/50) (too-many-statements)
nemo/collections/speechlm2/models/duplex_stt_model.py:1156:4: R0914: Too many local variables (21/15) (too-many-locals)
nemo/collections/speechlm2/models/duplex_stt_model.py:1161:8: C0103: Variable name "T_local" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:1162:8: C0103: Variable name "T" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:1163:8: C0103: Variable name "B" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:1180:16: C0103: Variable name "current_T" doesn't conform to snake_case naming style (invalid-name)
nemo/collections/speechlm2/models/duplex_stt_model.py:1221:4: R0913: Too many arguments (8/5) (too-many-arguments)
nemo/collections/speechlm2/models/duplex_stt_model.py:1221:4: R0917: Too many positional arguments (8/5) (too-many-positional-arguments)
nemo/collections/speechlm2/models/duplex_stt_model.py:1225:8: W0613: Unused argument 'decode_audio' (unused-argument)
nemo/collections/speechlm2/models/duplex_stt_model.py:1245:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/models/duplex_stt_model.py:1249:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/models/duplex_stt_model.py:1273:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/models/duplex_stt_model.py:1273:4: R0914: Too many local variables (21/15) (too-many-locals)
nemo/collections/speechlm2/models/duplex_stt_model.py:1353:23: W0718: Catching too general exception Exception (broad-exception-caught)
nemo/collections/speechlm2/models/duplex_stt_model.py:1282:8: R1702: Too many nested blocks (6/5) (too-many-nested-blocks)
nemo/collections/speechlm2/models/duplex_stt_model.py:1367:27: W0718: Catching too general exception Exception (broad-exception-caught)
nemo/collections/speechlm2/models/duplex_stt_model.py:1273:4: R0912: Too many branches (21/12) (too-many-branches)
nemo/collections/speechlm2/models/duplex_stt_model.py:1273:4: R0915: Too many statements (56/50) (too-many-statements)
nemo/collections/speechlm2/models/duplex_stt_model.py:1398:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/speechlm2/models/duplex_stt_model.py:1402:25: W1309: Using an f-string that does not have any interpolated variables (f-string-without-interpolation)
nemo/collections/speechlm2/models/duplex_stt_model.py:1401:8: W0612: Unused variable 'e' (unused-variable)
nemo/collections/speechlm2/models/duplex_stt_model.py:699:8: W0201: Attribute 'results_logger' defined outside __init__ (attribute-defined-outside-init)
nemo/collections/speechlm2/models/duplex_stt_model.py:700:8: W0201: Attribute 'bleu' defined outside __init__ (attribute-defined-outside-init)
nemo/collections/speechlm2/models/duplex_stt_model.py:702:8: W0201: Attribute 'turn_taking_metrics' defined outside __init__ (attribute-defined-outside-init)
nemo/collections/speechlm2/models/duplex_stt_model.py:710:12: W0201: Attribute 'src_bleu' defined outside __init__ (attribute-defined-outside-init)
nemo/collections/speechlm2/models/duplex_stt_model.py:711:12: W0201: Attribute 'src_wer' defined outside __init__ (attribute-defined-outside-init)
nemo/collections/speechlm2/models/duplex_stt_model.py:712:12: W0201: Attribute 'empty_user_text' defined outside __init__ (attribute-defined-outside-init)
nemo/collections/speechlm2/models/duplex_stt_model.py:1393:12: W0201: Attribute 'perception' defined outside __init__ (attribute-defined-outside-init)
nemo/collections/speechlm2/models/duplex_stt_model.py:63:0: R0904: Too many public methods (24/20) (too-many-public-methods)
nemo/collections/speechlm2/models/duplex_stt_model.py:21:0: W0611: Unused torch.nn.functional imported as F (unused-import)
nemo/collections/speechlm2/models/duplex_stt_model.py:24:0: W0611: Unused OmegaConf imported from omegaconf (unused-import)
nemo/collections/speechlm2/models/duplex_stt_model.py:26:0: W0611: Unused nn imported from torch (unused-import)
nemo/collections/speechlm2/models/duplex_stt_model.py:1:0: R0801: Similar lines in 2 files
==s2s_duplex_stt_infer:[24:37]
==s2s_duplex_stt_train:[33:47]
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))


@hydra_runner(config_path="conf", config_name="s2s_duplex_stt")
def train(cfg):
    OmegaConf.resolve(cfg)
    torch.distributed.init_process_group(backend="nccl")
    torch.set_float32_matmul_precision("medium")
    torch.backends.cudnn.allow_tf32 = True
    trainer = Trainer(**resolve_trainer_cfg(cfg.trainer))
    log_dir = exp_manager(trainer, cfg.get("exp_manager", None))
    OmegaConf.save(cfg, log_dir / "exp_config.yaml")

    # avoid using `=` in the checkpoint name (duplicate-code)
nemo/collections/speechlm2/models/duplex_stt_model.py:1:0: R0801: Similar lines in 2 files
==s2s_duplex_stt_infer:[42:48]
==s2s_duplex_stt_train:[55:61]
        tokenizer=model.tokenizer,
        frame_length=cfg.data.frame_length,
        source_sample_rate=cfg.data.source_sample_rate,
        target_sample_rate=cfg.data.target_sample_rate,
        input_roles=cfg.data.input_roles,
        output_roles=cfg.data.output_roles, (duplicate-code)

-----------------------------------
Your code has been rated at 7.32/10

Thank you for improving NeMo's documentation!

ankitapasad added 7 commits February 4, 2026 15:05

Turn off force alignment during validation.

291613c

Signed-off-by: Ankita Pasad <apasad@nvidia.com>

Fix agent EOS placement to occur after the start of user turn. Controlled by data.fix_eos_placements=True, True by default.

13addc5

Signed-off-by: Ankita Pasad <apasad@nvidia.com>
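
For reference, the EOS placement rule from the PR description (agent EOS at user_turn_supervision.start + eos_offset_frames) can be sketched as below. This is a minimal illustration, not the code in this commit: the function name `agent_eos_frame`, the millisecond inputs, the floor rounding, and the out-of-range handling are all assumptions. The 80 ms frame length is inferred from the description's "8 tokens, i.e., 640ms".

```python
from typing import Optional

EOS_OFFSET_FRAMES = 8   # value the PR description says is hard coded
FRAME_LENGTH_MS = 80    # inferred from "8 tokens = 640 ms"; an assumption


def agent_eos_frame(
    user_start_ms: int,
    frame_length_ms: int = FRAME_LENGTH_MS,
    eos_offset_frames: int = EOS_OFFSET_FRAMES,
    num_frames: Optional[int] = None,
) -> Optional[int]:
    """Hypothetical helper: frame index at which the agent EOS is placed.

    The EOS is anchored to the start of the user's turn rather than to the
    end of the agent's supervision, so it does not depend on agent duration.
    """
    # Frame containing the user's first sample (floor rounding is assumed).
    user_start_frame = user_start_ms // frame_length_ms
    eos_frame = user_start_frame + eos_offset_frames
    # If the sequence length is known, skip placements that fall outside it
    # (assumed behavior; the real code may clamp instead).
    if num_frames is not None and eos_frame >= num_frames:
        return None
    return eos_frame


# A user turn starting at 2.0 s lands in frame 25, so the EOS goes to frame 33.
print(agent_eos_frame(2000))
```

Working in integer milliseconds avoids floating-point rounding at frame boundaries; whether the actual implementation floors, rounds, or clamps is not stated in the PR description.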

On-the-fly early interruption with debugging options and wandb logging.

245a994

Signed-off-by: Ankita Pasad <apasad@nvidia.com>

setup.py to fix black and isort formatting

90a2ea8

Signed-off-by: Ankita Pasad <apasad@nvidia.com>

Noise augmentation and manifest files for MCQ eval

b86ba5b

Signed-off-by: Ankita Pasad <apasad@nvidia.com>

Include number normalization

6393bef

Signed-off-by: Ankita Pasad <apasad@nvidia.com>

setup.py to fix black and isort formatting

905008d

Signed-off-by: Ankita Pasad <apasad@nvidia.com>

ankitapasad marked this pull request as ready for review February 5, 2026 03:58

ankitapasad changed the title ~~[WIP] Modifications to handle force alignment, EOS placement, and OTF early interruption~~ Modifications to handle force alignment, EOS placement, and OTF early interruption Feb 5, 2026

ankitapasad wants to merge 7 commits into kevinhu-nv:duplex-stt-rebased from ankitapasad:duplex-stt-rebased
