I am trying to fine-tune NVIDIA models on OpenSLR and CORAA ASR, plus some YouTube-extracted audio, but the WER is even higher than the base model's.
The fine-tuned model adds a lot of noise: on a similar test set the base model averages about 85% accuracy, but the fine-tuned model drops to about 50%.
I have tried both my own code and the fine-tuning script available in the NeMo repo, but neither works.
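To make the regression concrete, the same WER metric can be computed on matching reference/hypothesis pairs from both the base and fine-tuned models. A minimal pure-Python sketch (this is not NeMo's built-in WER metric, and the sample sentences are made up):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via Levenshtein distance over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the first i ref words and first j hyp words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / max(len(ref), 1)

print(wer("o gato preto", "o gato preto"))   # 0.0
print(wer("o gato preto", "o gato branco"))  # 0.333... (1 substitution / 3 words)
```

Running this on identical test transcripts for both checkpoints rules out any difference in text normalization (casing, punctuation) masquerading as a model regression.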
Below is my fine-tuning code:
```python
import os

import torch
import nemo.collections.asr as nemo_asr
import lightning.pytorch as pl
from omegaconf import OmegaConf, open_dict

# --- 1. SET PATHS ---
ROOT = os.getcwd()
TRAIN_MANIFEST = os.path.join(ROOT, "train_manifest_nemo.jsonl")
VAL_MANIFEST = os.path.join(ROOT, "val_manifest_nemo.jsonl")

# --- 2. LOAD MODEL ---
print("Loading model...")
model = nemo_asr.models.EncDecHybridRNNTCTCBPEModel.from_pretrained(
    model_name="nvidia/stt_pt_fastconformer_hybrid_large_pc"
)

# --- 3. PATCH MISSING MANDATORY CONFIG VALUES ---
# Prevents the "Missing mandatory value: dir" and "manifest_filepath" errors.
with open_dict(model.cfg):
    # Fix the tokenizer directory crash
    if 'tokenizer' in model.cfg:
        model.cfg.tokenizer.dir = ""
    # Fill placeholders for all data sections to satisfy OmegaConf
    for ds in ['train_ds', 'validation_ds', 'test_ds']:
        if ds in model.cfg:
            model.cfg[ds].manifest_filepath = TRAIN_MANIFEST if ds == 'train_ds' else VAL_MANIFEST
            model.cfg[ds].batch_size = 1  # keep at 1 for CPU stability
            model.cfg[ds].num_workers = 0
            model.cfg[ds].pin_memory = False

# --- 4. SET UP DATA ---
print("Setting up data loaders...")
model.setup_training_data(model.cfg.train_ds)
model.setup_validation_data(model.cfg.validation_ds)

# --- 5. FREEZE THE ENCODER TO PROTECT THE 85% BASELINE ---
# Freeze the encoder so the small pilot dataset does not degrade the model;
# this is also meant to fix the one-word-output ("ela") issue.
model.encoder.freeze()
print("Encoder frozen. Only fine-tuning decoders for the pilot.")

# --- 6. OPTIMIZATION SETUP ---
model.setup_optimization(
    optim_config={
        'lr': 1e-4,
        'weight_decay': 0.001,
        'sched': {
            'name': 'CosineAnnealing',
            'warmup_steps': 100,
            'min_lr': 1e-6,
        },
    }
)

# --- 7. TRAINER (CPU-compatible) ---
trainer = pl.Trainer(
    max_epochs=5,
    accelerator="cpu",  # change to "gpu" once on the RTX 4080
    devices=1,
    precision=32,       # 16-bit mixed precision needs a GPU; use 32-bit on CPU
    enable_checkpointing=True,
    logger=False,
)

# --- 8. TRAIN ---
print("Starting pilot training...")
trainer.fit(model)

# --- 9. SAVE FINAL MODEL ---
model.save_to("pt_br_pilot_final.nemo")
print("Successfully saved: pt_br_pilot_final.nemo")
```

Any support from your side would be highly appreciated. This is not a personal project; it is a client requirement, and we are targeting more than 90% accuracy, so any guidance would help.
It would be a great favor, and I would be very thankful for your guidance.