Description
I would like to support funasr/paraformer-zh so that optimum-intel can export it to an OpenVINO IR model. The status of the original model implementation is:
- The model network is implemented in Python across several script files in https://github.com/modelscope/FunASR/tree/main/funasr, with SANMEncoder, ParaformerSANMDecoder, and CifPredictor each living in a separate file.
- The checkpoint model.pt at https://huggingface.co/funasr/paraformer-zh contains only the model parameters, not the model code.

Because the original implementation in https://github.com/modelscope/FunASR/tree/main/funasr is complex, I reimplemented it in a single Python file, modeling_paraformer.py, based on torch.nn.Module. Please see the code below for details. I went through the whole optimum-intel repository, and there seems to be no proper place to put modeling_paraformer.py. @rkazants, may I have your suggestion on how to do that? Thanks.
```python
import torch
from typing import Dict, Optional


class Paraformer(torch.nn.Module):
    """
    Author: Speech Lab of DAMO Academy, Alibaba Group
    Paraformer: Fast and Accurate Parallel Transformer for
    Non-autoregressive End-to-End Speech Recognition
    https://arxiv.org/abs/2206.08317
    """

    def __init__(
        self,
        specaug: Optional[str] = None,
        specaug_conf: Optional[Dict] = None,
        normalize: Optional[str] = None,
        normalize_conf: Optional[Dict] = None,
        encoder: Optional[str] = None,
        encoder_conf: Optional[Dict] = None,
        decoder: Optional[str] = None,
        decoder_conf: Optional[Dict] = None,
        ctc: Optional[str] = None,
        ctc_conf: Optional[Dict] = None,
        predictor: Optional[str] = None,
        predictor_conf: Optional[Dict] = None,
        ctc_weight: float = 0.5,
        input_size: int = 80,
        vocab_size: int = -1,
        ignore_id: int = -1,
        blank_id: int = 0,
        sos: int = 1,
        eos: int = 2,
        lsm_weight: float = 0.0,
        length_normalized_loss: bool = False,
        predictor_weight: float = 0.0,
        predictor_bias: int = 0,
        sampling_ratio: float = 0.2,
        share_embedding: bool = False,
        use_1st_decoder_loss: bool = False,
        **kwargs,
    ):
        super().__init__()
        # SANMEncoder, ParaformerSANMDecoder, CifPredictorV2 and
        # export_rebuild_model are defined elsewhere in modeling_paraformer.py.
        encoder = SANMEncoder(input_size=input_size, **encoder_conf)
        encoder_output_size = encoder.output_size()
        if decoder is not None:
            decoder = ParaformerSANMDecoder(
                vocab_size=vocab_size,
                encoder_output_size=encoder_output_size,
                **decoder_conf,
            )
        if predictor is not None:
            predictor = CifPredictorV2(**predictor_conf)
        self.encoder = encoder
        self.decoder = decoder
        self.predictor = predictor

    def export(self, **kwargs):
        if "max_seq_len" not in kwargs:
            kwargs["max_seq_len"] = 512
        models = export_rebuild_model(model=self, **kwargs)
        return models
```