Support funasr/paraformer-zh to export OpenVINO IR model #1567

@10vliu13

Description

I would like to add support for exporting funasr/paraformer-zh to an OpenVINO IR model from optimum-intel. The status of the original model implementation:

  1. The model network is implemented in Python scripts (https://github.com/modelscope/FunASR/tree/main/funasr); the SANMEncoder, ParaformerSANMDecoder, and CifPredictor components live in separate files in that repo.
  2. The model.pt at https://huggingface.co/funasr/paraformer-zh contains only the model parameters.

As the original model in https://github.com/modelscope/FunASR/tree/main/funasr is complex, I reimplemented the code in a single Python file named modeling_paraformer.py. The implementation of modeling_paraformer.py is based on torch.nn.Module; please see the code below for details. I went through the whole optimum-intel repo and there seems to be no proper place to put modeling_paraformer.py. @rkazants, may I have your suggestion on how to handle this? Thanks.
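For context, the constructor below follows FunASR's configuration pattern: each sub-module (encoder, decoder, predictor) is described by a name plus a `*_conf` dict of keyword arguments. A minimal pure-Python sketch of that pattern (the registry and `DummyEncoder` class here are hypothetical stand-ins, not FunASR's real components):

```python
# Sketch of the "component name + conf dict" instantiation pattern used by
# Paraformer.__init__. DummyEncoder and COMPONENT_REGISTRY are illustrative
# stand-ins for FunASR's SANMEncoder/ParaformerSANMDecoder machinery.
from typing import Dict, Optional


class DummyEncoder:
    def __init__(self, input_size: int, output_size: int = 256):
        self.input_size = input_size
        self._output_size = output_size

    def output_size(self) -> int:
        return self._output_size


COMPONENT_REGISTRY = {"dummy_encoder": DummyEncoder}


def build_component(name: Optional[str], conf: Optional[Dict], **extra):
    """Instantiate a registered component from its name and conf dict."""
    if name is None:
        return None
    return COMPONENT_REGISTRY[name](**{**(conf or {}), **extra})


encoder = build_component("dummy_encoder", {"output_size": 320}, input_size=80)
print(encoder.output_size())  # -> 320
```

This is why the real constructor accepts both `encoder: str` and `encoder_conf: Dict` and then rebinds `encoder` to the instantiated module.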

```python
from typing import Dict, Optional

import torch

# SANMEncoder, ParaformerSANMDecoder, CifPredictorV2 and export_rebuild_model
# come from the FunASR repo (https://github.com/modelscope/FunASR).


class Paraformer(torch.nn.Module):
    """
    Author: Speech Lab of DAMO Academy, Alibaba Group
    Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive
    End-to-End Speech Recognition
    https://arxiv.org/abs/2206.08317
    """

    def __init__(
        self,
        specaug: Optional[str] = None,
        specaug_conf: Optional[Dict] = None,
        normalize: Optional[str] = None,
        normalize_conf: Optional[Dict] = None,
        encoder: Optional[str] = None,
        encoder_conf: Optional[Dict] = None,
        decoder: Optional[str] = None,
        decoder_conf: Optional[Dict] = None,
        ctc: Optional[str] = None,
        ctc_conf: Optional[Dict] = None,
        predictor: Optional[str] = None,
        predictor_conf: Optional[Dict] = None,
        ctc_weight: float = 0.5,
        input_size: int = 80,
        vocab_size: int = -1,
        ignore_id: int = -1,
        blank_id: int = 0,
        sos: int = 1,
        eos: int = 2,
        lsm_weight: float = 0.0,
        length_normalized_loss: bool = False,
        predictor_weight: float = 0.0,
        predictor_bias: int = 0,
        sampling_ratio: float = 0.2,
        share_embedding: bool = False,
        use_1st_decoder_loss: bool = False,
        **kwargs,
    ):
        super().__init__()
        # Rebind the name/conf pairs to instantiated sub-modules.
        encoder = SANMEncoder(input_size=input_size, **encoder_conf)
        encoder_output_size = encoder.output_size()

        if decoder is not None:
            decoder = ParaformerSANMDecoder(
                vocab_size=vocab_size,
                encoder_output_size=encoder_output_size,
                **decoder_conf,
            )

        if predictor is not None:
            predictor = CifPredictorV2(**predictor_conf)

        self.encoder = encoder
        self.decoder = decoder
        self.predictor = predictor

    def export(self, **kwargs):
        if "max_seq_len" not in kwargs:
            kwargs["max_seq_len"] = 512
        return export_rebuild_model(model=self, **kwargs)
```
