Add Paraformer-zh ASR model support with OpenVINO inference and INT8 quantization#1629
Open
padatta wants to merge 4 commits into huggingface:main from
Conversation
- Implement `modeling_speech2text.py` following the `modeling_text2speech.py` pattern
- Add `OVParaformerForSpeechSeq2Seq` for Paraformer ASR model inference
- Support single-model and component-based architectures
- Add comprehensive test suite with 10 test cases
- CPU/GPU support with dynamic device switching
- FP32/FP16/INT8 model support
- Includes encoder, predictor, and decoder components
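The predictor component mentioned above is Paraformer's CIF (continuous integrate-and-fire) module, which estimates how many tokens the audio contains from per-frame firing weights. A minimal sketch of the token-count idea (the function name and shapes are illustrative, not this PR's code):

```python
import torch

def cif_token_num(alphas: torch.Tensor, threshold: float = 1.0) -> torch.Tensor:
    # Minimal sketch: CIF accumulates per-frame firing weights and emits one
    # token each time the running sum crosses the threshold, so the predicted
    # token count is essentially the (floored) total weight.
    total = alphas.sum(dim=-1)
    return torch.floor(total / threshold).to(torch.int32)

alphas = torch.tensor([[0.4, 0.7, 0.3, 0.9, 0.2]])  # total weight 2.5
print(cif_token_num(alphas))  # -> tensor([2], dtype=torch.int32)
```

The real Paraformer predictor also produces one acoustic embedding per emitted token, which is what makes the decoder non-autoregressive.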
- Integrate `INT8_SYM` weight compression during export using NNCF
- Use `compress_to_fp16=True` to store FP32 constants as FP16 for GPU
- Skip the redundant `main_quantize` pass for Paraformer models
- Fix dtype issues: use `int32` for positions/ranges and indices
- Implement dynamic mask creation using shape operations for ONNX/OpenVINO
- Fix CIF predictor tensor assignments for the `ScatterNDUpdate` op
- All changes enable successful GPU inference with INT8 models
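The dynamic-mask and dtype/clamping fixes above can be sketched in plain PyTorch. Names such as `make_length_mask`, `clamp_token_count`, and `MAX_LABEL_LEN` are illustrative, not this PR's identifiers:

```python
import torch

MAX_LABEL_LEN = 4096  # illustrative cap mirroring the clamping described in this PR

def make_length_mask(lengths: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
    # Derive the mask length from the tensor's runtime shape so ONNX/OpenVINO
    # export traces shape ops instead of baking in a fixed Python integer.
    max_len = feats.shape[1]  # stays a dynamic dimension in the exported graph
    positions = torch.arange(max_len, dtype=torch.int32, device=feats.device)
    # (batch, max_len) boolean mask: True where the frame is real audio
    return positions.unsqueeze(0) < lengths.unsqueeze(1).to(torch.int32)

def clamp_token_count(token_num: torch.Tensor) -> torch.Tensor:
    # int32 avoids dtype mismatches on GPU; clamping bounds the decoder's
    # label length so a mispredicted count cannot exhaust memory.
    return token_num.round().to(torch.int32).clamp(min=1, max=MAX_LABEL_LEN)

feats = torch.randn(2, 7, 80)            # (batch, frames, fbank dims)
lengths = torch.tensor([7, 4])
mask = make_length_mask(lengths, feats)  # mask[1, 4:] is all False (padding)
```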
This PR adds comprehensive support for Paraformer-zh (FunASR) automatic speech recognition models in optimum-intel, including a new inference class (`OVParaformerForSpeechSeq2Seq`) following the established patterns from `OVModelForTextToSpeechSeq2Seq`.

What's Changed

1. Paraformer Inference Model (`modeling_speech2text.py`)
- `OVParaformerForSpeechSeq2Seq` class for running Paraformer models with OpenVINO

2. INT8 Quantization Support
- `INT8_SYM` weight compression using NNCF during export

3. GPU Compatibility Fixes
- `int32` for positions/ranges, `int64` for large indices
- `max_label_len` clamping (4096) to prevent memory issues

4. Comprehensive Test Suite
- Added `tests/openvino/test_paraformer.py` with 10 test cases, including `generate()` API compatibility and `token_num`

Usage Example
Export to OpenVINO with INT8 Quantization
```
optimum-cli export openvino \
  --model damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch \
  --task automatic-speech-recognition \
  --weight-format int8 \
  paraformer-zh-int8
```

Testing
All 10 test cases pass:
```
export PARAFORMER_TEST_MODEL=/path/to/paraformer-zh/ov_models
python -m unittest tests.openvino.test_paraformer -v
```

Checklist
Note: This PR builds upon the existing Paraformer export support (commit f79a331) and adds the inference runtime capabilities to make the exported models usable within the optimum-intel framework.