Adding Support for ShunyaLabs - Pingala V1 ASR models (shunyalabs/pingala-v1-en-verbatim, shunyalabs/pingala-v1-universal) on OpenASR Leaderboard #87

Closed
ayush-shunyalabs wants to merge 2 commits into huggingface:main from ayush-shunyalabs:main

@ayush-shunyalabs

Hi @Vaibhavs10, we just released our models shunyalabs/pingala-v1-en-verbatim and shunyalabs/pingala-v1-universal. Could you add them to the OpenASR Leaderboard?

We obtained the results below on an L40 GPU.

The shunyalabs/pingala-v1-en-verbatim model is a CTranslate2 model, so we added it to the ctranslate2 section. The shunyalabs/pingala-v1-universal model is a Transformers model, and we added the files needed to evaluate it.

Results for shunyalabs/pingala-v1-en-verbatim:

********************************************************************************
Results per dataset:
********************************************************************************
shunyalabs/pingala-v1-en-verbatim | hf-audio-esb-datasets-test-only-sorted_ami_test: WER = 3.52 %, RTFx = 18.38
shunyalabs/pingala-v1-en-verbatim | hf-audio-esb-datasets-test-only-sorted_earnings22_test: WER = 4.36 %, RTFx = 25.67
shunyalabs/pingala-v1-en-verbatim | hf-audio-esb-datasets-test-only-sorted_gigaspeech_test: WER = 4.26 %, RTFx = 24.62
shunyalabs/pingala-v1-en-verbatim | hf-audio-esb-datasets-test-only-sorted_librispeech_test.clea: WER = 1.84 %, RTFx = 29.20
shunyalabs/pingala-v1-en-verbatim | hf-audio-esb-datasets-test-only-sorted_librispeech_test.other: WER = 2.81 %, RTFx = 25.01
shunyalabs/pingala-v1-en-verbatim | hf-audio-esb-datasets-test-only-sorted_spgispeech_test: WER = 1.13 %, RTFx = 11.67
shunyalabs/pingala-v1-en-verbatim | hf-audio-esb-datasets-test-only-sorted_tedlium_test: WER = 2.14 %, RTFx = 11.03
shunyalabs/pingala-v1-en-verbatim | hf-audio-esb-datasets-test-only-sorted_voxpopuli_test: WER = 3.47 %, RTFx = 31.81

********************************************************************************
Composite Results:
********************************************************************************
shunyalabs/pingala-v1-en-verbatim: WER = 2.94 %
shunyalabs/pingala-v1-en-verbatim: RTFx = 14.61
********************************************************************************
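The composite WER above is consistent with an unweighted mean of the eight per-dataset WERs. A minimal Python sketch that reproduces it from the reported numbers, assuming the leaderboard averages per-dataset WER without weighting by dataset size (the dictionary keys are shortened dataset names chosen for illustration, not the exact split names):

```python
# Per-dataset WER (%) for shunyalabs/pingala-v1-en-verbatim, copied from the results above.
# Keys are shortened, illustrative dataset names.
verbatim_wer = {
    "ami": 3.52,
    "earnings22": 4.36,
    "gigaspeech": 4.26,
    "librispeech_clean": 1.84,
    "librispeech_other": 2.81,
    "spgispeech": 1.13,
    "tedlium": 2.14,
    "voxpopuli": 3.47,
}


def composite_wer(per_dataset: dict[str, float]) -> float:
    """Unweighted mean of per-dataset WERs (assumption: no weighting by dataset size)."""
    return sum(per_dataset.values()) / len(per_dataset)


print(f"Composite WER = {composite_wer(verbatim_wer):.2f} %")  # Composite WER = 2.94 %
```

Because the per-dataset values printed above are already rounded to two decimals, recomputing a composite from them can differ from the reported figure in the last digit.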

Results for shunyalabs/pingala-v1-universal:

********************************************************************************
Results per dataset:
********************************************************************************
shunyalabs/pingala-v1-universal | hf-audio-esb-datasets-test-only-sorted_ami_test: WER = 4.19 %, RTFx = 70.22
shunyalabs/pingala-v1-universal | hf-audio-esb-datasets-test-only-sorted_earnings22_test: WER = 5.83 %, RTFx = 101.52
shunyalabs/pingala-v1-universal | hf-audio-esb-datasets-test-only-sorted_gigaspeech_test: WER = 4.99 %, RTFx = 131.09
shunyalabs/pingala-v1-universal | hf-audio-esb-datasets-test-only-sorted_librispeech_test.clea: WER = 0.71 %, RTFx = 158.74
shunyalabs/pingala-v1-universal | hf-audio-esb-datasets-test-only-sorted_librispeech_test.other: WER = 2.17 %, RTFx = 142.40
shunyalabs/pingala-v1-universal | hf-audio-esb-datasets-test-only-sorted_spgispeech_test: WER = 1.10 %, RTFx = 170.85
shunyalabs/pingala-v1-universal | hf-audio-esb-datasets-test-only-sorted_tedlium_test: WER = 1.43 %, RTFx = 153.34
shunyalabs/pingala-v1-universal | hf-audio-esb-datasets-test-only-sorted_voxpopuli_test: WER = 4.34 %, RTFx = 179.28

********************************************************************************
Composite Results:
********************************************************************************
shunyalabs/pingala-v1-universal: WER = 3.10 %
shunyalabs/pingala-v1-universal: RTFx = 146.23
********************************************************************************
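RTFx (inverse real-time factor) is audio duration divided by transcription time, so higher means faster. The composite RTFx is not the arithmetic mean of the per-dataset values (for pingala-v1-universal that mean is about 138.43, versus the reported 146.23), which is consistent with pooling: total audio duration over total transcription time across all datasets. A minimal sketch of that pooled aggregation, assuming this is how the composite is formed; `DatasetRun` and all durations and timings are hypothetical, not the actual test-set sizes:

```python
from dataclasses import dataclass


@dataclass
class DatasetRun:
    """Hypothetical per-dataset timing record."""
    name: str
    audio_seconds: float    # total duration of the test audio
    compute_seconds: float  # total wall-clock transcription time


def rtfx(audio_seconds: float, compute_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio transcribed per second of compute."""
    return audio_seconds / compute_seconds


def composite_rtfx(runs: list[DatasetRun]) -> float:
    """Pool audio and compute time across datasets before dividing (assumed aggregation)."""
    total_audio = sum(r.audio_seconds for r in runs)
    total_compute = sum(r.compute_seconds for r in runs)
    return rtfx(total_audio, total_compute)


# Hypothetical numbers: a small fast dataset and a large slower one.
runs = [
    DatasetRun("small_set", audio_seconds=1_000.0, compute_seconds=10.0),   # RTFx 100
    DatasetRun("large_set", audio_seconds=9_000.0, compute_seconds=180.0),  # RTFx 50
]

per_dataset = [rtfx(r.audio_seconds, r.compute_seconds) for r in runs]
print(per_dataset)                          # [100.0, 50.0]
print(sum(per_dataset) / len(per_dataset))  # 75.0 (simple mean)
print(round(composite_rtfx(runs), 2))       # 52.63 (pooled)
```

Pooling weights each dataset by its audio duration, so a large dataset with a low RTFx pulls the composite toward its own value.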

Do let me know if anything else is required; I'll be happy to contribute.

@souravbandyo

Hi @Vaibhavs10, could you please take a look at this PR?

@souravbandyo

Hi @Deep-unlearning, could you please take a look at this?

@MyButtermilk

MyButtermilk commented Aug 21, 2025

@souravbandyo @ayush-shunyalabs
Could you please look at the transcription quality for German? I tried it on your website and only got gibberish, which is surprising because the other word error rates you report here are state of the art.

You may want to fine-tune on German. You could use Granary, the large multilingual dataset Nvidia recently released.

It also contains plenty of German transcriptions:
https://huggingface.co/datasets/nvidia/Granary

@ayush-bar-uwc

ayush-bar-uwc commented Aug 26, 2025

Thank you so much, @MyButtermilk, for taking the time to share your feedback and for pointing us to the Granary dataset.

We tried out our model shunyalabs/pingala-v1-universal on a couple of German audio samples. Here are some quick examples alongside their original transcripts:

Sample 1:
achtgesichterambiwasse_0001.wav

  • Original:
um zu den göttlichen Schönheiten der Vergänglichkeit gezählt zu werden. Ihr Hals war biegsam wie eine Reiherfeder
  • Model output:
um zu den göttlichen schönheiten der vergänglichkeit gezählt zu werden ihr hals war biegsam wie eine reiherfeder

Sample 2:
achtgesichterambiwasse_0000.wav

  • Original:
Hanake hatte allen Körperschmuck, den ein japanisches Mädchen sitzend, trippelnd und liegend zeigen muß
  • Model output:
hanake hat er allen körper schmuck den ein japanisches mädchen sitzend trippelnd und liegend zeigen muß
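Most differences between these outputs and the original transcripts are casing and punctuation, which the model output omits. A minimal sketch that scores the two samples with word-level edit distance, assuming a simple lowercase-and-strip-punctuation normalizer (the `normalize` function is a hypothetical stand-in, not the leaderboard's own text normalizer):

```python
import re


def normalize(text: str) -> str:
    """Hypothetical minimal normalizer: lowercase and drop punctuation."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)  # \w is Unicode-aware, so ä, ö, ü, ß survive
    return " ".join(text.split())


def word_errors(reference: list[str], hypothesis: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate of a hypothesis against a reference, after normalization."""
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    return word_errors(ref, hyp) / len(ref)


samples = {
    "achtgesichterambiwasse_0001.wav": (
        "um zu den göttlichen Schönheiten der Vergänglichkeit gezählt zu werden. "
        "Ihr Hals war biegsam wie eine Reiherfeder",
        "um zu den göttlichen schönheiten der vergänglichkeit gezählt zu werden "
        "ihr hals war biegsam wie eine reiherfeder",
    ),
    "achtgesichterambiwasse_0000.wav": (
        "Hanake hatte allen Körperschmuck, den ein japanisches Mädchen sitzend, "
        "trippelnd und liegend zeigen muß",
        "hanake hat er allen körper schmuck den ein japanisches mädchen sitzend "
        "trippelnd und liegend zeigen muß",
    ),
}

for name, (reference, hypothesis) in samples.items():
    print(f"{name}: WER = {100 * wer(reference, hypothesis):.2f} %")
```

Under this normalization Sample 1 matches its transcript exactly, while Sample 2 has 4 word errors against 14 reference words (about 28.57 %): "hatte" becomes "hat er" and "Körperschmuck" is split into "körper schmuck", each costing one substitution plus one insertion.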

We’ll continue testing more extensively on German data and will also look into fine-tuning with Granary to further improve performance.

Thanks again for highlighting this; it really helps us improve.
If you’d like, please visit shunyalabs.ai, select German as the language, and try it out. We’d love to hear how it works for you.

@Vaibhavs10
Contributor

Closing this in favor of: #92

Vaibhavs10 closed this on Sep 9, 2025