Arabic TTS-ASR Benchmark Framework

Chinese | English

A modular framework for evaluating Arabic Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) systems.

Overview

This framework provides:

  • TTS Generation: Generate audio from text using various TTS models
  • ASR Transcription: Transcribe audio using ASR models
  • Evaluation: Calculate WER/CER metrics with Arabic text normalization
  • Audio Quality Metrics: STOI, PESQ, Duration Error, and MCD

Quick Start

Installation

# Core dependencies
pip install torch transformers soundfile pandas jiwer tqdm python-dotenv

# Audio quality metrics (optional but recommended)
pip install pystoi pesq librosa scipy

Usage

# Full TTS-ASR pipeline with audio quality metrics
python main.py --dataset clArTTS --tts-model mms-tts-ara --asr-model whisper-large-v3

# ASR-only evaluation
python main.py --dataset everyayah --asr-model whisper-large-v3

# Skip TTS (use existing audio)
python main.py --dataset clArTTS --tts-model mms-tts-ara --asr-model whisper-large-v3 --skip-tts

# Skip audio quality metrics (faster evaluation)
python main.py --dataset clArTTS --tts-model mms-tts-ara --asr-model whisper-large-v3 --skip-audio-metrics

Supported Models

TTS Models

  • mms-tts-ara: Meta MMS-TTS Arabic
  • openaudio-s1-mini: OpenAudio S1-mini (Fish Speech)
  • elevenlabs-multilingual-v2: ElevenLabs API
  • minimax-speech-02-hd: MiniMax API

ASR Models

  • whisper-large-v3: OpenAI Whisper Large V3
  • qwen3-omni: Qwen3-Omni 30B
  • conformer-ctc: NeMo Conformer-CTC

Supported Datasets

  • clArTTS: Classical Arabic TTS dataset (205 samples)
  • everyayah: Quran recitation dataset (~6,000 samples)
  • arvoice: Arabic voice dataset
  • Ruisheng_TTS: Ruisheng TTS dataset (68 samples)

Output

Results are saved in results/{dataset}/{tts_model}_to_{asr_model}/:

  • generated_audio/: Generated WAV files
  • transcriptions.jsonl: ASR transcriptions
  • evaluation_results.csv: Per-sample metrics (WER, CER, STOI, PESQ, DE, MCD)
  • evaluation_summary.csv: Overall metrics with averages
  • timing.json: Performance metrics
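
The per-sample CSV is plain pandas-readable; a minimal sketch for pulling averages yourself (the path below is illustrative, built from the layout above):

import pandas as pd

# Per-sample metrics from one benchmark run (illustrative path).
df = pd.read_csv("results/clArTTS/mms-tts-ara_to_whisper-large-v3/evaluation_results.csv")

# Average each numeric metric across samples; audio-quality columns
# may be absent if the run used --skip-audio-metrics.
print(df.mean(numeric_only=True))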

Evaluation Metrics

Text Metrics:

  • WER (Word Error Rate): Word-level transcription accuracy
  • CER (Character Error Rate): Character-level transcription accuracy
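
Both are computed after Arabic text normalization. A minimal sketch of the idea using jiwer, with a typical normalization pass (diacritic and tatweel stripping, alef and alef-maqsura unification); the exact rules in this repo's normalizer are assumed to be similar, not identical:

import re
import jiwer

def normalize_arabic(text: str) -> str:
    # Remove diacritics (tashkeel) and tatweel, then unify common
    # letter variants -- typical steps before WER/CER scoring.
    text = re.sub(r"[\u064B-\u0652\u0640]", "", text)
    text = re.sub(r"[\u0622\u0623\u0625]", "\u0627", text)  # alef variants -> bare alef
    text = text.replace("\u0649", "\u064A")                 # alef maqsura -> ya
    return " ".join(text.split())

reference = normalize_arabic("النص العربي هنا")
hypothesis = normalize_arabic("النص العربى هنا")

print("WER:", jiwer.wer(reference, hypothesis))  # 0.0 after normalization
print("CER:", jiwer.cer(reference, hypothesis))  # 0.0 after normalization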

Audio Quality Metrics:

  • STOI (Short-Time Objective Intelligibility): Speech intelligibility (0-1, higher is better)
  • PESQ (Perceptual Evaluation of Speech Quality): Speech quality (-0.5 to 4.5, higher is better)
  • DE (Duration Error): Relative duration difference (0 to inf, lower is better)
  • MCD (Mel-Cepstral Distortion): Spectral distance (lower is better, <6.0 is good)
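
A sketch of computing STOI, PESQ, and DE directly with the optional dependencies; resampling to 16 kHz is assumed because wideband PESQ only accepts 8 or 16 kHz, and the simple truncation-based alignment here may differ from what the framework does internally:

import soundfile as sf
import librosa
from pystoi import stoi
from pesq import pesq

SR = 16000  # wideband PESQ requires 8000 or 16000 Hz

def load_16k(path):
    audio, sr = sf.read(path)
    return librosa.resample(audio, orig_sr=sr, target_sr=SR) if sr != SR else audio

ref = load_16k("datasets/clArTTS/wav/00000.wav")  # ground-truth recording
gen = load_16k("results/clArTTS/mms-tts-ara_to_whisper-large-v3/generated_audio/00000.wav")

# STOI and PESQ compare equal-length signals, so truncate to the shorter one.
n = min(len(ref), len(gen))
print("STOI:", stoi(ref[:n], gen[:n], SR, extended=False))
print("PESQ:", pesq(SR, ref[:n], gen[:n], "wb"))

# Duration Error: relative length difference between generated and reference audio.
print("DE:", abs(len(gen) - len(ref)) / len(ref))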

Adding New Datasets

  1. Prepare dataset structure:
datasets/my_dataset/
├── metadata.csv
└── wav/
    ├── 00000.wav
    └── ...
  2. Create metadata.csv:
id,file,text
0,00000.wav,النص العربي هنا
1,00001.wav,نص آخر
  3. Register in src/benchmark/config/dataset_config.py:
"my_dataset": DatasetConfig(
    name="my_dataset",
    metadata_file="datasets/my_dataset/metadata.csv",
    audio_dir="datasets/my_dataset/wav",
    id_column="id",
    text_column="text",
    audio_column="file"
),
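
Once registered, a quick sanity check that metadata rows and WAV files line up can save a failed run (a sketch; column names follow the config above):

import os
import pandas as pd

meta = pd.read_csv("datasets/my_dataset/metadata.csv")
missing = [f for f in meta["file"]
           if not os.path.exists(os.path.join("datasets/my_dataset/wav", f))]
print(f"{len(meta)} samples, {len(missing)} missing audio files")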

Adding New TTS Models

  1. Create TTS module in src/benchmark/modules/tts/my_tts.py:
from .base_tts import BaseTTS

class MyTTS(BaseTTS):
    def load(self):
        # Load your model
        pass
    
    def synthesize(self, text: str, output_path: str) -> tuple[float, float]:
        # Generate audio and save to output_path
        # Return (generation_time, audio_duration)
        pass
  2. Register in src/benchmark/modules/tts/__init__.py:
from .my_tts import MyTTS
  3. Add config in src/benchmark/config/model_config.py:
"my-tts": TTSModelConfig(
    model_name="my-tts",
    model_type="my_tts",
    model_path="models/my-tts",
    device="cuda",
    sampling_rate=16000
),
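
For orientation, a hedged sketch of what a concrete subclass could look like for a Hugging Face VITS checkpoint such as MMS-TTS; the self.config attribute on BaseTTS is an assumption, and this is not the repo's actual mms-tts-ara module:

import time

import soundfile as sf
import torch
from transformers import AutoTokenizer, VitsModel

from .base_tts import BaseTTS

class MyTTS(BaseTTS):
    def load(self):
        # Illustrative checkpoint; any VITS-style model loads the same way.
        self.tokenizer = AutoTokenizer.from_pretrained("facebook/mms-tts-ara")
        self.model = VitsModel.from_pretrained("facebook/mms-tts-ara").to(self.config.device)

    def synthesize(self, text: str, output_path: str) -> tuple[float, float]:
        start = time.time()
        inputs = self.tokenizer(text, return_tensors="pt").to(self.config.device)
        with torch.no_grad():
            waveform = self.model(**inputs).waveform[0].cpu().numpy()
        sr = self.model.config.sampling_rate
        sf.write(output_path, waveform, sr)
        # Return (generation_time, audio_duration) as the base class expects.
        return time.time() - start, len(waveform) / sr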

Adding New ASR Models

  1. Create ASR module in src/benchmark/modules/asr/my_asr.py:
from .base_asr import BaseASR

class MyASR(BaseASR):
    def load(self):
        # Load your model
        pass
    
    def transcribe(self, audio_path: str) -> tuple[str, float]:
        # Transcribe audio
        # Return (transcription, transcription_time)
        pass
  2. Register in src/benchmark/modules/asr/__init__.py:
from .my_asr import MyASR
  3. Add config in src/benchmark/config/model_config.py:
"my-asr": ASRModelConfig(
    model_name="my-asr",
    model_type="my_asr",
    model_path="models/my-asr",
    device="cuda",
    language="ar"
),
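
And a matching sketch for an ASR subclass built on the standard transformers pipeline (again, the self.config attribute on BaseASR is an assumption, not verified project code):

import time

from transformers import pipeline

from .base_asr import BaseASR

class MyASR(BaseASR):
    def load(self):
        # Illustrative checkpoint; any Whisper-style model works here.
        self.pipe = pipeline(
            "automatic-speech-recognition",
            model="openai/whisper-large-v3",
            device=self.config.device,
        )

    def transcribe(self, audio_path: str) -> tuple[str, float]:
        start = time.time()
        result = self.pipe(audio_path, generate_kwargs={"language": "ar", "task": "transcribe"})
        # Return (transcription, transcription_time) as the base class expects.
        return result["text"], time.time() - start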
