
Add support for Typhoon ASR Real-Time model with transcribe() function #19

Draft
Copilot wants to merge 6 commits into main from copilot/add-typhoon-asr-realtime-support

Contributor

Copilot AI commented Jan 31, 2026

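Adds support for the scb10x/typhoon-asr-realtime model to pythaiasr, along with a new transcribe() function that returns structured output: text, estimated word-level timestamps, processing time, and audio duration.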

Changes

  • Model integration: Added Typhoon ASR to supported models list. Uses NVIDIA NeMo toolkit instead of HuggingFace transformers (existing models unaffected).

  • Audio processing: Typhoon models require different handling:

    • Accepts file paths or numpy arrays
    • Resamples to 16 kHz and normalizes audio by its peak amplitude, np.max(np.abs(data))
    • Uses temporary WAV files for NeMo's transcribe() API
  • Real-time inference function: Added new transcribe() function specifically for Typhoon ASR models that provides:

    • Word-level timestamps (estimated, uniformly distributed)
    • Processing time metrics
    • Audio duration information
    • Structured output returned as a dictionary rather than a string
    • Device selection (CPU, CUDA, or auto)
  • Dependencies: Added optional [typhoon] extra with nemo-toolkit[asr]>=1.21.0 and librosa

  • Error handling: Raises informative ImportError when nemo-toolkit is missing, directing users to install via pip install pythaiasr[typhoon]

  • Tests: Added unit tests for model listing, import error handling, and the new transcribe() function. Integration tests are opt-in and run only when the TEST_TYPHOON_ASR=true environment variable is set.
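
The opt-in integration tests in the last bullet could be gated roughly as follows. This is a minimal sketch using the standard unittest module. Only the TEST_TYPHOON_ASR variable name and the transcribe() call come from this description; the helper typhoon_tests_enabled and the test class are hypothetical and may not match the PR's actual tests.

```python
import os
import unittest


def typhoon_tests_enabled(env=None):
    """Return True when TEST_TYPHOON_ASR is set to "true" (case-insensitive)."""
    env = os.environ if env is None else env
    return env.get("TEST_TYPHOON_ASR", "").strip().lower() == "true"


@unittest.skipUnless(typhoon_tests_enabled(), "set TEST_TYPHOON_ASR=true to run")
class TyphoonIntegrationTest(unittest.TestCase):  # hypothetical test class
    def test_transcribe_returns_text(self):
        # Imported lazily so the module loads even without nemo-toolkit installed.
        from pythaiasr import transcribe

        result = transcribe("audio.wav", model="scb10x/typhoon-asr-realtime")
        self.assertIn("text", result)
```

Reading the flag through a small function keeps the gating logic testable without touching the real environment.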

Usage

Basic ASR (string output)

from pythaiasr import asr

# File path or numpy array
result = asr("audio.wav", model="scb10x/typhoon-asr-realtime", device="cpu")
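
For context on what happens to the input before it reaches the model, the preprocessing listed under Changes (resample to 16 kHz, peak-normalize, write a temporary WAV file) might look something like the sketch below. It is an illustration under stated assumptions, not the PR's code: the function names prepare_audio and write_temp_wav are hypothetical, the input is assumed to be mono, and linear interpolation via np.interp stands in for whatever resampler the PR uses (the dependency list suggests librosa).

```python
import os
import tempfile
import wave

import numpy as np

TARGET_SR = 16_000  # sample rate named in the PR description


def prepare_audio(data, sr):
    """Resample mono float audio to 16 kHz and peak-normalize it."""
    data = np.asarray(data, dtype=np.float32)
    if sr != TARGET_SR:
        # Assumption: linear interpolation stands in for a proper resampler.
        n_out = int(round(len(data) * TARGET_SR / sr))
        old_t = np.arange(len(data)) / sr
        new_t = np.arange(n_out) / TARGET_SR
        data = np.interp(new_t, old_t, data).astype(np.float32)
    peak = np.max(np.abs(data)) if len(data) else 0.0
    if peak > 0:  # skip normalization for silent or empty input
        data = data / peak
    return data


def write_temp_wav(data):
    """Write audio in [-1, 1] to a temporary 16-bit PCM WAV file; return its path."""
    pcm = (np.clip(data, -1.0, 1.0) * 32767).astype("<i2")
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(TARGET_SR)
        wf.writeframes(pcm.tobytes())
    return path
```

The caller is responsible for deleting the temporary file once transcription has finished.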

Real-time inference with detailed output

from pythaiasr import transcribe

# Basic transcription
result = transcribe("audio.wav", model="scb10x/typhoon-asr-realtime")
print(result['text'])
print(f"Processing time: {result['processing_time']:.2f}s")

# With word-level timestamps
result = transcribe("audio.wav", with_timestamps=True)
for ts in result['timestamps']:
    print(f"{ts['word']}: {ts['start']:.2f}s - {ts['end']:.2f}s")
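
To make the shape of the returned dictionary concrete, here is a rough sketch of how a result with a processing-time metric could be assembled. The keys "text" and "processing_time" appear in the usage above; the "audio_duration" key name, the timed_transcribe helper, and the recognize callable are assumptions made for illustration and are not confirmed by the PR.

```python
import time


def timed_transcribe(recognize, audio_path, audio_duration):
    """Run a recognizer callable and wrap its text in a result dictionary.

    `recognize` is a hypothetical stand-in for the model call; the
    "audio_duration" key name is an assumption.
    """
    start = time.perf_counter()
    text = recognize(audio_path)
    elapsed = time.perf_counter() - start
    return {
        "text": text,
        "processing_time": elapsed,  # seconds spent in the recognizer call
        "audio_duration": audio_duration,  # seconds of input audio
    }


# A stub recognizer keeps the sketch runnable without any model installed.
result = timed_transcribe(lambda path: "สวัสดี", "audio.wav", audio_duration=1.5)
print(result["text"], f"{result['processing_time']:.4f}s")
```

time.perf_counter() is used because it is a monotonic, high-resolution clock suited to measuring short intervals.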

Installation requires the typhoon extra:

pip install pythaiasr[typhoon]
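
When the extra is not installed, the description says an informative ImportError is raised. A guard of that kind is commonly written as below. This is a sketch, not the PR's implementation: the helper name require_optional is hypothetical, and the module path nemo.collections.asr is the usual NeMo ASR import, assumed here to be what the package loads.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency, or raise ImportError with install guidance."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this model. "
            f"Install it with: pip install pythaiasr[{extra}]"
        ) from exc


# Intended call site (left commented out so the sketch runs without NeMo installed):
# nemo_asr = require_optional("nemo.collections.asr", "typhoon")
```

Deferring the import to the point of use keeps the existing HuggingFace-based models usable without NeMo.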

Note: Timestamps are estimated by uniformly distributing words across the audio duration and may not reflect actual word timing in natural speech with pauses or variable rates.
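
The uniform estimate described in the note can be sketched as follows. The function name estimate_word_timestamps is hypothetical, and the input is assumed to be an already-segmented list of words; how the PR splits Thai text into words is not stated here.

```python
def estimate_word_timestamps(words, audio_duration):
    """Assign each word an equal slice of the audio duration."""
    if not words or audio_duration <= 0:
        return []
    step = audio_duration / len(words)
    return [
        {"word": word, "start": i * step, "end": (i + 1) * step}
        for i, word in enumerate(words)
    ]


for ts in estimate_word_timestamps(["สวัสดี", "ครับ", "ทุกคน"], audio_duration=3.0):
    print(f"{ts['word']}: {ts['start']:.2f}s - {ts['end']:.2f}s")
```

Because every word receives the same slice, a long pause or a slowly spoken word shifts all later estimates, which is the limitation the note warns about.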

Original prompt

This section details the original issue you should resolve

<issue_title>Add support typhoon-asr-realtime</issue_title>
<issue_description>Typhoon ASR Real-Time is a next-generation, open-source Automatic Speech Recognition (ASR) model built specifically for real-world streaming applications in the Thai language. It is designed to deliver fast and accurate transcriptions while running efficiently on standard CPUs. This enables users to host their own ASR service, reducing costs and avoiding the need to send sensitive data to third-party cloud services. The model is trained on 10,000 hours of Thai audio transcriptions to help it generalize to any environment.

HuggingFace model: https://huggingface.co/typhoon-ai/typhoon-asr-realtime

GitHub: https://github.com/scb-10x/typhoon-asr</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 3 commits January 31, 2026 05:00
Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add support for Typhoon ASR Real-Time model" to "Add support for Typhoon ASR Real-Time model" on Jan 31, 2026
Copilot AI requested a review from wannaphong January 31, 2026 05:10
@wannaphong
Member

@copilot Add ASR Real-Time Inference

Copilot AI and others added 2 commits January 31, 2026 05:23
Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
…d timestamp warnings

Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
Contributor Author

Copilot AI commented Jan 31, 2026

@copilot Add ASR Real-Time Inference

Added transcribe() function for real-time inference with Typhoon ASR. This provides word-level timestamps, processing time metrics, and audio duration information.

Example usage:

from pythaiasr import transcribe

result = transcribe("audio.wav", with_timestamps=True)
print(result['text'])
for ts in result['timestamps']:
    print(f"{ts['word']}: {ts['start']:.2f}s - {ts['end']:.2f}s")

Commit: 0ea8e24

Copilot AI changed the title from "Add support for Typhoon ASR Real-Time model" to "Add support for Typhoon ASR Real-Time model with transcribe() function" on Jan 31, 2026

Development

Successfully merging this pull request may close these issues.

Add support typhoon-asr-realtime

2 participants