How It Works

This section provides a high-level view of how the application processes audio input and integrates with a modular backend architecture.

High-Level System Diagram

Inputs

Audio Files: You can upload audio recordings through the Web-based UI layer, which supports:

  • Audio upload
  • Viewing transcription, summaries, and performance metrics
  • Localisation options (English/Chinese)

The uploaded audio is passed to the Backend API, which acts as the gateway to the backend service layer and exposes similar capabilities (audio upload and access to transcriptions, summaries, and performance metrics).

Processing

  • Audio Pre-processing: Cleans and formats audio data for downstream tasks.

  • ASR Component (Automatic Speech Recognition): Converts audio into text using one of the integrated ASR providers:

    • FunASR
    • OpenVINO
    • OpenAI
  • Summariser Component: Generates concise summaries of the transcribed text using one of the LLM providers:

    • iPexLLM
    • OpenVINO
  • Metrics Collector: Monitors and collects:

    • xPU utilisation for hardware performance
    • LLM metrics for summarisation efficiency
  • Pipeline Service

The Pipeline Service manages multiple DL Streamer-based pipelines:

  • Front Video Pipeline for front camera streams
  • Back Video Pipeline for back camera streams
  • IFPD Content Pipeline for interactive flat panel display content

A Media Server (MediaMTX) supports streaming and distribution of processed video feeds.
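The audio path described in this section (pre-processing, then ASR, then summarisation, with metrics collected along the way) can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the registry names, the `run_audio_pipeline` function, the `Result` type, and the stand-in provider functions are all hypothetical, and the metrics are limited to simple wall-clock timings rather than real xPU or LLM metrics.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical stand-ins: a real deployment would call the actual
# FunASR / OpenVINO / OpenAI and iPexLLM / OpenVINO integrations here.
def _fake_asr(audio: bytes) -> str:
    # Pretend every byte of audio decodes to one word.
    return " ".join(f"word{b}" for b in audio)

def _fake_summariser(text: str) -> str:
    # Pretend a summary is just the first three words.
    return " ".join(text.split()[:3])

# Provider names mirror the lists above; the keys themselves are assumptions.
ASR_PROVIDERS: Dict[str, Callable[[bytes], str]] = {
    "funasr": _fake_asr,
    "openvino": _fake_asr,
    "openai": _fake_asr,
}
SUMMARISER_PROVIDERS: Dict[str, Callable[[str], str]] = {
    "ipexllm": _fake_summariser,
    "openvino": _fake_summariser,
}

@dataclass
class Result:
    transcript: str
    summary: str
    metrics: Dict[str, float]

def preprocess(audio: bytes) -> bytes:
    # Placeholder for cleaning and formatting: drop zero-valued bytes.
    return bytes(b for b in audio if b != 0)

def run_audio_pipeline(audio: bytes, asr: str, summariser: str) -> Result:
    """Run pre-processing, ASR, and summarisation with the chosen providers."""
    if asr not in ASR_PROVIDERS:
        raise ValueError(f"unknown ASR provider: {asr}")
    if summariser not in SUMMARISER_PROVIDERS:
        raise ValueError(f"unknown summariser provider: {summariser}")

    metrics: Dict[str, float] = {}
    cleaned = preprocess(audio)

    start = time.perf_counter()
    transcript = ASR_PROVIDERS[asr](cleaned)
    metrics["asr_seconds"] = time.perf_counter() - start

    start = time.perf_counter()
    summary = SUMMARISER_PROVIDERS[summariser](transcript)
    metrics["summariser_seconds"] = time.perf_counter() - start

    return Result(transcript=transcript, summary=summary, metrics=metrics)
```

Calling `run_audio_pipeline(audio_bytes, asr="funasr", summariser="openvino")` returns the transcript, the summary, and the per-stage timings together. Keeping providers in plain dictionaries is one simple way to make the ASR and summariser stages swappable without touching the pipeline function.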

Outputs

  • Transcriptions and summaries can be accessed from the Web-based UI and from the file system. On the file system, they are stored under /<project-location>/<your-project-name>/, for example /storage/chapter-10/.
  • Performance metrics (e.g., xPU utilisation, LLM efficiency) are displayed in the Web-based UI for monitoring.
  • Localisation makes outputs available in English and Chinese.
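As a small illustration of the file-system layout mentioned above, the output directory can be derived from the two placeholders in /<project-location>/<your-project-name>/. The helper name `project_output_dir` is hypothetical, and the sketch assumes POSIX-style paths; it does not assume anything about the file names written inside the directory.

```python
from pathlib import PurePosixPath

def project_output_dir(project_location: str, project_name: str) -> PurePosixPath:
    """Build /<project-location>/<your-project-name> as a POSIX path."""
    location = project_location.strip("/")
    name = project_name.strip("/")
    if not location or not name:
        raise ValueError("project location and project name must be non-empty")
    return PurePosixPath("/") / location / name

# Example from the text: location "storage", project "chapter-10"
print(project_output_dir("storage", "chapter-10"))  # /storage/chapter-10
```

Stripping leading and trailing slashes first means `"storage"` and `"/storage/"` resolve to the same directory.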

Learn More