This section provides a high-level view of how the application processes audio input and integrates with a modular backend architecture.
Audio Files: You can upload audio recordings through the Web-based UI layer, which supports:
- Audio upload
- Viewing transcription, summaries, and performance metrics
- Localisation options (English/Chinese)
The uploaded audio is passed to the Backend API, which acts as the gateway to the backend service layer and exposes capabilities similar to those of the UI.
Processing:
- Audio Pre-processing: Cleans and formats audio data for downstream tasks.
- ASR Component (Automatic Speech Recognition): Converts audio into text using integrated ASR providers:
  - FunASR
  - OpenVINO
  - OpenAI
- Summariser Component: Generates concise summaries of transcribed text using LLM providers:
  - iPexLLM
  - OpenVINO
- Metrics Collector: Monitors and collects:
  - xPU utilisation for hardware performance
  - LLM metrics for summarisation efficiency
- Pipeline Service: Manages multiple DL Streamer-based pipelines:
  - Front Video Pipeline for front camera streams
  - Back Video Pipeline for back camera streams
  - IFPD Content Pipeline for interactive flat panel display content

  A Media Server (MediaMTX) supports streaming and distribution of processed video feeds.
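The audio path described above (pre-processing, then ASR, then summarisation, with metrics collected along the way) can be pictured as a small pipeline with swappable providers. The following is only a minimal sketch under stated assumptions: `AudioPipeline`, `MetricsCollector`, `preprocess`, `fake_asr`, and `fake_summariser` are hypothetical names introduced for illustration, the provider callables are toy stand-ins rather than the real FunASR, OpenVINO, OpenAI, or iPexLLM interfaces, and wall-clock timing stands in for the xPU and LLM metrics the application actually collects.

```python
from dataclasses import dataclass, field
from time import perf_counter
from typing import Callable, Dict

# Hypothetical provider signatures (assumptions, not the application's API):
# an ASR provider maps audio bytes to text, a summariser maps text to text.
AsrProvider = Callable[[bytes], str]
SummaryProvider = Callable[[str], str]


@dataclass
class MetricsCollector:
    """Records wall-clock seconds per stage (a stand-in for xPU/LLM metrics)."""
    timings: Dict[str, float] = field(default_factory=dict)

    def timed(self, stage: str, fn: Callable, *args):
        start = perf_counter()
        result = fn(*args)
        self.timings[stage] = perf_counter() - start
        return result


def preprocess(audio: bytes) -> bytes:
    """Placeholder clean-up: strips leading and trailing zero bytes only."""
    return audio.strip(b"\x00")


@dataclass
class AudioPipeline:
    asr_providers: Dict[str, AsrProvider]
    summary_providers: Dict[str, SummaryProvider]
    metrics: MetricsCollector = field(default_factory=MetricsCollector)

    def run(self, audio: bytes, asr: str, summariser: str) -> Dict[str, str]:
        # Fail early if a provider name was not registered.
        if asr not in self.asr_providers:
            raise KeyError(f"unknown ASR provider: {asr}")
        if summariser not in self.summary_providers:
            raise KeyError(f"unknown summariser provider: {summariser}")
        cleaned = self.metrics.timed("preprocess", preprocess, audio)
        transcript = self.metrics.timed("asr", self.asr_providers[asr], cleaned)
        summary = self.metrics.timed(
            "summarise", self.summary_providers[summariser], transcript
        )
        return {"transcript": transcript, "summary": summary}


# Toy providers so the sketch runs without any model installed.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_summariser(text: str) -> str:
    # Keeps only the first sentence.
    return text.split(".")[0].strip() + "."


pipeline = AudioPipeline(
    asr_providers={"FunASR": fake_asr},
    summary_providers={"OpenVINO": fake_summariser},
)
result = pipeline.run(
    b"\x00Hello class. Today we cover graphs.\x00", "FunASR", "OpenVINO"
)
print(result["summary"])  # prints "Hello class."
```

Keying providers by name keeps the pipeline code unchanged when a different ASR or LLM backend is selected, which is the property the modular design above aims for.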
Outputs:
- Transcriptions and summaries can be accessed from the Web-based UI and the file system. The file system path is /<project-location>/<your-project-name>/, for example /storage/chapter-10/.
- Performance metrics (e.g., utilisation, model efficiency) are displayed for monitoring.
- Localisation ensures outputs are available in multiple languages (English/Chinese).
- System Requirements: Check the hardware and software requirements for deploying the application.
- Get Started: Follow step-by-step instructions to set up the application.
- Application Flow: Review the flow of the application.
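As an illustration of the output location mentioned earlier, /<project-location>/<your-project-name>/, the sketch below builds that path and lists whatever files are present in it. The helper names `output_dir` and `list_outputs` are hypothetical, and the sketch makes no assumption about the names of the transcription or summary files, since the text does not specify them.

```python
from pathlib import Path, PurePosixPath
from typing import List


def output_dir(project_location: str, project_name: str) -> PurePosixPath:
    """Build /<project-location>/<your-project-name> as a POSIX path."""
    # Surrounding slashes are stripped so "/storage/" and "storage" behave alike.
    return (
        PurePosixPath("/")
        / project_location.strip("/")
        / project_name.strip("/")
    )


def list_outputs(directory) -> List[str]:
    """Return sorted file names in the directory, or [] if it does not exist."""
    path = Path(directory)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.iterdir() if p.is_file())


# The example from the text: project location "storage", project "chapter-10".
print(output_dir("storage", "chapter-10"))  # prints /storage/chapter-10
```

Calling `list_outputs(str(output_dir("storage", "chapter-10")))` on a machine where that directory exists would return the file names stored there; on any other machine it returns an empty list.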