Commit ec665e4

Authored by: yogeshmpandey, bhardwaj-nakul, krish918, anshul-wagadre, Yogesh
Video Search and Summary Sample Application and microservices (#349)
Signed-off-by: Krishna <krishna.murti@intel.com>
Signed-off-by: Krishna Murti <krishna.murti@intel.com>
Signed-off-by: Vellaisamy, Sathyendran <sathyendran.vellaisamy@intel.com>
Signed-off-by: Pooja Kumbharkar <pooja.kumbharkar@intel.com>
Signed-off-by: B, Vinod K <vinod.k.b@intel.com>
Signed-off-by: Sudarshana Panda <sudarshana.panda@intel.com>
Signed-off-by: Vinod K B <vinod.k.b@intel.com>
Signed-off-by: Yeoh, Hoong Tee <hoong.tee.yeoh@intel.com>
Signed-off-by: dmichalo <dawid.michalowski@intel.com>
Co-authored-by: bhardwaj-nakul <nakul.bhardwaj@intel.com>
Co-authored-by: Krishna Murti <krishna.murti@intel.com>
Co-authored-by: AnshulWagadre <107472813+anshul-wagadre@users.noreply.github.com>
Co-authored-by: Wagadre, Anshul <anshul.wagadre@intel.com>
Co-authored-by: Yogesh <yogeshpandey@intel.com>
Co-authored-by: Raghavendra Bhat <raghavendra.bhat@intel.com>
Co-authored-by: Tomasz Janczak <Tomasz.Janczak@intel.com>
Co-authored-by: msmiatac <153737147+msmiatac@users.noreply.github.com>
Co-authored-by: nszczygl9 <118973656+nszczygl9@users.noreply.github.com>
Co-authored-by: Vinod Kumar B <vinod.k.b@intel.com>
Co-authored-by: Vellaisamy, Sathyendran <sathyendran.vellaisamy@intel.com>
Co-authored-by: Pooja Kumbharkar <pooja.kumbharkar@intel.com>
Co-authored-by: sathyendranv <84972945+sathyendranv@users.noreply.github.com>
Co-authored-by: SudarshanaPanda <sudarshana.panda@intel.com>
Co-authored-by: Basak Caprak Senzeybek <basak.caprak@intel.com>
Co-authored-by: Nicolas Oliver <dario.n.oliver@intel.com>
Co-authored-by: Hoong Tee, Yeoh <hoong.tee.yeoh@intel.com>
Co-authored-by: ganesanintel <ganesan.v@intel.com>
Co-authored-by: Tomasz Bujewski <tomasz.bujewski@intel.com>
Co-authored-by: Dawid Michalowski <dawid.michalowski@intel.com>
Co-authored-by: Michal Holownia <michal.holownia@intel.com>
Co-authored-by: oommensy <steffy.a.oommen@intel.com>
Co-authored-by: saikiransayabugari <saikiran.sayabugari@intel.com>
Co-authored-by: Elroy Ashtian, Jr. <elroy.ashtian@intel.com>
Co-authored-by: marcin-wadolkowski <106673332+marcin-wadolkowski@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent bf036d3 commit ec665e4

533 files changed

Lines changed: 77352 additions & 1 deletion

File tree


README.md

Lines changed: 4 additions & 1 deletion
@@ -21,19 +21,22 @@ Key components of the **Edge AI Libraries**:
 | [OpenVINO&trade; toolkit](https://github.com/openvinotoolkit/openvino) | Library | [Link](https://docs.openvino.ai/2025/index.html) | [API Reference](https://docs.openvino.ai/2025/api/api_reference.html) |
 | [OpenVINO&trade; Training Extensions](https://github.com/open-edge-platform/training_extensions) | Library | [Link](https://github.com/open-edge-platform/training_extensions?tab=readme-ov-file#introduction) | [API Reference](https://github.com/open-edge-platform/training_extensions?tab=readme-ov-file#quick-start) |
 | [OpenVINO&trade; Model API](https://github.com/open-edge-platform/model_api) | Library | [Link](https://github.com/open-edge-platform/model_api?tab=readme-ov-file#installation) | [API Reference](https://github.com/open-edge-platform/model_api?tab=readme-ov-file#usage) |
+| [Audio Intelligence](microservices/audio-intelligence) | Microservice | [Link](microservices/audio-intelligence/docs/user-guide/get-started.md) | [API Reference](microservices/audio-intelligence/docs/user-guide/api-reference.md) |
 | [Deep Learning Streamer Pipeline Server](microservices/dlstreamer-pipeline-server) | Microservice | [Link](microservices/dlstreamer-pipeline-server#quick-try-out) | [API Reference](microservices/dlstreamer-pipeline-server/docs/user-guide/api-docs/pipeline-server.yaml) |
 | [Document Ingestion](microservices/document-ingestion) | Microservice | [Link](microservices/document-ingestion/pgvector/docs/get-started.md) | [API Reference](microservices/document-ingestion/pgvector/docs/dataprep-api.yml) |
 | [Model Registry](microservices/model-registry) | Microservice | [Link](microservices/model-registry/docs/user-guide/get-started.md) | [API Reference](microservices/model-registry/docs/user-guide/api-docs/openapi.yaml) |
+| [Multimodal Embedding Serving](microservices/multimodal-embedding-serving) | Microservice | [Link](microservices/multimodal-embedding-serving/docs/user-guide/get-started.md) | [API Reference](microservices/multimodal-embedding-serving/docs/user-guide/api-docs/openapi.yaml) |
 | [Time Series Analytics Microservice](microservices/time-series-analytics) | Microservice | [Link](microservices/time-series-analytics/docs/user-guide/Overview.md) | [Usage](microservices/time-series-analytics/docs/user-guide/get-started.md) |
 | [Vector Retriever (with Milvus)](microservices/vector-retriever/milvus/) | Microservice | [Link](microservices/vector-retriever/milvus/docs/user-guide/get-started.md) | [API Reference](microservices/vector-retriever/milvus/docs/user-guide/api-reference.md) |
 | [Visual-Data Preparation for Retrieval (with Milvus)](microservices/visual-data-preparation-for-retrieval/milvus/) | Microservice | [Link](microservices/visual-data-preparation-for-retrieval/milvus/docs/user-guide/get-started.md) | [API Reference](microservices/visual-data-preparation-for-retrieval/milvus/docs/user-guide/api-reference.md) |
+| [Visual-Data Preparation for Retrieval (with VDMS)](microservices/visual-data-preparation-for-retrieval/vdms/) | Microservice | [Link](microservices/visual-data-preparation-for-retrieval/vdms/docs/user-guide/get-started.md) | [API Reference](microservices/visual-data-preparation-for-retrieval/vdms/docs/user-guide/api-reference.md) |
 | [VLM Inference Serving](microservices/vlm-openvino-serving) | Microservice | [Link](microservices/vlm-openvino-serving/README.md) | [Usage](microservices/vlm-openvino-serving/README.md) |
 | [Intel® Geti™](https://github.com/open-edge-platform/geti)[`*`](#license) | Tool | [Link](https://geti.intel.com/) | [Docs](https://docs.geti.intel.com) |
 | [Intel® SceneScape](https://github.com/open-edge-platform/scenescape)[`*`](#license) | Tool | [Link](https://docs.openedgeplatform.intel.com/scenescape/main/user-guide/Getting-Started-Guide.html) | [Docs](https://docs.openedgeplatform.intel.com/scenescape/main/toc.html) |
 | [Visual Pipeline and Platform Evaluation Tool](tools/visual-pipeline-and-platform-evaluation-tool) | Tool | [Link](tools/visual-pipeline-and-platform-evaluation-tool/docs/user-guide/get-started.md) | [Build](tools/visual-pipeline-and-platform-evaluation-tool/docs/user-guide/how-to-build-source.md) instructions |
 | [Chat Question and Answer](sample-applications/chat-question-and-answer) | Sample Application | [Link](sample-applications/chat-question-and-answer/docs/user-guide/get-started.md) | [Build](sample-applications/chat-question-and-answer/docs/user-guide/build-from-source.md) instructions |
 | [Chat Question and Answer Core](sample-applications/chat-question-and-answer-core) | Sample Application | [Link](sample-applications/chat-question-and-answer-core/docs/user-guide/get-started.md) | [Build](sample-applications/chat-question-and-answer-core/docs/user-guide/build-from-source.md) instructions |
-
+| [Video Search and Summarization](sample-applications/video-search-and-summarization) | Sample Application | [Link](sample-applications/video-search-and-summarization/docs/user-guide/get-started.md) | [Build](sample-applications/video-search-and-summarization/docs/user-guide/build-from-source.md) instructions |
 
 > Intel, the Intel logo, OpenVINO, and the OpenVINO logo are trademarks of Intel Corporation or its subsidiaries.
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+# Python bytecode and cache
+__pycache__/
+*.py[cod]
+*$py.class
+.pytest_cache/
+.coverage
+htmlcov/
+
+# Poetry and Python virtual environments
+.venv/
+venv/
+ENV/
+.python-version
+
+# Development and editor files
+.git/
+.github/
+.gitignore
+.idea/
+.vscode/
+*.swp
+*.swo
+
+# Docker files (no need to copy these)
+docker/
+.dockerignore
+
+# Documentation
+docs/
+*.md
+!README.md
+
+# Testing
+tests/
+test_*.py
+*_test.py
+
+# Local development artifacts
+data/
+models/
+logs/
+.env
+.env.*
+wget-log*
+
+# Build artifacts
+*.so
+*.dylib
+*.dll
+dist/
+build/
+*.egg-info/
+
+# Temporary files
+tmp/
+temp/
+*.tmp
+*.bak
+setup_docker.sh
+setup.sh
+setup_host.sh
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+# Audio Intelligence Microservice
+
+This repository provides a FastAPI-based microservice for audio intelligence, including speech transcription from video files using pywhispercpp or OpenVINO with openvino-genai.
+
+Below, you'll find links to detailed documentation to help you get started with, configure, and deploy the microservice.
+
+## Documentation
+
+- **Overview**
+  - [Overview](docs/user-guide/Overview.md): A high-level introduction to the microservice.
+  - [Overview Architecture](docs/user-guide/overview-architecture.md): Detailed architecture.
+
+- **Getting Started**
+  - [Get Started](docs/user-guide/get-started.md): Step-by-step guide to getting started with the microservice.
+  - [System Requirements](docs/user-guide/system-requirements.md): Hardware and software requirements for running the microservice.
+
+- **Deployment**
+  - [How to Build from Source](docs/user-guide/how-to-build-from-source.md): Instructions for building the microservice from source code.
+
+- **API Reference**
+  - [API Reference](docs/user-guide/api-reference.md): Comprehensive reference for the available REST API endpoints.
+
+- **Release Notes**
+  - [Release Notes](docs/user-guide/release-notes.md): Information on the latest updates, improvements, and bug fixes.
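A quick smoke test of a running deployment of this microservice can be done with a small stdlib-only client sketch. The base URL and port below are assumptions (adjust to your deployment); the `/health` and `/models` paths come from the endpoints added in this commit.

```python
# Minimal client sketch for the Audio Intelligence microservice.
# BASE_URL is an assumption; adjust to your deployment.
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def endpoint(base_url: str, path: str) -> str:
    """Join a base URL and an endpoint path without doubling slashes."""
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


def get_json(url: str) -> dict:
    """GET a URL and parse the JSON response body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


# Example usage (requires a running service):
#   health = get_json(endpoint(BASE_URL, "/health"))
#   models = get_json(endpoint(BASE_URL, "/models"))
```

This is only a sketch; the service's actual response schemas are defined in `audio_intelligence.schemas.transcription`.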
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+# Security Policy
+
+## Security practices
+
+[![OpenSSF Best Practices](https://www.bestpractices.dev/projects/<project-id>/badge)](https://www.bestpractices.dev/projects/<project-id>)
+[![Coverity](https://scan.coverity.com/projects/<project-id>/badge.svg)](https://scan.coverity.com/projects/<project-name>)
+
+## Report a Vulnerability
+
+Please report security issues or vulnerabilities to the [Intel® Security Center].
+
+For more information on how Intel® works to resolve security issues, see
+[Vulnerability Handling Guidelines].
+
+[Intel® Security Center]: https://www.intel.com/security
+
+[Vulnerability Handling Guidelines]: https://www.intel.com/content/www/us/en/security-center/vulnerability-handling-guidelines.html

microservices/audio-intelligence/audio_intelligence/__init__.py

Whitespace-only changes.

microservices/audio-intelligence/audio_intelligence/api/__init__.py

Whitespace-only changes.

microservices/audio-intelligence/audio_intelligence/api/endpoints/__init__.py

Whitespace-only changes.
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+# Copyright (C) 2025 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+from fastapi import APIRouter
+
+from audio_intelligence.schemas.transcription import HealthResponse
+
+router = APIRouter()
+
+
+@router.get("/health", response_model=HealthResponse, tags=["Health API"], summary="Health status of API")
+async def health_check() -> HealthResponse:
+    """
+    Health check endpoint.
+
+    Returns:
+        A response indicating the service status, version and a descriptive message.
+    """
+    return HealthResponse()
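The `HealthResponse` model itself lives in `audio_intelligence.schemas.transcription` and is not part of this diff. As a rough sketch of the pattern, a default-populated response of that shape might look like the following; the field names and values here are illustrative assumptions, shown with a stdlib dataclass rather than the service's actual Pydantic model.

```python
from dataclasses import dataclass, asdict


@dataclass
class HealthResponse:
    # Illustrative fields only; the real schema may differ.
    status: str = "healthy"
    version: str = "1.0.0"
    message: str = "Audio Intelligence service is running"


# A route returning HealthResponse() would serialize to a dict like:
payload = asdict(HealthResponse())
```

Because every field has a default, the route can simply `return HealthResponse()` and let the framework serialize it.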
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+# Copyright (C) 2025 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+from fastapi import APIRouter
+
+from audio_intelligence.core.settings import settings
+from audio_intelligence.schemas.transcription import AvailableModelsResponse, WhisperModelInfo
+from audio_intelligence.utils.logger import logger
+
+router = APIRouter()
+
+
+@router.get(
+    "/models",
+    response_model=AvailableModelsResponse,
+    tags=["Models API"],
+    summary="Get list of models available for use with detailed information",
+)
+async def get_available_models() -> AvailableModelsResponse:
+    """
+    Get a list of available Whisper model variants that can be used for transcription.
+
+    This endpoint returns all the Whisper models that are configured in the service
+    and available for transcription requests, along with detailed information including
+    display names, descriptions, and the default model that is used when no specific
+    model is requested.
+
+    Returns:
+        A response with the list of available models with their details and the default model
+    """
+    logger.debug("Getting available models details")
+
+    # Get the list of enabled models from settings with their detailed information
+    model_info_list = [model.to_dict() for model in settings.ENABLED_WHISPER_MODELS]
+
+    # Convert dictionaries to WhisperModelInfo objects
+    models = [WhisperModelInfo(**model_info) for model_info in model_info_list]
+    default_model = settings.DEFAULT_WHISPER_MODEL.value
+
+    logger.debug(f"Available models: {len(models)} models, default: {default_model}")
+
+    return AvailableModelsResponse(
+        models=models,
+        default_model=default_model
+    )
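The endpoint above relies on `settings.ENABLED_WHISPER_MODELS`, which is outside this diff. The conversion pattern it follows (enum members carrying a `to_dict()` of display metadata, re-hydrated into response objects) can be sketched with stdlib types; all names, fields, and values below are illustrative assumptions, not the service's actual settings.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass
class WhisperModelInfo:
    model_id: str
    display_name: str
    description: str


class WhisperModel(Enum):
    TINY = "tiny"
    BASE = "base"

    def to_dict(self) -> dict:
        # Sketch of a settings-side to_dict() like the one the endpoint uses.
        return {
            "model_id": self.value,
            "display_name": f"Whisper {self.value}",
            "description": f"Whisper '{self.value}' variant",
        }


ENABLED_WHISPER_MODELS = [WhisperModel.TINY, WhisperModel.BASE]
models = [WhisperModelInfo(**m.to_dict()) for m in ENABLED_WHISPER_MODELS]
default_model = WhisperModel.BASE.value
```

Keeping the metadata on the enum means the endpoint stays a thin mapping step with no model-specific knowledge of its own.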
Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
+# Copyright (C) 2025 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+import traceback
+from typing import Annotated
+
+from fastapi import APIRouter, Query, HTTPException, status, Depends
+from pydantic.json_schema import SkipJsonSchema
+
+from audio_intelligence.schemas.transcription import (
+    ErrorResponse,
+    TranscriptionResponse,
+    TranscriptionStatus,
+    TranscriptionFormData
+)
+from audio_intelligence.core.audio_extractor import AudioExtractor
+from audio_intelligence.core.transcriber import TranscriptionService
+from audio_intelligence.utils.file_utils import get_file_duration
+from audio_intelligence.utils.validation import RequestValidation
+from audio_intelligence.utils.transcription_utils import get_video_path, store_transcript_output
+from audio_intelligence.utils.logger import logger
+
+router = APIRouter()
+
+
+@router.post(
+    "/transcriptions",
+    response_model=TranscriptionResponse,
+    responses={
+        status.HTTP_400_BAD_REQUEST: {"model": ErrorResponse},
+        status.HTTP_500_INTERNAL_SERVER_ERROR: {"model": ErrorResponse},
+        status.HTTP_422_UNPROCESSABLE_ENTITY: {"description": "Invalid request body or parameter provided"},
+    },
+    tags=["Transcription API"],
+    summary="Transcribe audio from uploaded video file or a video stored at Minio"
+)
+async def transcribe_video(
+    request: Annotated[TranscriptionFormData, Depends()],
+    language: Annotated[
+        str | SkipJsonSchema[None],
+        Query(description="_(Optional)_ Language for transcription. If not provided, auto-detection will be used.")
+    ] = None
+) -> TranscriptionResponse:
+    """
+    Transcribe speech from a video file.
+
+    Upload a video file directly or specify MinIO parameters to transcribe its audio content.
+
+    Two ways to provide the video:
+    - Upload a video file using form-data
+    - Specify MinIO parameters (minio_bucket, video_id, video_name) to retrieve from storage
+
+    Args:
+        request: Form data containing the file or MinIO parameters and transcription settings
+        language: Optional language code for transcription
+
+    Returns:
+        A response with the transcription status and details
+    """
+
+    try:
+        # Validate the request parameters
+        RequestValidation.validate_form_data(request)
+
+        logger.info(f"Received transcription request for {'file upload' if request.file else 'MinIO video'}")
+        logger.debug(f"Transcription parameters: model={request.model_name}, device={request.device}, language={language}")
+
+        # Get video path either from direct upload or MinIO
+        video_path, filename = await get_video_path(request)
+
+        # Extract audio from video
+        audio_path = await AudioExtractor.extract_audio(video_path)
+        logger.debug(f"Audio extracted successfully to: {audio_path}")
+
+        # Get file duration
+        duration = get_file_duration(video_path)
+        logger.debug(f"File duration: {duration} seconds")
+
+        logger.info(f"Initializing transcription service with model: {request.model_name}, device: {request.device}")
+        transcriber = TranscriptionService(
+            model_name=request.model_name,
+            device=request.device
+        )
+
+        # Perform transcription
+        job_id, transcript_path = await transcriber.transcribe(
+            audio_path,
+            language=language,
+            include_timestamps=request.include_timestamps,
+            video_duration=duration  # Pass the video duration to optimize processing
+        )
+
+        # Store the transcript output using the configured backend
+        output_location = store_transcript_output(
+            transcript_path,
+            job_id,
+            filename,
+            minio_bucket=request.minio_bucket,
+            video_id=request.video_id
+        )
+
+        if not output_location:
+            raise Exception("Failed to store transcript output.")
+
+        logger.info(f"Transcription completed using {transcriber.backend.value} on {transcriber.device_type.value}")
+
+        return TranscriptionResponse(
+            status=TranscriptionStatus.COMPLETED,
+            message="Transcription completed successfully",
+            job_id=job_id,
+            transcript_path=output_location,
+            video_name=filename,
+            video_duration=duration
+        )
+
+    except HTTPException as http_exc:
+        raise http_exc
+
+    except Exception as e:
+        error_details = traceback.format_exc()
+        logger.error(f"Transcription failed: {str(e)}")
+        logger.debug(f"Error details: {error_details}")
+
+        raise HTTPException(
+            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+            detail=ErrorResponse(
+                error_message="Transcription failed!",
+                details="An error occurred during transcription. Please check logs for details."
+            ).model_dump()
+        )
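Calling `POST /transcriptions` with a direct file upload requires a `multipart/form-data` body. A hypothetical stdlib-only helper for building such a body is sketched below; the form field names (`model_name`, `device`, `include_timestamps`, `file`) mirror the endpoint's form data above, while the helper itself and the sample content are assumptions, not part of the service.

```python
# Hypothetical upload helper for POST /transcriptions: builds a
# multipart/form-data body using only the standard library.
import uuid


def build_multipart(filename: str, content: bytes, fields: dict) -> tuple:
    """Return (body, content_type) for one file upload plus text fields."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f'--{boundary}\r\n'
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f'{value}\r\n').encode()
        )
    parts.append(
        (f'--{boundary}\r\n'
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         f'Content-Type: application/octet-stream\r\n\r\n').encode()
        + content + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


# Stand-in content; a real call would read the video file's bytes.
body, ctype = build_multipart(
    "sample.mp4", b"\x00\x01",
    {"model_name": "base", "device": "cpu", "include_timestamps": "true"},
)
```

The returned `body` and `ctype` can then be sent with `urllib.request.Request(url, data=body, headers={"Content-Type": ctype}, method="POST")` against a running deployment.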
