This guide shows you how to use the Multimodal Embedding Serving microservice as a Python SDK for embedding text, images, and videos in your applications. The SDK provides a convenient wrapper around the REST API for seamless integration.
Model Selection: The examples in this guide use placeholder model names ("your-chosen-model"). Replace these with a specific model from Supported Models based on your requirements.
Build and install the microservice as a wheel package for clean, production-ready integration.
📖 Comprehensive Guide: See Wheel-Based Installation Guide for detailed instructions on building, installing, distributing, and troubleshooting wheel installations.
Quick Install:
# 1. Build the wheel
cd multimodal-embedding-serving
poetry build
# 2. Install in your project
pip install dist/multimodal_embedding_serving-0.1.1-py3-none-any.whl
# OR add to pyproject.toml (recommended)
# [tool.poetry.dependencies]
# multimodal-embedding-serving = {path = "wheels/multimodal_embedding_serving-0.1.1-py3-none-any.whl"}

git clone https://github.com/intel/edge-ai-libraries
cd edge-ai-libraries/microservices/multimodal-embedding-serving
pip install -e .

cd multimodal-embedding-serving
poetry install
poetry shell

# Import from the installed package
from multimodal_embedding_serving import get_model_handler, EmbeddingModel
# Create and load a model (replace with your chosen model from supported-models.md)
model_handler = get_model_handler("your-chosen-model")
model_handler.load_model()
# Create the application wrapper
embedding_model = EmbeddingModel(model_handler)
# Test the model
print("Model loaded successfully!")
print(f"Embedding dimension: {embedding_model.get_embedding_length()}")

# Single text embedding
text = "A beautiful sunset over the mountains"
embedding = embedding_model.embed_query(text)
print(f"Text embedding shape: {len(embedding)}")
# Multiple text embeddings
texts = [
"A red car driving down the road",
"A blue ocean with white waves",
"A green forest in spring"
]
embeddings = embedding_model.embed_documents(texts)
print(f"Batch embeddings shape: {len(embeddings)}x{len(embeddings[0])}")

Text-only models: Qwen text embeddings expose only the text encoder. Use the /model/capabilities endpoint or embedding_model.get_supported_modalities() to confirm modality support before invoking image/video helpers.
from multimodal_embedding_serving import get_model_handler, EmbeddingModel
handler = get_model_handler(
"QwenText/qwen3-embedding-0.6b",
device="GPU", # or CPU / AUTO
use_openvino=True,
ov_models_dir="./ov-models"
)
handler.load_model()
embedding_model = EmbeddingModel(handler)
print(embedding_model.get_supported_modalities()) # ['text']
query = "How does photosynthesis work?"
embedding = embedding_model.embed_query(query)
print(len(embedding))

Image helpers require a model with image modality support (e.g., CLIP, MobileCLIP, SigLIP, BLIP-2). They are not available when a text-only model such as Qwen is active.
import asyncio
async def process_image_url():
image_url = "https://example.com/image.jpg"
embedding = await embedding_model.get_image_embedding_from_url(image_url)
print(f"Image embedding shape: {len(embedding)}")
# Run async function
asyncio.run(process_image_url())

import base64
from PIL import Image
import io
# Convert image to base64
image = Image.new('RGB', (224, 224), color='red')
buffer = io.BytesIO()
image.save(buffer, format='JPEG')
image_base64 = base64.b64encode(buffer.getvalue()).decode()
# Get embedding
embedding = embedding_model.get_image_embedding_from_base64(image_base64)
print(f"Image embedding shape: {len(embedding)}")

Video helpers rely on image encoders under the hood; ensure the active model advertises video support via embedding_model.supports_video().
async def process_video_url():
video_url = "https://example.com/video.mp4"
# Basic video processing
frame_embeddings = await embedding_model.get_video_embedding_from_url(video_url)
print(f"Video frame embeddings: {len(frame_embeddings)} frames")
# With custom segment configuration
segment_config = {
"startOffsetSec": 10,
"clip_duration": 30,
"num_frames": 16
}
frame_embeddings = await embedding_model.get_video_embedding_from_url(
video_url, segment_config
)
print(f"Custom video embeddings: {len(frame_embeddings)} frames")
asyncio.run(process_video_url())

async def process_local_video():
video_path = "/path/to/your/video.mp4"
# Advanced frame sampling options
segment_config = {
"fps": 2.0, # Extract 2 frames per second
"startOffsetSec": 0,
"clip_duration": -1 # Process entire video
}
frame_embeddings = await embedding_model.get_video_embedding_from_file(
video_path, segment_config
)
print(f"Local video embeddings: {len(frame_embeddings)} frames")
asyncio.run(process_local_video())

segment_config = {
"frame_indexes": [0, 15, 30, 45, 60], # Extract specific frames
"startOffsetSec": 5,
"clip_duration": 20
}
frame_embeddings = await embedding_model.get_video_embedding_from_file(
"video.mp4", segment_config
)

from multimodal_embedding_serving import get_model_handler, EmbeddingModel
# Standard CLIP
clip_handler = get_model_handler("your-chosen-model")
clip_model = EmbeddingModel(clip_handler)
# Chinese CLIP for multilingual support
cn_clip_handler = get_model_handler("CN-CLIP/cn-clip-vit-b-16")
cn_clip_model = EmbeddingModel(cn_clip_handler)
# Mobile-optimized CLIP
mobile_handler = get_model_handler("MobileCLIP/mobileclip_b")
mobile_model = EmbeddingModel(mobile_handler)
# BLIP-2 for advanced multimodal understanding
blip2_handler = get_model_handler("Blip2/blip2_transformers")
blip2_model = EmbeddingModel(blip2_handler)

from multimodal_embedding_serving import get_model_handler, EmbeddingModel
# Enable OpenVINO for faster inference
model_handler = get_model_handler(
model_id="your-chosen-model",
device="CPU",
use_openvino=True,
ov_models_dir="./ov-models"
)
model_handler.load_model()
embedding_model = EmbeddingModel(model_handler)

from multimodal_embedding_serving import get_model_handler
# Use GPU for inference
model_handler = get_model_handler(
model_id="your-chosen-model",
device="GPU"
)

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
# Get embeddings
text_embedding = embedding_model.embed_query("A red sports car")
image_embedding = await embedding_model.get_image_embedding_from_url(
"https://example.com/red_car.jpg"
)
# Calculate similarity
similarity = cosine_similarity(
[text_embedding],
[image_embedding]
)[0][0]
print(f"Similarity: {similarity:.3f}")

async def search_video_content():
# Process video to get frame embeddings
video_embeddings = await embedding_model.get_video_embedding_from_file(
"movie.mp4",
{"fps": 0.5, "clip_duration": -1} # 1 frame every 2 seconds
)
# Search query
query = "person walking in a park"
query_embedding = embedding_model.embed_query(query)
# Find most similar frames
similarities = []
for i, frame_emb in enumerate(video_embeddings):
sim = cosine_similarity([query_embedding], [frame_emb])[0][0]
similarities.append((i, sim))
# Get top 5 matches
top_matches = sorted(similarities, key=lambda x: x[1], reverse=True)[:5]
for frame_idx, similarity in top_matches:
timestamp = frame_idx * 2 # Since we used 0.5 fps
print(f"Frame {frame_idx} (t={timestamp}s): {similarity:.3f}")
asyncio.run(search_video_content())

from multimodal_embedding_serving import get_model_handler, EmbeddingModel
# Using CN-CLIP for Chinese text
cn_clip_handler = get_model_handler("CN-CLIP/cn-clip-vit-b-16")
cn_clip_handler.load_model()
cn_model = EmbeddingModel(cn_clip_handler)
# Process Chinese and English text
texts = [
"一只可爱的小猫", # Chinese: "A cute little cat"
"A beautiful landscape",
"红色的汽车", # Chinese: "Red car"
"Blue ocean waves"
]
embeddings = cn_model.embed_documents(texts)
print(f"Multilingual embeddings: {len(embeddings)} texts processed")

async def batch_process_images():
image_urls = [
"https://example.com/image1.jpg",
"https://example.com/image2.jpg",
"https://example.com/image3.jpg"
]
# Process images concurrently
import asyncio
tasks = [
embedding_model.get_image_embedding_from_url(url)
for url in image_urls
]
embeddings = await asyncio.gather(*tasks)
print(f"Processed {len(embeddings)} images")
return embeddings
asyncio.run(batch_process_images())

from multimodal_embedding_serving import get_model_handler, EmbeddingModel
try:
# Load model with error handling
model_handler = get_model_handler("your-chosen-model")
model_handler.load_model()
embedding_model = EmbeddingModel(model_handler)
# Check if model is healthy
if embedding_model.check_health():
print("Model is ready!")
else:
print("Model health check failed")
except Exception as e:
print(f"Failed to load model: {e}")
try:
# Process with error handling
embedding = embedding_model.embed_query("test text")
print("Text processed successfully")
except Exception as e:
print(f"Processing failed: {e}")

See Supported Models for all available models and their specifications.
from multimodal_embedding_serving import get_model_handler
# Example: Using different models
clip_handler = get_model_handler("your-chosen-model")
cn_clip_handler = get_model_handler("CN-CLIP/cn-clip-vit-b-16")
mobile_handler = get_model_handler("MobileCLIP/mobileclip_b")

from multimodal_embedding_serving import get_model_handler
# Enable OpenVINO for Intel hardware acceleration
model_handler = get_model_handler(
"your-chosen-model",
use_openvino=True
)

# Process multiple texts for better throughput
embeddings = embedding_model.embed_documents(text_batch)

from flask import Flask, request, jsonify
from multimodal_embedding_serving import get_model_handler, EmbeddingModel
import asyncio
app = Flask(__name__)
# Initialize model globally
model_handler = get_model_handler("your-chosen-model")
model_handler.load_model()
embedding_model = EmbeddingModel(model_handler)
@app.route('/embed', methods=['POST'])
def embed_text():
data = request.json
text = data.get('text', '')
try:
embedding = embedding_model.embed_query(text)
return jsonify({'embedding': embedding})
except Exception as e:
return jsonify({'error': str(e)}), 500
if __name__ == '__main__':
app.run(host='0.0.0.0', port=5000)

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from multimodal_embedding_serving import get_model_handler, EmbeddingModel
app = FastAPI()
class TextRequest(BaseModel):
text: str
@app.on_event("startup")
async def startup_event():
global embedding_model
model_handler = get_model_handler("your-chosen-model")
model_handler.load_model()
embedding_model = EmbeddingModel(model_handler)
@app.post("/embed")
async def embed_text(request: TextRequest):
try:
embedding = embedding_model.embed_query(request.text)
return {"embedding": embedding}
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))

-
Model Loading Errors
# Check available models from multimodal_embedding_serving import list_available_models print(list_available_models())
-
Memory Issues
# Use smaller models for limited memory from multimodal_embedding_serving import get_model_handler model_handler = get_model_handler("MobileCLIP/mobileclip_s0")
-
OpenVINO Issues
# Disable OpenVINO if having issues from multimodal_embedding_serving import get_model_handler model_handler = get_model_handler( "your-chosen-model", use_openvino=False )
- Check the API Reference for detailed endpoint documentation
- See Supported Models for model selection guidance
- Review system requirements in System Requirements