# Examples

This directory contains examples demonstrating how to use the Multimodal Embedding Serving microservice both as an SDK and as a server.
- `sdk_examples.py` - Examples of using the service as an SDK/library
- `server_examples.py` - Examples of using the service as a FastAPI server
- `README.md` - This file
## SDK Examples

The SDK examples show how to import and use the embedding models directly in your Python code:
```bash
cd /path/to/multimodal-embedding-serving
python examples/sdk_examples.py
```

The examples cover:

- List Available Models - See all supported models
- Basic Text Embedding - Generate an embedding for a single text
- Multiple Text Embeddings - Process multiple texts at once
- Model Comparison - Compare different models on the same input
- OpenVINO Conversion - Convert models to OpenVINO format
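The Model Comparison example comes down to comparing embedding vectors, usually via cosine similarity. A dependency-free sketch, assuming embeddings come back as plain sequences of floats:

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score 1.0; orthogonal vectors score 0.0:
#   cosine_similarity([1.0, 0.0], [2.0, 0.0])  -> 1.0
#   cosine_similarity([1.0, 0.0], [0.0, 1.0])  -> 0.0
```

Scores near 1.0 indicate semantically similar inputs regardless of embedding magnitude, which is why cosine similarity is the usual choice for comparing models on the same input.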
```python
from src.models import get_model_handler, list_available_models
from src.models.wrapper import EmbeddingModel

# Get a model handler and load the model weights
handler = get_model_handler("CLIP/clip-vit-b-16")
handler.load_model()

# Create a wrapper for high-level operations
embedding_model = EmbeddingModel(handler)

# Generate embeddings
text_embedding = embedding_model.embed_query("A beautiful sunset")
multiple_embeddings = embedding_model.embed_documents(["text1", "text2"])
```

## Server Examples

The server examples show how to run the FastAPI server and interact with it via the HTTP API:
```bash
cd /path/to/multimodal-embedding-serving
python examples/server_examples.py
```

Alternatively, start the server manually:

```bash
export EMBEDDING_MODEL_NAME=CLIP/clip-vit-b-16
export EMBEDDING_USE_OV=false
uvicorn app:app --host 0.0.0.0 --port 8080
```

Or use the launcher:

```bash
python launcher.py
```

Once the server is running, you can test these endpoints:
```bash
# Health check
curl -X GET http://localhost:8080/health

# Embed a single text
curl -X POST http://localhost:8080/embed_query \
  -H "Content-Type: application/json" \
  -d '{"text": "A beautiful sunset"}'

# Embed multiple texts
curl -X POST http://localhost:8080/embed_documents \
  -H "Content-Type: application/json" \
  -d '{"texts": ["A dog", "A cat", "A bird"]}'

# Embed an image from a URL
curl -X POST http://localhost:8080/embed_image_url \
  -H "Content-Type: application/json" \
  -d '{"image_url": "https://example.com/image.jpg"}'
```

## Supported Models

The service supports multiple model families:
**CLIP**

- `CLIP/clip-vit-b-32`
- `CLIP/clip-vit-b-16`
- `CLIP/clip-vit-l-14`
- `CLIP/clip-vit-h-14`

**MobileCLIP**

- `MobileCLIP/mobileclip_s0`
- `MobileCLIP/mobileclip_s1`
- `MobileCLIP/mobileclip_s2`
- `MobileCLIP/mobileclip_b`
- `MobileCLIP/mobileclip_blt`

**SigLIP**

- `SigLIP/siglip2-vit-b-16`
- `SigLIP/siglip2-vit-l-16`
- `SigLIP/siglip2-so400m-patch16-384`

**BLIP-2**

- `Blip2/blip2_transformers`
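The curl commands shown earlier can also be scripted from Python using only the standard library. A minimal client sketch: the endpoint paths and request payloads are taken from this README, but the JSON response shape is an assumption, so treat `embed_query` here as illustrative:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # default host/port from this README

def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a POST request with a JSON body for an embedding endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def embed_query(text: str) -> dict:
    """Call /embed_query and decode the JSON response (shape assumed)."""
    req = build_request("/embed_query", {"text": text})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Usage (requires a running server):
#   result = embed_query("A beautiful sunset")
```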
## Environment Variables

- `EMBEDDING_MODEL_NAME` - The model to use (e.g., `CLIP/clip-vit-b-16`)
- `EMBEDDING_USE_OV` - Enable OpenVINO conversion (`true`/`false`, default: `false`)
- `EMBEDDING_DEVICE` - Device for inference (`CPU`/`GPU`, default: `CPU`)
- `EMBEDDING_OV_MODELS_DIR` - Directory for OpenVINO models (default: `./ov-models`)
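How the service consumes these variables internally may differ; as a sketch, equivalent parsing with the names and defaults listed above would look like this (the `EmbeddingSettings` class is hypothetical, and no default is documented for `EMBEDDING_MODEL_NAME`):

```python
import os
from dataclasses import dataclass

@dataclass
class EmbeddingSettings:
    model_name: str
    use_ov: bool
    device: str
    ov_models_dir: str

def load_settings() -> EmbeddingSettings:
    """Read configuration from the environment, falling back to the README defaults."""
    return EmbeddingSettings(
        # No default documented for the model name in this README
        model_name=os.environ.get("EMBEDDING_MODEL_NAME", ""),
        use_ov=os.environ.get("EMBEDDING_USE_OV", "false").lower() == "true",
        device=os.environ.get("EMBEDDING_DEVICE", "CPU"),
        ov_models_dir=os.environ.get("EMBEDDING_OV_MODELS_DIR", "./ov-models"),
    )
```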
Example configurations:

```bash
# MobileCLIP
export EMBEDDING_MODEL_NAME=MobileCLIP/mobileclip_s0

# CLIP with OpenVINO
export EMBEDDING_MODEL_NAME=CLIP/clip-vit-b-16
export EMBEDDING_USE_OV=true
export EMBEDDING_OV_MODELS_DIR=./ov-models

# SigLIP
export EMBEDDING_MODEL_NAME=SigLIP/siglip2-vit-b-16
```

## Installation

```bash
# Core dependencies
pip install torch transformers pillow numpy

# Optional: OpenVINO support
pip install openvino

# Server dependencies
pip install fastapi uvicorn
```

## Docker

```bash
# Build image
docker build -t multimodal-embedding-serving .

# Run with CLIP
docker run -p 8080:8080 \
  -e EMBEDDING_MODEL_NAME=CLIP/clip-vit-b-16 \
  multimodal-embedding-serving
```
```bash
# Run with OpenVINO
docker run -p 8080:8080 \
  -e EMBEDDING_MODEL_NAME=CLIP/clip-vit-b-16 \
  -e EMBEDDING_USE_OV=true \
  -v $(pwd)/ov-models:/app/ov_models \
  multimodal-embedding-serving
```

## Performance Tips

- Model Selection: MobileCLIP models are smaller and faster than CLIP models
- OpenVINO: Enable for better CPU performance on Intel hardware
- Device Selection: Use GPU if available for larger models
- Batch Processing: Use `embed_documents` for multiple texts instead of repeated `embed_query` calls
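For very large inputs it can also help to split the text list into fixed-size chunks before calling `embed_documents`. A small helper sketch (the batch size of 32 is an arbitrary illustration, not a documented service limit):

```python
from typing import Iterator, List

def batched(texts: List[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield successive fixed-size chunks of the input list."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

# Usage sketch, where embedding_model is an EmbeddingModel from the SDK examples:
#   all_embeddings = []
#   for chunk in batched(texts, 32):
#       all_embeddings.extend(embedding_model.embed_documents(chunk))
```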
## Troubleshooting

- Import errors: Make sure you're running from the correct directory and have all dependencies installed.
- Model not found: Check that the model name is correct and supported. Use `list_available_models()` to see all options.
- OpenVINO conversion fails: Ensure OpenVINO is properly installed and that you have write permissions to the models directory.
- Server won't start: Check that port 8080 is not already in use and that all environment variables are set correctly.
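For the last point, you can probe the port from Python before launching the server. This helper is a sketch, not part of the service:

```python
import socket

def port_in_use(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something is already accepting connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: check the server's default port before starting it
#   if port_in_use("localhost", 8080):
#       print("Port 8080 is busy - stop the other process or pick another port")
```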