# Multimodal Embedding Serving Microservice

A multimodal embedding microservice that enables seamless integration of vision-language understanding into applications through OpenAI-compliant APIs. The microservice supports multiple state-of-the-art models, including CLIP, CN-CLIP, MobileCLIP, SigLIP, BLIP-2, and Qwen text embeddings, accepting modality-appropriate inputs and returning high-dimensional embeddings that capture semantic content in a shared space.
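Because text and image embeddings land in a shared space, cross-modal comparison reduces to a vector similarity. A minimal sketch using toy vectors (the values below are illustrative stand-ins, not real model output; actual embeddings from the service are high-dimensional):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for a text embedding and an image embedding.
text_emb = [0.1, 0.8, 0.3]
image_emb = [0.2, 0.7, 0.4]
score = cosine_similarity(text_emb, image_emb)  # closer to 1.0 means more similar
```

Because both modalities share one space, the same comparison works text-to-text, image-to-image, or text-to-image.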

The microservice is optimized for performance and scalability, supporting batch processing and deployment on both cloud and edge environments. By abstracting the complexity of model management and inference, the microservice accelerates the adoption of advanced vision-language AI in diverse use cases.
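Since the API is OpenAI-compliant and supports batch processing, a client request body can follow the OpenAI embeddings schema. A sketch of building such a batch request (the model name, inputs, and endpoint path are illustrative assumptions, not values taken from this README; see the API Reference for the actual contract):

```python
import json

def build_embedding_request(model: str, inputs: list[str]) -> str:
    """Build an OpenAI-style embeddings request body.

    The "model"/"input" field names follow the OpenAI embeddings schema;
    the model name must match one the microservice has loaded.
    """
    return json.dumps({"model": model, "input": inputs})

# Batch request: passing a list embeds every item in a single call,
# which is how the service's batch processing is typically exercised.
body = build_embedding_request("CLIP", ["a photo of a cat", "a photo of a dog"])
```

An actual call would POST this body to the service's embeddings endpoint (e.g. `/v1/embeddings`, assumed here from OpenAI compliance) and read one embedding per input from the response.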

## Documentation

- Overview
  - Overview: A high-level introduction to the microservice architecture and capabilities.
- Getting Started
  - Get Started: Step-by-step guide to getting started with the microservice.
  - Quick Reference: Essential commands and configurations at a glance.
  - System Requirements: Hardware and software requirements for running the microservice.
- Usage
  - SDK Usage: Complete guide for using the service as a Python SDK.
  - Wheel Installation: Comprehensive guide for building and installing as a Python wheel package.
  - Supported Models: Complete list of supported models and their configurations.
- Deployment
- API Reference
  - API Reference: Comprehensive reference for the available REST API endpoints.
- Release Notes
  - Release Notes: Information on the latest updates, improvements, and bug fixes.

See Get Started for detailed setup instructions.