@amansinghoriginal (Member) commented Sep 3, 2025

Summary

Adds a new Drasi reaction that synchronizes query results to vector stores using Microsoft Semantic Kernel, enabling real-time vector search capabilities on continuously updated data.

Here is a diagram explaining the overall flow:
[image: overall flow diagram]

Use Cases

  • Semantic Search: Enable natural language search over query results
  • Knowledge Base: Build searchable documentation from structured data
  • Product Discovery: Enhance e-commerce search with vector similarity
  • Content Recommendations: Find similar items based on embeddings

What's New

Core Features

  • Vector Store Sync: Keeps vector stores in sync with Drasi query results in real-time
  • Multiple Store Support: Supports Qdrant, Azure AI Search, and InMemory stores via Semantic Kernel
  • Embedding Generation: Integrates with Azure OpenAI and OpenAI for text embeddings
  • Template-based Processing: Uses Handlebars templates to transform query results into searchable documents

Implementation

  • Reaction: reactions/semantickernel/sync-vectorstore/

    • Processes incremental changes from Drasi queries
    • Manages sync points for exactly-once processing
    • Handles initial bootstrap and recovery scenarios
  • SDK Integration: Built on Drasi Reaction SDK for .NET

    • Leverages Microsoft.SemanticKernel v1.51.0+
    • Applies insert, update, and delete changes to the vector store
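
As a rough illustration of the change handling and sync point behavior listed above, the sketch below applies one change event to an in-memory collection and skips any event at or below the stored sync point. This is not the reaction's actual code (which is .NET and writes to a real vector store). The `applyChange` helper, the `collection` shape, and the assumed event fields (`sequence`, `addedResults`, `updatedResults` with `before`/`after`, `deletedResults`) are assumptions made for illustration, and embedding generation is left out.

```javascript
// Hypothetical sketch: apply one change event to an in-memory collection.
// `collection` is assumed to be { records: Map, syncPoint: number }.
function applyChange(collection, keyField, change) {
  // Skip events that were already applied (for example, replays after a restart).
  if (change.sequence <= collection.syncPoint) {
    return false;
  }

  // Inserts: store each new row under its key.
  for (const row of change.addedResults ?? []) {
    collection.records.set(String(row[keyField]), row);
  }

  // Updates: replace the stored row; drop the old key if the key itself changed.
  for (const { before, after } of change.updatedResults ?? []) {
    const oldKey = String(before[keyField]);
    const newKey = String(after[keyField]);
    if (oldKey !== newKey) {
      collection.records.delete(oldKey);
    }
    collection.records.set(newKey, after);
  }

  // Deletes: remove the row by key.
  for (const row of change.deletedResults ?? []) {
    collection.records.delete(String(row[keyField]));
  }

  // Advance the sync point only after the whole event has been applied.
  collection.syncPoint = change.sequence;
  return true;
}

const products = { records: new Map(), syncPoint: -1 };
const firstEvent = {
  sequence: 1,
  addedResults: [{ product_id: "p-1", name: "Trail Shoe" }],
};
applyChange(products, "product_id", firstEvent);
// Replaying the same event is a no-op because its sequence is not newer.
const appliedAgain = applyChange(products, "product_id", firstEvent);
console.log(products.records.size, appliedAgain);
```

In this sketch the writes and the sync point update live in the same process, so they cannot diverge. With an external store they are separate writes, which is where bootstrap and recovery handling come in.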

Testing

  • Unit Tests:

    • Document processing with Handlebars templates
    • Vector store adapter operations
    • Embedding service integration
    • Sync point management
  • E2E Tests: Two comprehensive test suites

    • 07-sync-inmemory-vectorstore-scenario: Tests with InMemory store
    • 08-sync-qdrant-vectorstore-scenario: Tests with persistent Qdrant store
    • Validates insert, update, delete, and cascade operations
    • Verifies embedding generation and vector store synchronization

Internal Architecture

[image: internal architecture diagram]

Configuration

Reaction Provider

apiVersion: v1
kind: ReactionProvider
name: SyncSemanticKernelVectorStore
spec:
  services:
    reaction:
      image: reaction-sync-semantickernel-vectorstore
  config_schema:
    type: object
    properties:
      # Vector Store Configuration
      vectorStoreType:
        type: string
        enum: ["Qdrant", "AzureAISearch", "InMemory"]
        description: "Type of vector store to sync with (currently supported stores)"
      connectionString:
        type: string
        description: "Connection string for the vector store (e.g., 'Endpoint=host:port;ApiKey=key' for Qdrant, 'Endpoint=https://...;ApiKey=key' for Azure AI Search)"
      
      # Embedding Service Configuration
      embeddingServiceType:
        type: string
        enum: ["AzureOpenAI", "OpenAI"]
        description: "Type of embedding service to use"
      embeddingEndpoint:
        type: string
        description: "Endpoint URL for the embedding service"
      embeddingApiKey:
        type: string
        description: "API key for the embedding service"
      embeddingModel:
        type: string
        description: "Model name/deployment name for embeddings"
        default: "text-embedding-3-large"
      embeddingDimensions:
        type: integer
        description: "Number of dimensions for embedding vectors"
        default: 3072
        minimum: 1
        maximum: 20000
      
      # Vector Store Collection Configuration
      distanceFunction:
        type: string
        enum: ["CosineSimilarity", "CosineDistance", "EuclideanDistance", "DotProductSimilarity", "ManhattanDistance"]
        description: "Distance function for vector similarity"
        default: "CosineSimilarity"
      indexKind:
        type: string
        enum: ["Hnsw", "Flat", "IvfFlat", "DiskAnn"]
        description: "Index type for vector search"
        default: "Hnsw"
      isFilterable:
        type: boolean
        description: "Enable filtering on data fields"
        default: true
      isFullTextSearchable:
        type: boolean
        description: "Enable full-text search on content fields (Azure AI Search only)"
        default: false
    required:
      - vectorStoreType
      - connectionString
      - embeddingServiceType
      - embeddingEndpoint
      - embeddingApiKey
      - embeddingModel
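
To make the `connectionString` format described in the schema concrete, here is a minimal sketch of how a `Key=Value;Key=Value` string could be split into settings. The `parseConnectionString` helper is hypothetical and the reaction's real parser may behave differently. The only assumptions taken from this PR are the `Endpoint` and `ApiKey` keys and that `ApiKey` can be absent, as in the Qdrant e2e configuration.

```javascript
// Hypothetical helper: splits "Key=Value;Key=Value" into a plain object.
function parseConnectionString(connectionString) {
  const settings = {};
  for (const part of connectionString.split(";")) {
    const trimmed = part.trim();
    if (trimmed === "") continue; // tolerate a trailing ';'

    // Split on the first '=' only, so values (URLs, base64 keys) may contain '='.
    const separator = trimmed.indexOf("=");
    if (separator <= 0) {
      throw new Error(`Malformed connection string segment: "${trimmed}"`);
    }
    const key = trimmed.slice(0, separator).trim();
    settings[key] = trimmed.slice(separator + 1).trim();
  }
  if (!settings.Endpoint) {
    throw new Error("Connection string must include an Endpoint");
  }
  return settings;
}

const qdrant = parseConnectionString("Endpoint=qdrant.default.svc.cluster.local:6334");
const search = parseConnectionString(
  "Endpoint=https://your-search-service.search.windows.net;ApiKey=your-api-key"
);
console.log(qdrant.Endpoint, search.ApiKey);
```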

Example Reaction

apiVersion: v1
kind: Reaction
name: product-catalog-vectorstore
spec:
  kind: SyncSemanticKernelVectorStore
  
  # REACTION-LEVEL PROPERTIES (Infrastructure Configuration)
  properties:
    # Vector Store Configuration
    vectorStoreType: "AzureAISearch"
    connectionString: "Endpoint=https://your-search-service.search.windows.net;ApiKey=your-api-key"
    
    # Embedding Service Configuration  
    embeddingServiceType: "AzureOpenAI"
    embeddingEndpoint: "https://aman-eastus-resource.cognitiveservices.azure.com/"
    embeddingApiKey:
      kind: Secret
      name: azure-openai-creds
      key: api-key
    embeddingModel: "text-embedding-3-large"
    embeddingDimensions: 3072
    
    # Vector Store Collection Configuration (applies to all queries)
    distanceFunction: "CosineSimilarity"
    indexKind: "Hnsw"
    isFilterable: true
    isFullTextSearchable: true
    
  # QUERY-LEVEL PROPERTIES (Data Processing Configuration)
  queries:
    # Product catalog query - creates searchable product documents
    product-catalog: |
      {
        "collectionName": "products",
        "keyField": "product_id",
        "documentTemplate": "Product: {{name}}\nCategory: {{category}}\nDescription: {{description}}\nPrice: ${{price}}\nFeatures: {{features}}\nBrand: {{brand}}\nAvailability: {{availability_status}}",
        "titleTemplate": "{{brand}} - {{name}}",
        "vectorField": "content_vector",
        "createCollection": true
      }
    
    # Customer profile enrichment for personalization
    customer-profiles: |
      {
        "collectionName": "customers",
        "keyField": "customer_id",
        "documentTemplate": "Customer Profile:\nName: {{full_name}}\nSegment: {{segment}}\nInterests: {{interests}}\nPurchase History: {{purchase_summary}}\nPreferences: {{preferences}}\nLocation: {{city}}, {{country}}",
        "titleTemplate": "Customer: {{full_name}}",
        "vectorField": "profile_vector",
        "createCollection": true
      }
    
    # Support ticket knowledge base
    support-tickets: |
      {
        "collectionName": "support_kb",
        "keyField": "ticket_id",
        "documentTemplate": "Issue: {{issue_title}}\nCategory: {{category}}\nDescription: {{description}}\nResolution: {{resolution}}\nTags: {{tags}}",
        "titleTemplate": "{{category}}: {{issue_title}}",
        "vectorField": "issue_vector",
        "createCollection": true
      }
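
To show what the query-level templates do with a result row, the sketch below renders a shortened `documentTemplate` and the `titleTemplate` from the `product-catalog` query against a made-up row. The `renderTemplate` function is a stand-in, not Handlebars itself: it only handles simple `{{field}}` placeholders and treats missing fields as empty strings. The row values and the output object shape (`key`, `title`, `content`) are assumptions for illustration.

```javascript
// Stand-in for Handlebars: replaces {{field}} with the row's value.
function renderTemplate(template, row) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, field) => {
    const value = row[field];
    return value === undefined || value === null ? "" : String(value);
  });
}

// Shortened version of the product-catalog query configuration above.
const queryConfig = {
  collectionName: "products",
  keyField: "product_id",
  documentTemplate: "Product: {{name}}\nCategory: {{category}}\nPrice: ${{price}}",
  titleTemplate: "{{brand}} - {{name}}",
};

// Hypothetical query result row.
const row = {
  product_id: "p-1",
  name: "Trail Shoe",
  category: "Footwear",
  price: 89.5,
  brand: "Acme",
};

// The rendered content is the text that would be sent to the embedding service.
const doc = {
  key: String(row[queryConfig.keyField]),
  title: renderTemplate(queryConfig.titleTemplate, row),
  content: renderTemplate(queryConfig.documentTemplate, row),
};
console.log(doc.title);
console.log(doc.content);
```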

@amansinghoriginal requested a review from a team as a code owner on September 3, 2025 00:08
@@ -0,0 +1,67 @@
apiVersion: v1

Contributor

Should the reaction provider be included in the installer by default?

Member Author

Added

let signalrFixture;

// Helper function to get vector store documents from reaction pods
async function getInMemoryVectorStoreDocuments(reactionName, collectionName) {

Contributor

This function is called getInMemoryVectorStoreDocuments, but it looks like it gets a pod name?

Member Author

The InMemory vector store doesn't expose an external API for verification.
Removed this function, and added comments explaining the limited validation we do here.
Longer term we can add an extra debug endpoint that allows e2e tests to validate the contents of the in-memory store.

@@ -0,0 +1,20 @@
FROM mcr.microsoft.com/dotnet/aspnet:9.0 AS base

Contributor

We'll need the azure-linux and default variants. I think this changed after branching; the Makefile will also need to be updated.

Member Author

Fixed

{
httpClient.Timeout = TimeSpan.FromSeconds(waitSeconds + ClientSideExtraTimeoutSeconds);

var requestUri = $"{_managementApiBaseUrl}/v1/continuousQueries/{queryId}/ready-wait?timeout={waitSeconds}";

Contributor

Should we update the SDK with this and use the SDK here?

Member Author

Yes, but that will need to be made as a separate change.

@amansinghoriginal force-pushed the feature/reaction-semantickernel-vectorstore branch from f9efbbc to 5d15bdb on September 5, 2025 00:42
@amansinghoriginal force-pushed the feature/reaction-semantickernel-vectorstore branch from d057ace to cbc81a3 on September 5, 2025 02:32
Signed-off-by: Aman Singh <[email protected]>
Signed-off-by: Aman Singh <[email protected]>
connectionString: "Endpoint=qdrant.default.svc.cluster.local:6334"
embeddingServiceType: AzureOpenAI
embeddingEndpoint: "https://aman-eastus-resource.cognitiveservices.azure.com/"
embeddingApiKey: "${AZURE_OPENAI_KEY}"

Member Author

@ruokun-niu: Please add this to the GitHub repo secrets.

vectorStoreType: Qdrant
connectionString: "Endpoint=qdrant.default.svc.cluster.local:6334"
embeddingServiceType: AzureOpenAI
embeddingEndpoint: "https://aman-eastus-resource.cognitiveservices.azure.com/"

Contributor

Perhaps we should create one for Drasi instead of using a personal one?

@amansinghoriginal force-pushed the feature/reaction-semantickernel-vectorstore branch 2 times, most recently from e9cbd88 to f5e9c70 on September 9, 2025 06:09
@amansinghoriginal force-pushed the feature/reaction-semantickernel-vectorstore branch from f5e9c70 to bc245db on September 9, 2025 07:03
@danielgerlag (Contributor) left a comment

I think we should rename this to simply VectorStore-Reaction and drop all the references to Semantic Kernel because it is only an internal implementation detail.
