Skip to content
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
70 changes: 69 additions & 1 deletion cli/installers/resources/default-reaction-providers.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -275,4 +275,72 @@ spec:
type: number
required:
- baseUrl

---
apiVersion: v1
kind: ReactionProvider
name: SyncSemanticKernelVectorStore
spec:
services:
reaction:
image: reaction-sync-semantickernel-vectorstore
config_schema:
type: object
properties:
# Vector Store Configuration
vectorStoreType:
type: string
enum: ["Qdrant", "AzureAISearch", "InMemory"]
description: "Type of vector store to sync with (currently supported stores)"
connectionString:
type: string
description: "Connection string for the vector store (e.g., 'Endpoint=host:port;ApiKey=key' for Qdrant, 'Endpoint=https://...;ApiKey=key' for Azure AI Search)"

# Embedding Service Configuration
embeddingServiceType:
type: string
enum: ["AzureOpenAI"]
description: "Type of embedding service to use"
embeddingEndpoint:
type: string
description: "Endpoint URL for the embedding service"
embeddingApiKey:
type: string
description: "API key for the embedding service"
embeddingModel:
type: string
description: "Model name/deployment name for embeddings"
default: "text-embedding-3-large"
embeddingDimensions:
type: integer
description: "Number of dimensions for embedding vectors"
default: 3072
minimum: 1
maximum: 20000

# Vector Store Collection Configuration
distanceFunction:
type: string
enum: ["CosineSimilarity", "CosineDistance", "EuclideanDistance", "DotProductSimilarity", "ManhattanDistance"]
description: "Distance function for vector similarity"
default: "CosineSimilarity"
indexKind:
type: string
enum: ["Hnsw", "Flat", "IvfFlat", "DiskAnn"]
description: "Index type for vector search"
default: "Hnsw"
isFilterable:
type: boolean
description: "Enable filtering on data fields"
default: true
isFullTextSearchable:
type: boolean
description: "Enable full-text search on content fields (Azure AI Search only)"
default: false
required:
- vectorStoreType
- connectionString
- embeddingServiceType
- embeddingEndpoint
- embeddingApiKey
- embeddingModel
---
255 changes: 255 additions & 0 deletions e2e-tests/07-sync-inmemory-vectorstore-scenario/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,255 @@
# E2E Tests: InMemory Vector Store Synchronization

## Overview
This test suite validates the **Semantic Kernel Vector Store Reaction** (`reactions/semantickernel/sync-vectorstore`) using an **InMemory vector store** backend with **Azure OpenAI embeddings**.

The reaction continuously synchronizes Drasi query results to a vector store, generating embeddings for each result row using configurable Handlebars templates.

## What's Being Tested

### Core Reaction Flow
```
PostgreSQL → Drasi Queries → SK Vector Store Reaction → InMemory Vector Store
↓ ↓ ↓ ↓
Changes Query Results Generate Embeddings Store Documents
```

### Test Scenarios

#### 1. **Initial State Sync**
Verifies that existing database rows are:
- Detected by Drasi queries
- Processed to generate embeddings via Azure OpenAI
- Stored in the InMemory vector store with proper document structure

```javascript
// Verify 5 initial products are synced
expect(queryData).toHaveLength(5);
const productIds = queryData.map(item => item.id).sort();
expect(productIds).toEqual([1, 2, 3, 4, 5]);
```

#### 2. **Insert Operations**
Tests that new database inserts:
- Trigger real-time change detection
- Generate new embeddings
- Add documents to vector store

Includes both single and bulk insert scenarios:
```sql
-- Single insert
INSERT INTO products (id, name, description, category_id)
VALUES (101, 'Test Product', 'A test product for e2e testing', 1);

-- Bulk insert (5 products)
INSERT INTO products (id, name, description, category_id) VALUES
(102, 'Bulk Product 1', 'Bulk test product 1', 2),
(103, 'Bulk Product 2', 'Bulk test product 2', 2),
-- ... more products
```

#### 3. **Update Operations**
Validates that updates:
- Regenerate embeddings with new content
- Replace existing documents (same key, new embedding)
- Cascade through joined queries when related data changes

```javascript
// Simple update
await dbClient.query(
"UPDATE products SET description = $1 WHERE id = $2",
["Updated test product description", 101]
);

// Cascade update through join
await dbClient.query(
"UPDATE categories SET name = $1 WHERE id = $2",
["Updated Electronics", 1]
);
```

#### 4. **Delete Operations**
Ensures deletions:
- Remove documents from vector store
- Handle cascade deletes in join queries
- Maintain consistency between query results and vector store

## Test Configuration

### Queries Tested
1. **Simple Query** (`sk-products-query`): Single table query
```cypher
MATCH (p:products)
RETURN
p.id AS id,
p.name AS name,
p.description AS description,
p.category_id AS category_id
```

2. **Join Query** (`sk-products-with-category-query`): Multi-table join with explicit relationship
```cypher
MATCH (p:products)-[:HAS_CATEGORY]->(c:categories)
RETURN
p.id AS product_id,
p.name AS product_name,
p.description AS product_description,
c.name AS category_name,
c.description AS category_description
```

### Reaction Configuration
```yaml
apiVersion: v1
kind: Reaction
name: sk-inmemory-simple-reaction
spec:
kind: SyncSemanticKernelVectorStore
queries:
sk-products-query: |
{
"collectionName": "inmemory_products_simple",
"keyField": "id",
"documentTemplate": "Product: {{name}} - {{description}}",
"titleTemplate": "{{name}}",
"vectorField": "content_vector",
"createCollection": true
}
properties:
vectorStoreType: InMemory
connectionString: "" # Not required for InMemory
embeddingServiceType: AzureOpenAI
embeddingEndpoint: "https://aman-eastus-resource.cognitiveservices.azure.com/"
embeddingApiKey: "${AZURE_OPENAI_KEY}" # Set via environment variable
embeddingModel: "text-embedding-3-large"
embeddingDimensions: 3072
distanceFunction: CosineSimilarity
indexKind: Hnsw
isFilterable: true
```

### Key Components
- **Vector Store**: InMemory (non-persistent, for testing)
- **Embedding Service**: Azure OpenAI
- **Embedding Model**: text-embedding-3-large (3072 dimensions)
- **Template Engine**: Handlebars for document and title generation
- **Database**: PostgreSQL with logical replication enabled
- **Distance Function**: Cosine Similarity
- **Index Type**: HNSW (Hierarchical Navigable Small World)

## Test Data

### Initial Data
```sql
-- 3 categories
INSERT INTO categories (id, name, description) VALUES
(1, 'Electronics', 'Electronic devices and accessories'),
(2, 'Books', 'Books and publications'),
(3, 'Clothing', 'Apparel and accessories');

-- 5 products
INSERT INTO products (id, name, description, category_id) VALUES
(1, 'Laptop Pro', 'High-performance laptop with 16GB RAM and 512GB SSD', 1),
(2, 'Wireless Mouse', 'Ergonomic wireless mouse with precision tracking', 1),
(3, 'Data Science Handbook', 'Comprehensive guide to data science and machine learning', 2),
(4, 'Python Programming', 'Learn Python programming from basics to advanced', 2),
(5, 'Cotton T-Shirt', 'Comfortable cotton t-shirt in multiple colors', 3);
```

### Test-Specific Data
- IDs 101-106: Used for insert/update/delete tests
- All test data with ID > 100 is cleaned up after tests complete
- Categories are reset to original state after cascade tests

## Environment Variables

### Required
All three environment variables must be set before running the tests:

- `AZURE_OPENAI_KEY`: API key for Azure OpenAI service
- `AZURE_OPENAI_ENDPOINT`: Azure OpenAI endpoint URL (e.g., `https://your-resource.openai.azure.com/`)
- `AZURE_OPENAI_MODEL`: Embedding model deployment name (e.g., `text-embedding-3-large`)

## Verification Methods

Since InMemory store doesn't expose a query API, verification happens through:

1. **SignalR Monitoring** - Tracks query result changes as proxy for vector store state
```javascript
const queryData = await signalrFixture.requestReload("sk-products-query");
```

2. **Reaction Log Analysis** - Checks for embedding generation patterns:
- "Generated embedding(s) for N text"
- "Generating embeddings for N document"
- "Processing N documents for vectorization"
- "Created N embeddings"
- "Batch embedding generation: N items"

3. **Query State Validation** - Ensures Drasi queries reflect expected state after operations

## Running the Tests

### Prerequisites
```bash
# Required: Set all Azure OpenAI environment variables
export AZURE_OPENAI_KEY="your-api-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_MODEL="text-embedding-3-large"

# Ensure Docker is running and kind cluster is available
docker info
kind get clusters
```

### Execute Tests
```bash
# From repository root
cd e2e-tests

# Run this specific test suite
npm test -- 07-sync-inmemory-vectorstore-scenario/inmemory-vectorstore.test.js

# Or run with Jest directly
npx jest 07-sync-inmemory-vectorstore-scenario/inmemory-vectorstore.test.js
```

## Expected Behavior

### ✅ Success Indicators:
- All 5 initial products generate embeddings
- Insert operations add documents within 30 seconds
- Update operations regenerate embeddings with new content
- Delete operations remove documents from vector store
- Join query updates cascade correctly when related data changes
- No duplicate documents created (key field ensures uniqueness)
- Resource cleanup completes successfully

### ❌ Common Failures:
- **Missing environment variables**: All three Azure OpenAI variables are required: `AZURE_OPENAI_KEY`, `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_MODEL`
- **Invalid API key**: Verify key has access to the specified endpoint and model
- **Wrong endpoint format**: Ensure endpoint includes `https://` and trailing `/`
- **Model not deployed**: Verify the model name matches your Azure OpenAI deployment
- **Timeout errors**: Initial sync may take up to 30 seconds due to embedding generation
- **Port conflicts**: Ensure port 5432 isn't in use by local PostgreSQL
- **Resource cleanup issues**: May leave resources if test fails; manually clean with `kubectl delete`

## Architecture Notes

### InMemory Vector Store Characteristics:
- **Non-persistent**: Data stored in process memory only
- **No external dependencies**: Runs entirely within the reaction pod
- **Restart behavior**: All data lost on reaction restart
- **Use cases**: Testing, development, and temporary scenarios
- **Performance**: Fast operations, no network overhead

### Semantic Kernel Integration:
- Uses Microsoft.SemanticKernel.Connectors.InMemory
- Supports full vector operations (add, update, delete, search)
- Implements IVectorStore interface for consistency across stores

### Sync Point Management:
- Tracks last processed sequence number per query
- Ensures exactly-once processing semantics
- Stored in a special collection within the vector store
- Retention period: 30 days (configurable via `syncPointRetentionDays`)
Loading
Loading