.Net: Vector store abstractions hybrid search ADR #10196

Closed
wants to merge 56 commits into from
56 commits
f19e88e
Add initial hybrid ADR doc.
westey-m Dec 3, 2024
531d220
Add sparse data model row.
westey-m Dec 3, 2024
5819cfb
Add cosmosdb nosql and mongo db to comparison. Add FusionMethod optio…
westey-m Jan 15, 2025
26eaa82
Fix typo
westey-m Jan 15, 2025
e6e1819
Fix more typos.
westey-m Jan 15, 2025
174cbd1
TF-IDF .net link
westey-m Jan 15, 2025
bab8740
Add another decision to adr and improve formatting.
westey-m Jan 17, 2025
5f119ce
Add more keyword param options
westey-m Jan 20, 2025
c56ffb7
Merge branch 'main' into vector-store-hybrid-adr
westey-m Jan 20, 2025
165cac1
Add Azure AI Search implementation and common keyword hybrid tests.
westey-m Jan 20, 2025
92853ee
Merge branch 'main' into vector-store-hybrid-adr
westey-m Jan 20, 2025
ac95190
Add ability to choose text property for hybrid azure ai search.
westey-m Jan 20, 2025
fee9651
Fix namespace issue.
westey-m Jan 22, 2025
3d58e63
Merge branch 'main' into vector-store-hybrid-adr
westey-m Jan 22, 2025
a68afec
Merge branch 'main' into vector-store-hybrid-adr
westey-m Jan 23, 2025
a920489
Update ADR with suggestions from pr and other improvements.
westey-m Jan 23, 2025
c153aa9
Add options around index required params.
westey-m Jan 23, 2025
cb358fb
Add support for azure cosmos db nosql hybrid search, without configur…
westey-m Jan 24, 2025
589528a
Fix typos.
westey-m Jan 24, 2025
f85e098
Fix typo
westey-m Jan 24, 2025
c774af9
Add a comparison of keyword matching behaviors between different DBs.
westey-m Jan 28, 2025
585405a
Add a qdrant hybrid search implementation
westey-m Jan 28, 2025
f7a65bc
Fix typo
westey-m Jan 28, 2025
c09e36a
Clarify "either" option further and fix typo.
westey-m Jan 29, 2025
a2b38d4
Add weaviate hybrid search implementation
westey-m Jan 31, 2025
424806a
Update ADR with postgres info
westey-m Jan 31, 2025
f567ebb
Update ADR with more info on keyword splitting.
westey-m Jan 31, 2025
0cc7494
Fix weaviate hybrid search score bug.
westey-m Jan 31, 2025
5fa5a24
Merge branch 'main' into vector-store-hybrid-adr
westey-m Feb 4, 2025
5b49efe
Fix weaviate unit test
westey-m Feb 4, 2025
8712882
Adding collection of keywords hybrid search overload
westey-m Feb 4, 2025
c790be5
Add mongo db hybrid search.
westey-m Feb 5, 2025
4f40559
Fix test
westey-m Feb 5, 2025
bccfddf
Switch cosmosdb nosql away from keyword parameter since it's not yet …
westey-m Feb 5, 2025
9b4b35d
Fix cosmos db nosql unit test
westey-m Feb 5, 2025
b541093
Change description param to searchtext as per pr suggestion
westey-m Feb 5, 2025
49e3cbf
Remove dense from naming, and remove single keyword hybrid search ove…
westey-m Feb 6, 2025
da99b31
Add updates from ADR Review to document.
westey-m Feb 6, 2025
b05c0ff
Update text property option to also throw if multiple matches exist.
westey-m Feb 6, 2025
3e2c06f
Change hybrid search tests for azure ai search to use machine name po…
westey-m Feb 6, 2025
4244822
Remove FusionMethod until required, since only two DBs that we suppor…
westey-m Feb 6, 2025
46a97c7
Remove fusion method from remaining ADR locations.
westey-m Feb 6, 2025
547acd5
Rename TextPropertyName to FullTextPropertyName to indicate that it m…
westey-m Feb 6, 2025
4114ac0
Merge branch 'main' into vector-store-hybrid-adr
westey-m Feb 6, 2025
e266bdb
Add attributes to exclude azure ai search integration tests automatic…
westey-m Feb 6, 2025
11543b3
Address pr comments.
westey-m Feb 7, 2025
dc52964
Update xml doc for hybrid search method.
westey-m Feb 11, 2025
8c00934
Improve xml doc comment
westey-m Feb 11, 2025
722e9fb
Update ADR with more naming options.
westey-m Feb 11, 2025
73cfc20
Update Qdrant and Weaviate for PR
westey-m Feb 12, 2025
20d24b1
Update CosmosDBNoSQL for pr.
westey-m Feb 13, 2025
99ffa72
Add mongo db pre-pr fixes.
westey-m Feb 13, 2025
ab7b07d
Fixes for NoSql PR
westey-m Feb 17, 2025
3d5eeb3
Add azure cosmosdb MongoDB hybrid keyword search support. Missing the…
westey-m Feb 18, 2025
1a8064b
Switch to shared method for getting vector property.
westey-m Mar 4, 2025
4ef92d4
Rename hybrid search. Throw if vector multiple vector props and name …
westey-m Mar 6, 2025
395 changes: 395 additions & 0 deletions docs/decisions/00NN-hybrid-search.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion dotnet/Directory.Packages.props
@@ -120,7 +120,7 @@
     <PackageVersion Include="YamlDotNet" Version="15.3.0" />
     <PackageVersion Include="Fluid.Core" Version="2.11.1" />
     <!-- Memory stores -->
-    <PackageVersion Include="Microsoft.Azure.Cosmos" Version="3.45.2" />
+    <PackageVersion Include="Microsoft.Azure.Cosmos" Version="3.48.0-preview.0" />
     <PackageVersion Include="Pgvector" Version="0.2.0" />
     <PackageVersion Include="NRedisStack" Version="0.12.0" />
     <PackageVersion Include="Milvus.Client" Version="2.3.0-preview.1" />
Original file line number Diff line number Diff line change
@@ -568,8 +568,6 @@ public async Task VectorizedSearchThrowsExceptionWithInvalidVectorTypeAsync(obje
     }

     [Theory]
-    [InlineData(null, "TestEmbedding1", 1, 1)]
-    [InlineData("", "TestEmbedding1", 2, 2)]
     [InlineData("TestEmbedding1", "TestEmbedding1", 3, 3)]
     [InlineData("TestEmbedding2", "test_embedding_2", 4, 4)]
     public async Task VectorizedSearchUsesValidQueryAsync(
Original file line number Diff line number Diff line change
@@ -35,16 +35,18 @@ public void BuildSearchQueryByDefaultReturnsValidQueryDefinition()
             .EqualTo("TestProperty2", "test-value-2")
             .AnyTagEqualTo("TestProperty3", "test-value-3");

-        var searchOptions = new VectorSearchOptions { Filter = filter, Skip = 5, Top = 10 };
-
         // Act
         var queryDefinition = AzureCosmosDBNoSQLVectorStoreCollectionQueryBuilder.BuildSearchQuery(
             vector,
+            null,
             fields,
             this._storagePropertyNames,
             vectorPropertyName,
+            null,
             ScorePropertyName,
-            searchOptions);
+            filter,
+            10,
+            5);

         var queryText = queryDefinition.QueryText;
         var queryParameters = queryDefinition.GetQueryParameters();
@@ -54,22 +56,16 @@ public void BuildSearchQueryByDefaultReturnsValidQueryDefinition()
         Assert.Contains("FROM x", queryText);
         Assert.Contains("WHERE x.test_property_2 = @cv0 AND ARRAY_CONTAINS(x.test_property_3, @cv1)", queryText);
         Assert.Contains("ORDER BY VectorDistance(x.test_property_1, @vector)", queryText);
-        Assert.Contains("OFFSET @offset LIMIT @limit", queryText);
+        Assert.Contains("OFFSET 5 LIMIT 10", queryText);

         Assert.Equal("@vector", queryParameters[0].Name);
         Assert.Equal(vector, queryParameters[0].Value);

-        Assert.Equal("@offset", queryParameters[1].Name);
-        Assert.Equal(5, queryParameters[1].Value);
-
-        Assert.Equal("@limit", queryParameters[2].Name);
-        Assert.Equal(10, queryParameters[2].Value);
-
-        Assert.Equal("@cv0", queryParameters[3].Name);
-        Assert.Equal("test-value-2", queryParameters[3].Value);
+        Assert.Equal("@cv0", queryParameters[1].Name);
+        Assert.Equal("test-value-2", queryParameters[1].Value);

-        Assert.Equal("@cv1", queryParameters[4].Name);
-        Assert.Equal("test-value-3", queryParameters[4].Value);
+        Assert.Equal("@cv1", queryParameters[2].Name);
+        Assert.Equal("test-value-3", queryParameters[2].Value);
     }

     [Fact]
@@ -84,39 +80,38 @@ public void BuildSearchQueryWithoutOffsetReturnsQueryDefinitionWithTopParameter(
             .EqualTo("TestProperty2", "test-value-2")
             .AnyTagEqualTo("TestProperty3", "test-value-3");

-        var searchOptions = new VectorSearchOptions { Filter = filter, Top = 10 };
-
         // Act
         var queryDefinition = AzureCosmosDBNoSQLVectorStoreCollectionQueryBuilder.BuildSearchQuery(
             vector,
+            null,
             fields,
             this._storagePropertyNames,
             vectorPropertyName,
+            null,
             ScorePropertyName,
-            searchOptions);
+            filter,
+            10,
+            0);

         var queryText = queryDefinition.QueryText;
         var queryParameters = queryDefinition.GetQueryParameters();

         // Assert
-        Assert.Contains("SELECT TOP @top x.test_property_1,x.test_property_2,x.test_property_3,VectorDistance(x.test_property_1, @vector) AS TestScore", queryText);
+        Assert.Contains("SELECT TOP 10 x.test_property_1,x.test_property_2,x.test_property_3,VectorDistance(x.test_property_1, @vector) AS TestScore", queryText);
         Assert.Contains("FROM x", queryText);
         Assert.Contains("WHERE x.test_property_2 = @cv0 AND ARRAY_CONTAINS(x.test_property_3, @cv1)", queryText);
         Assert.Contains("ORDER BY VectorDistance(x.test_property_1, @vector)", queryText);

-        Assert.DoesNotContain("OFFSET @offset LIMIT @limit", queryText);
+        Assert.DoesNotContain("OFFSET 0 LIMIT 10", queryText);

         Assert.Equal("@vector", queryParameters[0].Name);
         Assert.Equal(vector, queryParameters[0].Value);

-        Assert.Equal("@top", queryParameters[1].Name);
-        Assert.Equal(10, queryParameters[1].Value);
-
-        Assert.Equal("@cv0", queryParameters[2].Name);
-        Assert.Equal("test-value-2", queryParameters[2].Value);
+        Assert.Equal("@cv0", queryParameters[1].Name);
+        Assert.Equal("test-value-2", queryParameters[1].Value);

-        Assert.Equal("@cv1", queryParameters[3].Name);
-        Assert.Equal("test-value-3", queryParameters[3].Value);
+        Assert.Equal("@cv1", queryParameters[2].Name);
+        Assert.Equal("test-value-3", queryParameters[2].Value);
     }

     [Fact]
@@ -129,17 +124,19 @@ public void BuildSearchQueryWithInvalidFilterThrowsException()

         var filter = new VectorSearchFilter().EqualTo("non-existent-property", "test-value-2");

-        var searchOptions = new VectorSearchOptions { Filter = filter, Skip = 5, Top = 10 };
-
         // Act & Assert
         Assert.Throws<InvalidOperationException>(() =>
             AzureCosmosDBNoSQLVectorStoreCollectionQueryBuilder.BuildSearchQuery(
                 vector,
+                null,
                 fields,
                 this._storagePropertyNames,
                 vectorPropertyName,
+                null,
                 ScorePropertyName,
-                searchOptions));
+                filter,
+                10,
+                5));
     }

     [Fact]
@@ -150,31 +147,28 @@ public void BuildSearchQueryWithoutFilterDoesNotContainWhereClause()
         var vectorPropertyName = "test_property_1";
         var fields = this._storagePropertyNames.Values.ToList();

-        var searchOptions = new VectorSearchOptions { Skip = 5, Top = 10 };
-
         // Act
         var queryDefinition = AzureCosmosDBNoSQLVectorStoreCollectionQueryBuilder.BuildSearchQuery(
             vector,
+            null,
             fields,
             this._storagePropertyNames,
             vectorPropertyName,
+            null,
             ScorePropertyName,
-            searchOptions);
+            null,
+            10,
+            5);

         var queryText = queryDefinition.QueryText;
         var queryParameters = queryDefinition.GetQueryParameters();

         // Assert
         Assert.DoesNotContain("WHERE", queryText);
+        Assert.Contains("OFFSET 5 LIMIT 10", queryText);

         Assert.Equal("@vector", queryParameters[0].Name);
         Assert.Equal(vector, queryParameters[0].Value);
-
-        Assert.Equal("@offset", queryParameters[1].Name);
-        Assert.Equal(5, queryParameters[1].Value);
-
-        Assert.Equal("@limit", queryParameters[2].Name);
-        Assert.Equal(10, queryParameters[2].Value);
     }

     [Fact]
@@ -211,4 +205,51 @@ public void BuildSelectQueryByDefaultReturnsValidQueryDefinition()
         Assert.Equal("@pk0", queryParameters[1].Name);
         Assert.Equal("partition_key", queryParameters[1].Value);
     }
+
+    [Fact]
+    public void BuildSearchQueryWithHybridFieldsReturnsValidHybridQueryDefinition()
+    {
+        // Arrange
+        var vector = new ReadOnlyMemory<float>([1f, 2f, 3f]);
+        var keywordText = "hybrid";
+        var vectorPropertyName = "test_property_1";
+        var textPropertyName = "test_property_2";
+        var fields = this._storagePropertyNames.Values.ToList();
+
+        var filter = new VectorSearchFilter()
+            .EqualTo("TestProperty2", "test-value-2")
+            .AnyTagEqualTo("TestProperty3", "test-value-3");
+
+        // Act
+        var queryDefinition = AzureCosmosDBNoSQLVectorStoreCollectionQueryBuilder.BuildSearchQuery(
+            vector,
+            [keywordText],
+            fields,
+            this._storagePropertyNames,
+            vectorPropertyName,
+            textPropertyName,
+            ScorePropertyName,
+            filter,
+            10,
+            5);
+
+        var queryText = queryDefinition.QueryText;
+        var queryParameters = queryDefinition.GetQueryParameters();
+
+        // Assert
+        Assert.Contains("SELECT x.test_property_1,x.test_property_2,x.test_property_3,VectorDistance(x.test_property_1, @vector) AS TestScore", queryText);
+        Assert.Contains("FROM x", queryText);
+        Assert.Contains("WHERE x.test_property_2 = @cv0 AND ARRAY_CONTAINS(x.test_property_3, @cv1)", queryText);
+        Assert.Contains("ORDER BY RANK RRF(VectorDistance(x.test_property_1, @vector), FullTextScore(x.test_property_2, [\"hybrid\"]))", queryText);
+        Assert.Contains("OFFSET 5 LIMIT 10", queryText);
+
+        Assert.Equal("@vector", queryParameters[0].Name);
+        Assert.Equal(vector, queryParameters[0].Value);
+
+        Assert.Equal("@cv0", queryParameters[1].Name);
+        Assert.Equal("test-value-2", queryParameters[1].Value);
+
+        Assert.Equal("@cv1", queryParameters[2].Name);
+        Assert.Equal("test-value-3", queryParameters[2].Value);
+    }
 }
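The hybrid test asserts on an `ORDER BY RANK RRF(...)` clause that pairs `VectorDistance` with `FullTextScore`, with the keywords inlined into the query text rather than passed as parameters. A minimal sketch of how such a clause could be assembled is shown below. `HybridOrderBySketch` and `BuildOrderByClause` are hypothetical names used only for illustration, not the connector's actual query builder, and the quote-escaping rule is an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the connector): it only reproduces
// the clause shapes that the unit tests in this diff assert on.
public static class HybridOrderBySketch
{
    public static string BuildOrderByClause(
        string tableVariable,
        string vectorPropertyName,
        string? textPropertyName,
        IReadOnlyCollection<string>? keywords)
    {
        var vectorDistance = $"VectorDistance({tableVariable}.{vectorPropertyName}, @vector)";

        // Without keywords or a text property, fall back to plain vector ordering.
        if (textPropertyName is null || keywords is null || keywords.Count == 0)
        {
            return $"ORDER BY {vectorDistance}";
        }

        // Assumption: keywords are inlined as double-quoted strings, with
        // backslashes and embedded quotes escaped by a backslash.
        var keywordList = string.Join(
            ", ",
            keywords.Select(k => "\"" + k.Replace("\\", "\\\\").Replace("\"", "\\\"") + "\""));

        var fullTextScore = $"FullTextScore({tableVariable}.{textPropertyName}, [{keywordList}])";

        // RRF (reciprocal rank fusion) merges the vector ranking and the full-text ranking.
        return $"ORDER BY RANK RRF({vectorDistance}, {fullTextScore})";
    }
}
```

Calling `BuildOrderByClause("x", "test_property_1", "test_property_2", new[] { "hybrid" })` yields the same clause text the hybrid test checks for, while passing `null` for the text property and keywords yields the plain `ORDER BY VectorDistance(...)` form checked by the non-hybrid tests.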
Original file line number Diff line number Diff line change
@@ -23,7 +23,10 @@ namespace Microsoft.SemanticKernel.Connectors.AzureAISearch;
 /// </summary>
 /// <typeparam name="TRecord">The data model to use for adding, updating and retrieving data from storage.</typeparam>
 #pragma warning disable CA1711 // Identifiers should not have incorrect suffix
-public sealed class AzureAISearchVectorStoreRecordCollection<TRecord> : IVectorStoreRecordCollection<string, TRecord>, IVectorizableTextSearch<TRecord>
+public sealed class AzureAISearchVectorStoreRecordCollection<TRecord> :
+    IVectorStoreRecordCollection<string, TRecord>,
+    IVectorizableTextSearch<TRecord>,
+    IKeywordHybridSearch<TRecord>
 #pragma warning restore CA1711 // Identifiers should not have incorrect suffix
 {
     /// <summary>The name of this database for telemetry purposes.</summary>
@@ -68,6 +71,9 @@ public sealed class AzureAISearchVectorStoreRecordCollection<TRecord> : IVectorS
     /// <summary>The default options for vector search.</summary>
     private static readonly VectorData.VectorSearchOptions s_defaultVectorSearchOptions = new();

+    /// <summary>The default options for hybrid vector search.</summary>
+    private static readonly HybridSearchOptions s_defaultKeywordVectorizedHybridSearchOptions = new();
+
     /// <summary>Azure AI Search client that can be used to manage the list of indices in an Azure AI Search Service.</summary>
     private readonly SearchIndexClient _searchIndexClient;

@@ -316,25 +322,16 @@ public async IAsyncEnumerable<string> UpsertBatchAsync(IEnumerable<TRecord> reco
     /// <inheritdoc />
     public Task<VectorSearchResults<TRecord>> VectorizedSearchAsync<TVector>(TVector vector, VectorData.VectorSearchOptions? options = null, CancellationToken cancellationToken = default)
     {
-        Verify.NotNull(vector);
-
-        if (this._propertyReader.FirstVectorPropertyName is null)
-        {
-            throw new InvalidOperationException("The collection does not have any vector fields, so vector search is not possible.");
-        }
-
-        if (vector is not ReadOnlyMemory<float> floatVector)
-        {
-            throw new NotSupportedException($"The provided vector type {vector.GetType().FullName} is not supported by the Azure AI Search connector.");
-        }
+        var floatVector = VerifyVectorParam(vector);

         // Resolve options.
         var internalOptions = options ?? s_defaultVectorSearchOptions;
-        string? vectorFieldName = this.ResolveVectorFieldName(internalOptions.VectorPropertyName);
+        var vectorProperty = this._propertyReader.GetVectorPropertyOrSingle(internalOptions.VectorPropertyName);
+        var vectorPropertyName = this._propertyReader.GetJsonPropertyName(vectorProperty!.DataModelPropertyName);

         // Configure search settings.
         var vectorQueries = new List<VectorQuery>();
-        vectorQueries.Add(new VectorizedQuery(floatVector) { KNearestNeighborsCount = internalOptions.Top, Fields = { vectorFieldName } });
+        vectorQueries.Add(new VectorizedQuery(floatVector) { KNearestNeighborsCount = internalOptions.Top, Fields = { vectorPropertyName } });
         var filterString = AzureAISearchVectorStoreCollectionSearchMapping.BuildFilterString(internalOptions.Filter, this._propertyReader.JsonPropertyNamesMap);

         // Build search options.
@@ -370,11 +367,12 @@ public Task<VectorSearchResults<TRecord>> VectorizableTextSearchAsync(string sea

         // Resolve options.
         var internalOptions = options ?? s_defaultVectorSearchOptions;
-        string? vectorFieldName = this.ResolveVectorFieldName(internalOptions.VectorPropertyName);
+        var vectorProperty = this._propertyReader.GetVectorPropertyOrSingle(internalOptions.VectorPropertyName);
+        var vectorPropertyName = this._propertyReader.GetJsonPropertyName(vectorProperty!.DataModelPropertyName);

         // Configure search settings.
         var vectorQueries = new List<VectorQuery>();
-        vectorQueries.Add(new VectorizableTextQuery(searchText) { KNearestNeighborsCount = internalOptions.Top, Fields = { vectorFieldName } });
+        vectorQueries.Add(new VectorizableTextQuery(searchText) { KNearestNeighborsCount = internalOptions.Top, Fields = { vectorPropertyName } });
         var filterString = AzureAISearchVectorStoreCollectionSearchMapping.BuildFilterString(internalOptions.Filter, this._propertyReader.JsonPropertyNamesMap);

         // Build search options.
@@ -398,6 +396,48 @@ public Task<VectorSearchResults<TRecord>> VectorizableTextSearchAsync(string sea
         return this.SearchAndMapToDataModelAsync(null, searchOptions, internalOptions.IncludeVectors, cancellationToken);
     }

+    /// <inheritdoc />
+    public Task<VectorSearchResults<TRecord>> HybridSearchAsync<TVector>(TVector vector, ICollection<string> keywords, HybridSearchOptions? options = null, CancellationToken cancellationToken = default)
+    {
+        Verify.NotNull(keywords);
+        var floatVector = VerifyVectorParam(vector);
+
+        // Resolve options.
+        var internalOptions = options ?? s_defaultKeywordVectorizedHybridSearchOptions;
+        var vectorProperty = this._propertyReader.GetVectorPropertyOrSingle(internalOptions.VectorPropertyName);
+        var vectorPropertyName = this._propertyReader.GetJsonPropertyName(vectorProperty.DataModelPropertyName);
+        var textDataProperty = this._propertyReader.GetFullTextDataPropertyOrSingle(internalOptions.AdditionalPropertyName);
+        var textDataPropertyName = this._propertyReader.GetJsonPropertyName(textDataProperty.DataModelPropertyName);
+
+        // Configure search settings.
+        var vectorQueries = new List<VectorQuery>();
+        vectorQueries.Add(new VectorizedQuery(floatVector) { KNearestNeighborsCount = internalOptions.Top, Fields = { vectorPropertyName } });
+        var filterString = AzureAISearchVectorStoreCollectionSearchMapping.BuildFilterString(internalOptions.Filter, this._propertyReader.JsonPropertyNamesMap);
+
+        // Build search options.
+        var searchOptions = new SearchOptions
+        {
+            VectorSearch = new(),
+            Size = internalOptions.Top,
+            Skip = internalOptions.Skip,
+            Filter = filterString,
+            IncludeTotalCount = internalOptions.IncludeTotalCount,
+        };
+        searchOptions.VectorSearch.Queries.AddRange(vectorQueries);
+        searchOptions.SearchFields.Add(textDataPropertyName);
+
+        // Filter out vector fields if requested.
+        if (!internalOptions.IncludeVectors)
+        {
+            searchOptions.Select.Add(this._propertyReader.KeyPropertyJsonName);
+            searchOptions.Select.AddRange(this._propertyReader.DataPropertyJsonNames);
+        }
+
+        var keywordsCombined = string.Join(" ", keywords);
+
+        return this.SearchAndMapToDataModelAsync(keywordsCombined, searchOptions, internalOptions.IncludeVectors, cancellationToken);
+    }
+
     /// <summary>
     /// Get the document with the given key and map it to the data model using the configured mapper type.
     /// </summary>
@@ -556,31 +596,6 @@ private GetDocumentOptions ConvertGetDocumentOptions(GetRecordOptions? options)
         return innerOptions;
     }

-    /// <summary>
-    /// Resolve the vector field name to use for a search by using the storage name for the field name from options
-    /// if available, and falling back to the first vector field name if not.
-    /// </summary>
-    /// <param name="optionsVectorFieldName">The vector field name provided via options.</param>
-    /// <returns>The resolved vector field name.</returns>
-    /// <exception cref="InvalidOperationException">Thrown if the provided field name is not a valid field name.</exception>
-    private string ResolveVectorFieldName(string? optionsVectorFieldName)
-    {
-        string? vectorFieldName;
-        if (!string.IsNullOrWhiteSpace(optionsVectorFieldName))
-        {
-            if (!this._propertyReader.JsonPropertyNamesMap.TryGetValue(optionsVectorFieldName!, out vectorFieldName))
-            {
-                throw new InvalidOperationException($"The collection does not have a vector field named '{optionsVectorFieldName}'.");
-            }
-        }
-        else
-        {
-            vectorFieldName = this._propertyReader.FirstVectorPropertyJsonName;
-        }
-
-        return vectorFieldName!;
-    }
-
     /// <summary>
     /// Get a document with the given key, and return null if it is not found.
     /// </summary>
@@ -638,4 +653,16 @@ private async Task<T> RunOperationAsync<T>(string operationName, Func<Task<T>> o
             };
         }
     }
+
+    private static ReadOnlyMemory<float> VerifyVectorParam<TVector>(TVector vector)
+    {
+        Verify.NotNull(vector);
+
+        if (vector is not ReadOnlyMemory<float> floatVector)
+        {
+            throw new NotSupportedException($"The provided vector type {vector.GetType().FullName} is not supported by the Azure AI Search connector.");
+        }
+
+        return floatVector;
+    }
 }
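The new `VerifyVectorParam` helper narrows a generic `TVector` to `ReadOnlyMemory<float>` through a type pattern. The standalone sketch below mirrors that shape using only BCL types, so the behaviour can be tried outside the connector. `VectorParamSketch` is a hypothetical name, and `ArgumentNullException` stands in for the internal `Verify.NotNull` helper, whose exact behaviour is assumed rather than shown in this diff.

```csharp
using System;

// Hypothetical stand-in for the private helper added in this diff.
public static class VectorParamSketch
{
    public static ReadOnlyMemory<float> VerifyVectorParam<TVector>(TVector vector)
    {
        // Assumption: Verify.NotNull behaves like a plain null check that throws.
        if (vector is null)
        {
            throw new ArgumentNullException(nameof(vector));
        }

        // The type pattern matches only the exact runtime type; user-defined
        // implicit conversions (such as float[] to ReadOnlyMemory<float>) are not applied.
        if (vector is not ReadOnlyMemory<float> floatVector)
        {
            throw new NotSupportedException($"The provided vector type {vector.GetType().FullName} is not supported.");
        }

        return floatVector;
    }
}
```

One consequence of this design is that a caller passing a bare `float[]` gets a `NotSupportedException`, because `TVector` is inferred as `float[]` and the pattern does not match. Wrapping the array first, for example `new ReadOnlyMemory<float>(values)`, avoids that.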