Implement BM25 #134

Open
verdie-g wants to merge 2 commits into milvus-io:main from verdie-g:bm25

@verdie-g
Contributor

No description provided.

Signed-off-by: Grégoire Verdier <g.verdier@criteo.com>
@gemini-code-assist

Summary of Changes

Hello @verdie-g, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Milvus .NET client by implementing full support for BM25 full-text search. It introduces new schema definitions for functions and fields to enable text analysis and sparse vector generation, alongside new API methods for performing BM25-specific searches and integrating them into hybrid search workflows. This allows users to perform advanced text-based retrieval operations directly through the client.

Highlights

  • BM25 Full-Text Search Support: Introduced comprehensive support for BM25 (Best Matching 25) full-text search, allowing users to leverage this ranking function for text-based queries within Milvus collections.
  • New FunctionSchema and FunctionType: Added FunctionSchema and FunctionType to define schema-level functions, enabling automatic data transformations like BM25 indexing where text fields are converted to sparse vectors.
  • Enhanced FieldSchema for Text Analysis: The FieldSchema for VARCHAR fields now includes enableAnalyzer and analyzerParams properties, crucial for configuring text fields as input for BM25 functions.
  • Dedicated BM25 Search Methods: Provided a new SearchAsync overload in MilvusCollection specifically for BM25 full-text queries, accepting raw text strings that are automatically processed by the defined BM25 function.
  • Hybrid Search Integration: Integrated BM25 search into hybrid search capabilities through the new TextAnnSearchRequest, allowing combination of BM25 full-text search with dense vector searches using rerankers like RRF or WeightedReranker.
  • Configurable Sparse Inverted Index: Updated IndexType.SparseInvertedIndex to support BM25-specific build parameters such as inverted_index_algo, bm25_k1, and bm25_b, offering fine-grained control over index behavior.
  • Extensive Test Coverage: Added a new test file Bm25Tests.cs with multiple test cases covering various BM25 scenarios, including basic full-text search, score verification, multiple queries, filtering, hybrid search with dense vectors, and custom index parameters.
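
For context on the bm25_k1 and bm25_b build parameters listed above, the textbook Okapi BM25 score is sketched here as background. This is the standard formulation only; the pull request does not state which exact variant the server computes, so reading bm25_k1 as k_1 and bm25_b as b is an assumption.

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}
```

Here f(q_i, D) is the frequency of query term q_i in document D, |D| is the document length, and avgdl is the average document length in the collection. In this form, k_1 controls how quickly repeated terms saturate and b controls how strongly scores are normalized by document length.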

    SimilarityMetricType.Bm25,
    extraParams: new Dictionary<string, string>
    {
        ["inverted_index_algo"] = "\"DAAT_WAND\"",

Contributor Author

Awkward quotes. @roji, this would be a big breaking change, but should extraParams be a Dictionary<string, object>?
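
To make the trade-off in this question concrete, here is a minimal sketch of what an object-valued dictionary could look like from the caller's side. It assumes the values would be JSON-encoded with System.Text.Json before being sent, which is only a guess based on the escaped quotes in the snippet above. The helper name EncodeExtraParams is hypothetical and not part of the Milvus client, and the bm25_k1 value is purely illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public static class ExtraParamsSketch
{
    // Hypothetical helper, not part of the Milvus client: JSON-encodes each
    // value so that callers do not have to escape quotes by hand.
    public static Dictionary<string, string> EncodeExtraParams(
        IReadOnlyDictionary<string, object> extraParams)
    {
        return extraParams.ToDictionary(
            pair => pair.Key,
            pair => JsonSerializer.Serialize(pair.Value));
    }

    public static void Main()
    {
        var encoded = EncodeExtraParams(new Dictionary<string, object>
        {
            ["inverted_index_algo"] = "DAAT_WAND", // no manual \" escaping
            ["bm25_k1"] = 1.2,                     // illustrative value only
        });

        foreach (var (key, value) in encoded)
        {
            Console.WriteLine($"{key} = {value}");
        }
    }
}
```

With this approach the string value is serialized with its surrounding quotes, so the caller writes "DAAT_WAND" rather than "\"DAAT_WAND\"". Whether the server expects every parameter in JSON form is not confirmed by this thread.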

@gemini-code-assist bot left a comment

Code Review

This pull request introduces support for BM25 full-text search, which is a great addition to the client's capabilities. The changes are well-structured, introducing new concepts like FunctionSchema and TextAnnSearchRequest and integrating them cleanly into the existing object model. The new APIs are clearly documented, and the comprehensive test suite ensures the functionality is working as expected. I have one minor suggestion to improve input validation for the new search method.

Comment on lines +296 to +297
Verify.NotNull(queries);

medium

It's good that you're verifying that queries is not null. However, you should also check if the list is empty. Sending an empty list of queries to the server might lead to an unclear error or unexpected behavior. It's better to fail fast with a clear ArgumentException if no queries are provided, similar to how it's done in TextAnnSearchRequest.

        Verify.NotNull(queries);
        if (queries.Count == 0)
        {
            throw new ArgumentException("At least one query must be provided.", nameof(queries));
        }

Contributor Author

That's not consistent with other search methods.

Signed-off-by: Grégoire Verdier <g.verdier@criteo.com>