Need Debugging Help with Errors of Azure AI Search #288

Open
@jmandivarapu

Problem: Azure AI Search Returns Results for Garbage or Random Words.

What is the right way to use query rewrite, or regular semantic hybrid search without query rewriting, so that results are automatically suppressed for nonsensical or unrelated input such as 'aaaa' or 's*x'?

I thought of using the reranker score, but even for a word like 'xxxxxx' the reranker score is greater than 2.5. If I use a threshold of 2, these results still show up; if I use 2.5 as the threshold, many matching results are lost even for good search queries.
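For reference, the thresholding idea described above can be sketched as a small client-side filter. This is only an illustration, not a recommendation of a particular cutoff: the helper name `filter_by_reranker_score` is hypothetical, and it assumes each result behaves like a dict exposing the `@search.reranker_score` key, as the Python SDK results do for semantic queries.

```python
from typing import Any, Dict, Iterable, List

# Key under which the Python SDK exposes the semantic reranker score.
RERANKER_KEY = "@search.reranker_score"


def filter_by_reranker_score(
    results: Iterable[Dict[str, Any]], min_score: float
) -> List[Dict[str, Any]]:
    """Keep only results whose reranker score is at least `min_score`.

    Results without a reranker score (for example, non-semantic queries)
    are dropped, since they cannot be compared against the threshold.
    """
    kept = []
    for doc in results:
        score = doc.get(RERANKER_KEY)
        if score is not None and score >= min_score:
            kept.append(doc)
    return kept
```

As noted above, a single global threshold trades false positives against lost matches, so a filter like this probably needs to be combined with another signal rather than used alone.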

Documents

I have 40 documents in the search index. Each document contains a product title and description.
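With an index this small, one possible workaround is a client-side vocabulary check that runs before the query is sent at all. The sketch below is an assumption-laden illustration, not part of Azure AI Search: the helper names are hypothetical, the tokenizer is deliberately crude, and the field names are taken from the `select` lists used in the queries below.

```python
import re
from typing import Any, Dict, Iterable, Set, Tuple

# Crude tokenizer: lowercase runs of letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> Set[str]:
    """Split text into a set of lowercase alphanumeric tokens."""
    return set(TOKEN_RE.findall(text.lower()))


def build_vocabulary(
    documents: Iterable[Dict[str, Any]],
    fields: Tuple[str, ...] = ("experienceTitle", "experienceDescription"),
) -> Set[str]:
    """Collect every token that appears in the given fields of the documents."""
    vocab: Set[str] = set()
    for doc in documents:
        for field in fields:
            vocab |= tokenize(doc.get(field) or "")
    return vocab


def query_overlaps_vocabulary(query: str, vocab: Set[str]) -> bool:
    """Return True if at least one query token occurs in the vocabulary."""
    return bool(tokenize(query) & vocab)
```

A query such as 'aaaaaaa' would fail this check and could be answered with an empty result set without calling the service. The obvious cost is that legitimate queries phrased with synonyms that never appear in the documents would also be rejected, which removes part of the benefit of vector search.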

Queries When Results Are Not Expected

Try 1: Using the Older Version of Azure AI Search Without Recent Query Rewrite

(Refer: [Azure AI Search Query Rewrite Documentation](https://learn.microsoft.com/en-us/azure/search/semantic-how-to-query-rewrite))

Scenarios Inside This Try

  1. Just Semantic Search
  2. Semantic Hybrid Search (Semantic + Vectorization)

Case A: Just Semantic Search

  • Input: 'aaaaaaaaaaaaaaaaa' or 'S*x' or 'random'
  • Code:
results = search_client.search(
    search_text=input_data,
    select=["experienceTitle", "experienceDescription"],
    semantic_configuration_name='barsv3',
    query_type="semantic",
    query_language="en-US",
    query_speller='lexicon',
    top=3
)
  • Output: As expected, empty results.

Case B: Semantic Hybrid Search

  • Input: 'aaaaaaaaaaaaaaaaa' or 'S*x' or 'random'
  • Code:
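from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

# Vector part of the hybrid query; `embedding` is the embedding computed for input_data (not shown).
vector_query = VectorizedQuery(
    vector=embedding,
    k_nearest_neighbors=50,
    exhaustive=True,
    fields="experienceDescriptionVector,experienceTitleVector"
)

# Client pinned to the preview API version.
search_client = SearchClient(
    endpoint=endpoint,
    index_name='bars-v3',
    credential=credential,
    api_version='2024-11-01-preview'
)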

results = search_client.search(
    search_text=input_data,
    vector_queries=[vector_query],
    select=["experienceTitle", "experienceDescription"],
    semantic_configuration_name='barsv3',
    query_type="semantic",
    query_language="en-US",
    query_speller='lexicon',
    top=3
)
  • Output: Not as expected. Results are returned even though none should be.
    • Search Results for 'aaaaaaa':
      [
        {"productis": 0, "score": 0.0234118290245533, "reranker_score": 1.6579372882843018},
        {"productis": 1, "score": 0.026050420477986336, "reranker_score": 1.6370235681533813},
        {"productis": 2, "score": 0.025913622230291367, "reranker_score": 1.626389503479004},
        {"productis": 3, "score": 0.03205128386616707, "reranker_score": 1.618236780166626}
      ]

Decision: Use regular semantic search, because Semantic Hybrid Search returns results for nonsensical input.
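A likely explanation for Case B is that a vector query always returns its k nearest neighbours, however far away they are, and hybrid queries in Azure AI Search merge the keyword and vector rankings with Reciprocal Rank Fusion (RRF). The sketch below illustrates generic RRF only; the constant `k=60`, the 1-based ranks, and the document IDs are assumptions for illustration, not values read from the service.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Fuse several ranked lists of document IDs into one score per document.

    Each list contributes 1 / (k + rank) for every document it contains,
    where rank is the 1-based position of the document in that list.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return dict(scores)


# Hypothetical rankings for a nonsense query: the keyword ranking is empty,
# but each vector field still returns its nearest neighbours.
keyword_ranking: List[str] = []
title_vector_ranking = ["doc3", "doc1", "doc0"]
description_vector_ranking = ["doc1", "doc3", "doc2"]

fused = reciprocal_rank_fusion(
    [keyword_ranking, title_vector_ranking, description_vector_ranking]
)
```

In this sketch every document that appears in either vector ranking receives a non-zero fused score even though the keyword ranking is empty, which is consistent with the small but non-zero `score` values shown above. The fused score depends only on rank positions, so it says nothing about how similar the documents actually are to the query.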


Try 2: Newer Version of Azure AI Search Including Query Rewriting

(Refer: [Azure AI Search Query Rewrite Documentation](https://learn.microsoft.com/en-us/azure/search/semantic-how-to-query-rewrite))

Scenarios Inside This Try

  1. Just Semantic Search + Query Rewrite
  2. Semantic Hybrid Search + Query Rewrite

Case A: Just Semantic Search + Query Rewrite

  • Input: 'aaaaaaaaaaaaaaaaa' or 'S*x' or 'random'
  • Code:
results = search_client.search(
    search_text=input_data,
    select=["experienceTitle", "experienceDescription"],
    semantic_configuration_name='barsv3',
    query_type="semantic",
    query_language="en-US",
    query_speller='lexicon',
    query_rewrites="generative",
    debug="queryRewrites",
    top=4
)
  • Output: Not as expected.
    • Query rewrites generated for 'aaaaaaa':
      [
        "meaning of aaaaaaaa",
        "what does aaaaaaaa mean",
        "define aaaaaaa",
        "aaaaaaa meaning"
      ]
    • Search Results for 'aaaaaaa':
      [
        {"productis": 0, "score": 0.7754897, "reranker_score": 1.6579372882843018},
        {"productis": 1, "score": 0.27041504, "reranker_score": 1.6370235681533813},
        {"productis": 2, "score": 1.0258656, "reranker_score": 1.618236780166626},
        {"productis": 3, "score": 0.20604418, "reranker_score": 1.524656891822815}
      ]

Case B: Semantic Hybrid Search + Query Rewrite

  • Input: 'aaaaaaaaaaaaaaaaa' or 'S*x' or 'random'
  • Code:
results = search_client.search(
    search_text=input_data,
    vector_queries=[vector_query],  # vector_query as defined in Try 1, Case B
    select=["experienceTitle", "experienceDescription"],
    semantic_configuration_name='barsv3',
    query_type="semantic",
    query_language="en-US",
    query_speller='lexicon',
    query_rewrites="generative",
    debug="queryRewrites",
    top=4
)
  • Output: Not as expected. Results returned despite nonsensical input.
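Since plain semantic search without rewriting (Try 1, Case A) returned nothing for these inputs, one possible workaround is to use a cheap keyword-only probe on the original query as a gate, and run the hybrid or rewritten query only when the probe finds a match. The following is a minimal sketch under that assumption: `search_with_keyword_gate` and `run_full_search` are hypothetical names, and the only SDK call used is `search_client.search(search_text=..., top=...)`.

```python
from typing import Any, Callable, Dict, Iterable, List


def search_with_keyword_gate(
    search_client: Any,
    query: str,
    run_full_search: Callable[[str], Iterable[Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Run the full query only if a keyword-only probe finds at least one hit.

    `search_client` is expected to behave like azure.search.documents.SearchClient;
    `run_full_search` is a caller-supplied function that issues the hybrid or
    query-rewrite search for the given text.
    """
    # Keyword-only probe on the original, un-rewritten query text.
    probe = search_client.search(search_text=query, top=1)
    if next(iter(probe), None) is None:
        # No lexical match at all: treat the query as unrelated to the index.
        return []
    return list(run_full_search(query))
```

The probe runs against the original text because the generated rewrites (for example "meaning of aaaaaaaa") contain ordinary words that may match documents. Characters such as `*` have operator meaning in the simple query syntax, so input like 's*x' may need escaping before it is used as a probe.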
