What is the @search.score cutoff below which no results appear? I am getting fewer rows than specified in the top parameter of .search #25319

Open
@gitprojects619

Description

I have created an Azure search index from the DataFrame below.
(Image: DataFrame used for the search index)
Scenario 1: search_client.search('stand-up', top=3) returns all 3 rows from the index, but
Scenario 2: search_client.search('What do comics do?', top=3) returns only 1 result. (Screenshots of both results are below.)

My question: why does the search method not return all 3 rows in Scenario 2, even though I specify top=3? Is there an @search.score threshold that a row must meet in order to be returned? If so, can this threshold be controlled through a parameter of the .search method?

I have already read through the method's source code and don't see any such parameter.
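To make the question concrete, here is a toy model in plain Python of one hypothesis worth checking (it is not Azure's implementation): full-text search with the default `search_mode="any"` first keeps only documents that share at least one analyzed term with the query, and only then ranks them, so `top` can cap the result list but never pad it. The `STOPWORDS` set and the crude plural stripping in the hypothetical `analyze` helper are assumptions standing in for the `en.microsoft` analyzer.

```python
import re

# Assumption: a tiny stopword list standing in for the real analyzer's list.
STOPWORDS = {"a", "do", "in", "is", "of", "or", "the", "to", "what", "which", "with"}

# The three sentences from the index in this issue.
DOCS = [
    "Stand-up comedy is a comedic performance to a live audience in which the "
    "performer addresses the audience directly from the stage",
    "A stand-up defines their craft through the development of the routine or set",
    "Experienced stand-up comics with a popular following may produce a special.",
]


def analyze(text: str) -> set[str]:
    """Toy analyzer: lowercase, split on non-letters, drop stopwords, strip a plural 's'."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t[:-1] if len(t) > 3 and t.endswith("s") else t
            for t in tokens if t not in STOPWORDS}


def toy_search(query: str, docs: list[str], top: int) -> list[str]:
    """Keep only docs sharing a term with the query, rank by overlap, cap at `top`."""
    query_terms = analyze(query)
    matches = []
    for doc in docs:
        overlap = len(query_terms & analyze(doc))
        if overlap > 0:  # non-matching docs are excluded, not given a low score
            matches.append((overlap, doc))
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in matches[:top]]


print(len(toy_search("stand-up", DOCS, top=3)))
print(len(toy_search("What do comics do?", DOCS, top=3)))
```

In this toy, "What do comics do?" reduces to the single term "comic", so the two sentences that only say "comedy" or "comedic" never enter the ranking and no score cutoff is involved. Whether the real `en.microsoft` analyzer treats those words as unrelated terms is an assumption here.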

(Screenshot: results for Scenario 1)
(Screenshot: results for Scenario 2)
Below is the full code to reproduce this issue:

```python
from uuid import uuid4

import pandas as pd
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    PrioritizedFields,
    SearchableField,
    SearchIndex,
    SemanticConfiguration,
    SemanticField,
    SemanticSettings,
    SimpleField,
)
from azure.search.documents.models import QueryType  # used by the semantic query below

AZURE_SEARCH_SERVICE = 'to be filled as str'
AZURE_SEARCH_KEY = 'to be filled as str'
ENDPOINT = f"https://{AZURE_SEARCH_SERVICE}.search.windows.net"


def create_search_index(index_name: str) -> None:
    index_client = SearchIndexClient(endpoint=ENDPOINT,
                                     credential=AzureKeyCredential(AZURE_SEARCH_KEY))

    index = SearchIndex(
        name=index_name,
        fields=[
            SimpleField(name="uuid", type="Edm.String", key=True),
            SimpleField(name="Numb_Str", type="Edm.String", filterable=True, facetable=True),
            # Only "Sent" is full-text searchable, analyzed with the Microsoft English analyzer
            SearchableField(name="Sent", type="Edm.String", analyzer_name="en.microsoft"),
            SimpleField(name="Topic", type="Edm.String", filterable=True, facetable=True),
        ],
        # Semantic configuration that uses "Sent" as the prioritized content field
        semantic_settings=SemanticSettings(
            configurations=[SemanticConfiguration(
                name='default',
                prioritized_fields=PrioritizedFields(
                    title_field=None,
                    prioritized_content_fields=[SemanticField(field_name='Sent')]))])
    )
    print(f"Creating {index_name} search index")
    index_client.create_index(index)


def upload_to_created_index(index_name: str, df: pd.DataFrame) -> None:
    search_client = SearchClient(endpoint=ENDPOINT,
                                 index_name=index_name,
                                 credential=AzureKeyCredential(AZURE_SEARCH_KEY))
    # One document per DataFrame row; column names must match the index field names
    sections = df.to_dict("records")
    search_client.upload_documents(documents=sections)


# Create the DataFrame to upload to the search index
data = [{'uuid': str(uuid4()), 'Numb_Str': '10', 'Sent': 'Stand-up comedy is a comedic performance to a live audience in which the performer addresses the audience directly from the stage', 'Topic': 'Standup'},
        {'uuid': str(uuid4()), 'Numb_Str': '20', 'Sent': 'A stand-up defines their craft through the development of the routine or set', 'Topic': 'Standup'},
        {'uuid': str(uuid4()), 'Numb_Str': '30', 'Sent': 'Experienced stand-up comics with a popular following may produce a special.', 'Topic': 'Standup'}]

df = pd.DataFrame(data)
pd.set_option('display.max_colwidth', None)

# Create the empty search index
create_search_index("test-simple2")

# Upload the DataFrame to the new index
# (documents may take a short time to become searchable after upload)
upload_to_created_index('test-simple2', df)

# Query the search index
search_client = SearchClient(endpoint=ENDPOINT,
                             index_name='test-simple2',
                             credential=AzureKeyCredential(AZURE_SEARCH_KEY))

query_results = search_client.search('What do comics do?', top=3)
query_results = list(query_results)

# Collect the query results in a DataFrame
df_results = pd.DataFrame(query_results)

df_results
```
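One way to test the matching hypothesis against the real index is to ask the service how many documents matched in total, independently of top. The sketch below assumes the `include_total_count=True` keyword of `SearchClient.search` and the `get_count()` method on the returned result pager; `match_report` is a hypothetical helper name.

```python
from typing import Any


def match_report(search_client: Any, query: str, top: int = 3) -> dict:
    """Compare how many documents matched with how many came back under `top`."""
    # include_total_count asks the service to report the total number of matches
    results = search_client.search(query, top=top, include_total_count=True)
    rows = list(results)
    return {
        "total_matches": results.get_count(),  # not capped by top
        "returned": len(rows),
        "scores": [row["@search.score"] for row in rows],
    }


# Usage, with the search_client from the reproduction code:
# print(match_report(search_client, "What do comics do?", top=3))
```

If total_matches is also 1 for Scenario 2, that would suggest the missing rows were never matched in the first place, rather than being cut by an @search.score cutoff.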

If I change the arguments of .search to run a semantic search instead, I still get only 1 result. I do it with the call below:

```python
query_results = search_client.search('What do comics do?',
                                     top=3,
                                     query_type=QueryType.SEMANTIC,
                                     query_language='en-us',
                                     semantic_configuration_name="default")
```
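My understanding (worth confirming against the service documentation) is that semantic ranking re-orders the results of the initial full-text query rather than retrieving additional documents, which would explain why switching query_type does not change the count. A toy sketch of that two-stage shape, where `rerank_score` is a hypothetical stand-in for the semantic ranker:

```python
from typing import Callable


def two_stage_search(
    first_stage: list[str],
    rerank_score: Callable[[str], float],
    top: int,
) -> list[str]:
    """Rerank documents already retrieved by the first stage, then cap at `top`."""
    # The reranker only sees the first-stage candidates, so it cannot add documents.
    reranked = sorted(first_stage, key=rerank_score, reverse=True)
    return reranked[:top]


# One first-stage match stays one result, whatever `top` is.
doc = "Experienced stand-up comics with a popular following may produce a special."
print(two_stage_search([doc], rerank_score=len, top=3))
```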

Metadata

Assignees: no one assigned
Labels: Search, Service Attention, customer-reported, needs-team-attention, question
Type: none; Projects: none; Milestone: none; Relationships: none; Development: no branches or pull requests