Need Debugging Help with Errors of Azure AI Search

# Problem: Azure AI Search Returns Results for Garbage or Random Words

What is the right way to use query rewriting, or regular semantic hybrid search without query rewriting, so that results are automatically suppressed for meaningless or unrelated input such as 'aaaa' or 's*x'?

I thought of using the reranker score, but even for a word like 'xxxxxx' the reranker score is greater than 2.5. If I use a threshold like 2, these results still show up; if I use 2.5 as the threshold, many matching results are lost even for good search queries.
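
The threshold idea above can be sketched as a small post-filter. This is a minimal sketch rather than a recommended fix: it assumes each result is a dict-like object exposing the `@search.reranker_score` key, as results of a semantic query do in the Python SDK. The helper name `filter_by_reranker_score` and the sample scores are hypothetical.

```python
from typing import Any, Iterable, Mapping

RERANKER_KEY = "@search.reranker_score"


def filter_by_reranker_score(
    results: Iterable[Mapping[str, Any]], threshold: float
) -> list[Mapping[str, Any]]:
    """Keep only results whose semantic reranker score is at least `threshold`.

    Results without a reranker score (missing key or None) are dropped.
    """
    kept = []
    for result in results:
        score = result.get(RERANKER_KEY)
        if score is not None and score >= threshold:
            kept.append(result)
    return kept


# Hypothetical scores, for illustration only.
sample = [
    {"experienceTitle": "A", RERANKER_KEY: 2.7},
    {"experienceTitle": "B", RERANKER_KEY: 2.2},
    {"experienceTitle": "C", RERANKER_KEY: 1.6},
]
kept_titles = [r["experienceTitle"] for r in filter_by_reranker_score(sample, 2.5)]
```

With a threshold of 2.5 only "A" survives, which mirrors the trade-off described above: a stricter cutoff removes junk but also drops "B", which may be a legitimate match.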

## Documents
I have 40 documents in the search index. Each document contains a **product title** and **description**.

## Queries When Results Are Not Expected
### Try 1: Using the Older Version of Azure AI Search Without Recent Query Rewrite
(Refer: [Azure AI Search Query Rewrite Documentation](https://learn.microsoft.com/en-us/azure/search/semantic-how-to-query-rewrite))

#### Scenarios Inside This Try
1. **Just Semantic Search**
2. **Semantic Hybrid Search** (Semantic + Vectorization)

---
### Case A: Just Semantic Search
- **Input**: 'aaaaaaaaaaaaaaaaa' or 'S*x' or 'random'
- **Code**:
```python
results = search_client.search(
    search_text=input_data,
    select=["experienceTitle", "experienceDescription"],
    semantic_configuration_name='barsv3',
    query_type="semantic",
    query_language="en-US",
    query_speller='lexicon',
    top=3
)
```
- **Output**: As expected, empty results.

---
### Case B: Semantic Hybrid Search
- **Input**: 'aaaaaaaaaaaaaaaaa' or 'S*x' or 'random'
- **Code**:
```python
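from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

# `embedding`, `endpoint`, `credential`, and `input_data` are defined elsewhere.
# The vector query targets both vector fields of the index.
vector_query = VectorizedQuery(
    vector=embedding,
    k_nearest_neighbors=50,
    exhaustive=True,
    fields="experienceDescriptionVector,experienceTitleVector"
)

search_client = SearchClient(
    endpoint=endpoint,
    index_name='bars-v3',
    credential=credential,
    api_version='2024-11-01-preview'
)

# Hybrid query: text search plus vector search, reranked semantically.
results = search_client.search(
    search_text=input_data,
    vector_queries=[vector_query],
    select=["experienceTitle", "experienceDescription"],
    semantic_configuration_name='barsv3',
    query_type="semantic",
    query_language="en-US",
    query_speller='lexicon',
    top=3
)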
```
- **Output**: Not as expected. Results are returned even though they shouldn't be.
  - **Search Results for 'aaaaaaa':**
    ```json
    [
      {"productis": 0, "score": 0.0234118290245533, "reranker_score": 1.6579372882843018},
      {"productis": 1, "score": 0.026050420477986336, "reranker_score": 1.6370235681533813},
      {"productis": 2, "score": 0.025913622230291367, "reranker_score": 1.626389503479004},
      {"productis": 3, "score": 0.03205128386616707, "reranker_score": 1.618236780166626}
    ]
    ```
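
A likely contributor to the output above is that a k-nearest-neighbor vector query returns the closest vectors no matter how weak the match is, so even a nonsensical query yields neighbors. With `k_nearest_neighbors=50` and only 40 documents in the index, the vector query can match every document. The toy sketch below illustrates that behavior in plain Python. It does not call Azure AI Search, and the vectors, document names, and the `top_k_by_cosine` helper are all hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query, docs, k):
    """Return the k (name, similarity) pairs most similar to `query`.

    There is no minimum-similarity cutoff, so k items always come back
    as long as the collection holds at least k documents.
    """
    scored = [(name, cosine(query, vec)) for name, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional "embeddings".
docs = {
    "rooftop bar": [0.9, 0.1, 0.0],
    "wine tasting": [0.8, 0.3, 0.1],
    "jazz night": [0.1, 0.9, 0.2],
}
unrelated_query = [0.0, 0.1, 0.95]  # points away from every document

neighbors = top_k_by_cosine(unrelated_query, docs, k=2)
```

Two neighbors come back even though neither is similar to the query. If the same thing is happening here, the vector side of the hybrid query keeps supplying candidates for the reranker to score.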

---
**Decision**: Use regular semantic search, because Semantic Hybrid Search returns results for nonsensical input.

---

### Try 2: Newer Version of Azure AI Search Including Query Rewriting
(Refer: [Azure AI Search Query Rewrite Documentation](https://learn.microsoft.com/en-us/azure/search/semantic-how-to-query-rewrite))

#### Scenarios Inside This Try
1. **Just Semantic Search + Query Rewrite**
2. **Semantic Hybrid Search + Query Rewrite**

---
### Case A: Just Semantic Search + Query Rewrite
- **Input**: 'aaaaaaaaaaaaaaaaa' or 'S*x' or 'random'
- **Code**:
```python
results = search_client.search(
    search_text=input_data,
    select=["experienceTitle", "experienceDescription"],
    semantic_configuration_name='barsv3',
    query_type="semantic",
    query_language="en-US",
    query_speller='lexicon',
    query_rewrites="generative",
    debug="queryRewrites",
    top=4
)
```
- **Output**: Not as expected.
  - **Query rewrites for 'aaaaaaa'** (from `debug="queryRewrites"`):
    ```json
    [
      "meaning of aaaaaaaa",
      "what does aaaaaaaa mean",
      "define aaaaaaa",
      "aaaaaaa meaning"
    ]
    ```
  - **Search Results for 'aaaaaaa':**
    ```json
    [
      {"productis": 0, "score": 0.7754897, "reranker_score": 1.6579372882843018},
      {"productis": 1, "score": 0.27041504, "reranker_score": 1.6370235681533813},
      {"productis": 2, "score": 1.0258656, "reranker_score": 1.618236780166626},
      {"productis": 3, "score": 0.20604418, "reranker_score": 1.524656891822815}
    ]
    ```

---
### Case B: Semantic Hybrid Search + Query Rewrite
- **Input**: 'aaaaaaaaaaaaaaaaa' or 'S*x' or 'random'
- **Code**:
```python
results = search_client.search(
    search_text=input_data,
    # Hybrid search needs a vector query; `vector_query` is the one built in Try 1, Case B.
    vector_queries=[vector_query],
    select=["experienceTitle", "experienceDescription"],
    semantic_configuration_name='barsv3',
    query_type="semantic",
    query_language="en-US",
    query_speller='lexicon',
    query_rewrites="generative",
    debug="queryRewrites",
    top=4
)
```
- **Output**: Not as expected. Results returned despite nonsensical input.
