Update Default value of Oversample for bbq #127134

@Samiul-TheSoccerFan Samiul-TheSoccerFan commented Apr 21, 2025

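When an index mapping defines a dense_vector field with a bbq_* type (bbq_hnsw or bbq_flat) in index_options, rescore_vector.oversample now defaults to 3.0. Other index types are unchanged, and an explicitly provided oversample value is kept as given.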

bbq_hnsw::

PUT my-image-index
{
  "mappings": {
    "properties": {
      "image-vector": {
        "type": "dense_vector",
        "dims": 64,
        "index": true,
        "similarity": "l2_norm",
        "index_options": {
          "type": "bbq_hnsw"
        }
      },
      "file-type": {
        "type": "keyword"
      },
      "title": {
        "type": "text"
      }
    }
  }
}


response::
{
  "my-image-index": {
    "mappings": {
      "properties": {
        "file-type": {
          "type": "keyword"
        },
        "image-vector": {
          "type": "dense_vector",
          "dims": 64,
          "index": true,
          "similarity": "l2_norm",
          "index_options": {
            "type": "bbq_hnsw",
            "m": 16,
            "ef_construction": 100,
            "rescore_vector": {
              "oversample": 3
            }
          }
        },
        "title": {
          "type": "text"
        }
      }
    }
  }
}

bbq_flat::

PUT my-image-index2
{
  "mappings": {
    "properties": {
      "image-vector": {
        "type": "dense_vector",
        "dims": 64,
        "index": true,
        "similarity": "l2_norm",
        "index_options": {
          "type": "bbq_flat"
        }
      },
      "file-type": {
        "type": "keyword"
      },
      "title": {
        "type": "text"
      }
    }
  }
}

response::
{
  "my-image-index2": {
    "mappings": {
      "properties": {
        "file-type": {
          "type": "keyword"
        },
        "image-vector": {
          "type": "dense_vector",
          "dims": 64,
          "index": true,
          "similarity": "l2_norm",
          "index_options": {
            "type": "bbq_flat",
            "rescore_vector": {
              "oversample": 3
            }
          }
        },
        "title": {
          "type": "text"
        }
      }
    }
  }
}

int8 (default index_options; no default oversample is added)::

PUT my-image-index
{
  "mappings": {
    "properties": {
      "image-vector": {
        "type": "dense_vector",
        "dims": 3,
        "index": true,
        "similarity": "l2_norm"
      },
      "file-type": {
        "type": "keyword"
      },
      "title": {
        "type": "text"
      }
    }
  }
}

response::
{
  "my-image-index": {
    "mappings": {
      "properties": {
        "file-type": {
          "type": "keyword"
        },
        "image-vector": {
          "type": "dense_vector",
          "dims": 3,
          "index": true,
          "similarity": "l2_norm",
          "index_options": {
            "type": "int8_hnsw",
            "m": 16,
            "ef_construction": 100
          }
        },
        "title": {
          "type": "text"
        }
      }
    }
  }
}

An explicitly provided oversample value is respected for bbq_*::

PUT my-image-index3
{
  "mappings": {
    "properties": {
      "image-vector": {
        "type": "dense_vector",
        "dims": 64,
        "index": true,
        "similarity": "l2_norm",
        "index_options": {
          "type": "bbq_hnsw",
          "rescore_vector": {"oversample": 2.0}
        }
      },
      "file-type": {
        "type": "keyword"
      },
      "title": {
        "type": "text"
      }
    }
  }
}

response::
{
  "my-image-index3": {
    "mappings": {
      "properties": {
        "file-type": {
          "type": "keyword"
        },
        "image-vector": {
          "type": "dense_vector",
          "dims": 64,
          "index": true,
          "similarity": "l2_norm",
          "index_options": {
            "type": "bbq_hnsw",
            "m": 16,
            "ef_construction": 100,
            "rescore_vector": {
              "oversample": 2
            }
          }
        },
        "title": {
          "type": "text"
        }
      }
    }
  }
}
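
The four request/response pairs above follow one resolution rule: an explicit oversample wins, otherwise bbq_* types get 3.0, otherwise no rescore_vector block is added. The following is a minimal Java sketch of that rule only. The class and method names (OversampleDefaults, resolveOversample) are hypothetical and are not part of Elasticsearch; the real change is made in the dense_vector field mapper and is additionally gated on the index version.

```java
import java.util.Optional;

// Hypothetical helper for illustration; not the Elasticsearch implementation.
final class OversampleDefaults {
    // Default stated in the PR description for bbq_* index types.
    static final float DEFAULT_BBQ_OVERSAMPLE = 3.0f;

    private OversampleDefaults() {}

    /**
     * Returns the oversample value expected in the resulting mapping,
     * or empty when no rescore_vector block would be added.
     */
    public static Optional<Float> resolveOversample(String indexType, Float explicitOversample) {
        if (explicitOversample != null) {
            // A user-provided value is always respected.
            return Optional.of(explicitOversample);
        }
        if (indexType != null && indexType.startsWith("bbq_")) {
            // bbq_hnsw and bbq_flat receive the new default.
            return Optional.of(DEFAULT_BBQ_OVERSAMPLE);
        }
        // Other types (for example int8_hnsw) get no default.
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(resolveOversample("bbq_hnsw", null));  // Optional[3.0]
        System.out.println(resolveOversample("bbq_flat", null));  // Optional[3.0]
        System.out.println(resolveOversample("int8_hnsw", null)); // Optional.empty
        System.out.println(resolveOversample("bbq_hnsw", 2.0f));  // Optional[2.0]
    }
}
```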

@Samiul-TheSoccerFan Samiul-TheSoccerFan force-pushed the update_default_oversample_for_bbq branch from e1acdbe to 70db95c on April 22, 2025 22:07
@Samiul-TheSoccerFan

@benwtrent @jimczi While I work on the YAML tests, can I get some quick feedback on the current changes?

@@ -1462,6 +1463,9 @@ public IndexOptions parseIndexOptions(String fieldName, Map<String, ?> indexOpti
            RescoreVector rescoreVector = null;
            if (indexVersion.onOrAfter(ADD_RESCORE_PARAMS_TO_QUANTIZED_VECTORS)) {
                rescoreVector = RescoreVector.fromIndexOptions(indexOptionsMap, indexVersion);
                if (rescoreVector == null) {
This should only happen on new indices. Please add a new index version to bar changing this on existing indices.
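
One way to read this request: the default should depend on the version the index was created with, so that mappings of existing indices keep their current behavior. Below is a minimal Java sketch of such a gate. It uses plain integers for versions, and the class name, constant name, and version id are all hypothetical; the real code compares Elasticsearch IndexVersion values with onOrAfter, as in the diff hunk above.

```java
// Hypothetical sketch of version-gated defaulting; not the Elasticsearch implementation.
final class VersionGatedDefault {
    // Hypothetical id of the index version that introduces the new default.
    static final int HYPOTHETICAL_DEFAULT_OVERSAMPLE_VERSION = 100;
    // Default stated in the PR description for bbq_* index types.
    static final float DEFAULT_BBQ_OVERSAMPLE = 3.0f;

    private VersionGatedDefault() {}

    /**
     * Returns the default oversample for an index created at the given version,
     * or null when no default applies (older index, or a non-bbq index type).
     */
    public static Float defaultOversample(int indexCreatedVersion, String indexType) {
        boolean createdOnOrAfter = indexCreatedVersion >= HYPOTHETICAL_DEFAULT_OVERSAMPLE_VERSION;
        boolean isBbq = indexType != null && indexType.startsWith("bbq_");
        return (createdOnOrAfter && isBbq) ? Float.valueOf(DEFAULT_BBQ_OVERSAMPLE) : null;
    }

    public static void main(String[] args) {
        System.out.println(defaultOversample(101, "bbq_hnsw"));  // new index: default applies
        System.out.println(defaultOversample(99, "bbq_hnsw"));   // existing index: unchanged
        System.out.println(defaultOversample(101, "int8_hnsw")); // non-bbq type: unchanged
    }
}
```

Keying the gate on the creation version rather than the running node version means an index created before the change keeps its mapping semantics after a cluster upgrade.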

@elasticsearchmachine

Hi @Samiul-TheSoccerFan, I've created a changelog YAML for you.

@Samiul-TheSoccerFan Samiul-TheSoccerFan marked this pull request as ready for review April 25, 2025 14:49
@elasticsearchmachine

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance label Apr 25, 2025
@Samiul-TheSoccerFan

@benwtrent Is this safe to merge, or should we wait for @jimczi's review?

@benwtrent

We aren't in a hurry. Let's see what @jimczi says :)

@jimczi jimczi left a comment

LGTM, thanks @Samiul-TheSoccerFan

jimczi commented Apr 28, 2025

Let's have a follow-up to update the documentation, @Samiul-TheSoccerFan?

@Samiul-TheSoccerFan Samiul-TheSoccerFan merged commit cd4fcbf into elastic:main Apr 28, 2025
17 checks passed
@Samiul-TheSoccerFan

Added Documentation PR: elastic/docs-content#1290

benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request May 5, 2025
elasticsearchmachine pushed a commit that referenced this pull request May 5, 2025
This adds backport index versions in preparation for backporting
#127134
@benwtrent

💚 All backports created successfully

Status Branch Result
8.19

benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request May 6, 2025
This adds backport index versions in preparation for backporting
elastic#127134
elasticsearchmachine pushed a commit that referenced this pull request May 6, 2025
* Update Default value of Oversample for bbq (#127134)

* Unit test to validate default behavior

* adding default value to oversample for bbq

* Fix code style issue

* Update docs/changelog/127134.yaml

* Update changelog

* Adding index version to support only new indices

* Update index version name to better match

* Adding a simple yaml test to verify the yaml functionality for oversample value

* Refactor knn float to add rescore vector by default when index type is one of bbq

* adding yaml tests to verify oversample default value

* Fixing format issue for not_exists

(cherry picked from commit cd4fcbf)

* Adding backport index versions for PR #127134 (#127724)

This adds backport index versions in preparation for backporting
#127134

---------

Co-authored-by: Samiul Monir
ywangd pushed a commit to ywangd/elasticsearch that referenced this pull request May 9, 2025
This adds backport index versions in preparation for backporting
elastic#127134
Labels
backport pending, >enhancement, :Search Relevance/Vectors, Team:Search Relevance, v8.19.0, v9.1.0
5 participants