sltr queries with minimum_should_match features #476

Open
@jhinch-at-atlassian-com

One way to think about how an sltr query functions is as a bool query with a custom scoring function.

For example, given the following featureset definition:

{
  "featureset": {
    "features": [
      {
        "name": "title_text_match",
        "params": [
          "query_text"
        ],
        "template_language": "mustache",
        "template": {
          "match": {
            "title": "{{query_text}}"
          }
        }
      },
      {
        "name": "description_text_match",
        "params": [
          "query_text"
        ],
        "template_language": "mustache",
        "template": {
          "match": {
            "description": "{{query_text}}"
          }
        }
      },
      {
        "name": "description_knn_match",
        "params": [
          "query_embedding"
        ],
        "template_language": "mustache",
        "template": "{\"knn\":{\"field\":\"description_vector\",\"k\":10,\"query_vector\":{{#toJson}}query_embedding{{/toJson}}}}"
      }
    ]
  }
}

and a model, example_model, created from the above featureset, the following sltr query:

{
  "sltr": {
    "model": "example_model",
    "params": {
      "query_text": "the text query",
      "query_embedding": [1.0, 0.4, ...]
    }
  }
}

can be thought of conceptually as:

{
  "bool": {
    "filter": {
      "match_all": {}
    },
    "should": [
      {
        "match": {
          "title": "the text query"
        }
      },
      {
        "match": {
          "description": "the text query"
        }
      },
      {
        "knn": {
          "field": "description_vector",
          "k": 10,
          "query_vector": [1.0, 0.4, ...]
        }
      }
    ],
    "minimum_should_match": 0,
    // plus also use a special scoring function defined by example_model
  }
}

It would be great if an sltr query could require that a minimum number of the model's features match, so that an sltr query such as:

{
  "sltr": {
    "model": "example_model",
    "params": {
      "query_text": "the text query",
      "query_embedding": [1.0, 0.4, ...]
    },
    "minimum_should_match": 1
  }
}

would translate roughly to the following:

{
  "bool": {
    "should": [
      {
        "match": {
          "title": "the text query"
        }
      },
      {
        "match": {
          "description": "the text query"
        }
      },
      {
        "knn": {
          "field": "description_vector",
          "k": 10,
          "query_vector": [1.0, 0.4, ...]
        }
      }
    ],
    "minimum_should_match": 1,
    // plus also use a special scoring function defined by example_model
  }
}
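
To make the proposed translation concrete, here is a minimal Python sketch that builds the two conceptual bool queries shown above from the already-rendered feature clauses. It is not how the plugin works internally: `conceptual_bool_query` is a hypothetical helper, it assumes `minimum_should_match` is a non-negative integer, and the two-element query vector is only a stand-in for the elided one in the examples.

```python
import json


def conceptual_bool_query(feature_clauses, minimum_should_match=0):
    """Build the conceptual bool query for an sltr query (hypothetical helper).

    feature_clauses: the model's feature queries with params already rendered.
    minimum_should_match: assumed here to be a non-negative integer.
    """
    bool_body = {
        "should": list(feature_clauses),
        "minimum_should_match": minimum_should_match,
    }
    if minimum_should_match == 0:
        # Current behaviour as described in this issue: every document is
        # considered, and the features only contribute to the score.
        bool_body["filter"] = {"match_all": {}}
    return {"bool": bool_body}


clauses = [
    {"match": {"title": "the text query"}},
    {"match": {"description": "the text query"}},
    # Illustrative vector only; the real embedding is elided in the issue.
    {"knn": {"field": "description_vector", "k": 10, "query_vector": [1.0, 0.4]}},
]

current = conceptual_bool_query(clauses)
proposed = conceptual_bool_query(clauses, minimum_should_match=1)
print(json.dumps(proposed, indent=2))
```

In both cases the score would still come from example_model rather than from the sum of the clause scores, which a plain bool query cannot express.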

This would make sltr queries more viable as part of the initial query, rather than only as part of a rescore phase. The use case is to use non-linear models (such as a LambdaMART model) to handle query clauses whose score distributions differ, which makes them difficult to combine using a linear combination.
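
As a rough illustration of the intended semantics, the toy Python simulation below filters documents by how many feature clauses matched and then scores the survivors with a non-linear function. Everything in it is an assumption made for illustration: a feature value of 0.0 is treated as "clause did not match", `toy_model` is a hand-written stand-in for a tree ensemble such as LambdaMART, and the document feature values are invented.

```python
def sltr_with_min_match(docs, model, minimum_should_match=0):
    """Rank docs by model score, keeping only those with enough matching features.

    docs: mapping of doc id -> [title, description, knn] feature values,
          where 0.0 is taken to mean the feature clause did not match.
    """
    ranked = []
    for doc_id, features in docs.items():
        matched = sum(1 for value in features if value > 0.0)
        if matched >= minimum_should_match:
            ranked.append((doc_id, model(features)))
    # Highest model score first.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


def toy_model(features):
    """Hypothetical non-linear scorer using thresholds instead of a weighted sum."""
    title, description, knn = features
    score = 2.0 if title > 1.0 else 0.5 * title
    score += 1.0 if description > 0.5 else 0.0
    score += 1.5 if knn > 0.8 else 0.0
    return score


# Invented feature values for three documents.
docs = {
    "doc_a": [1.2, 0.0, 0.9],  # title and knn clauses match
    "doc_b": [0.0, 0.0, 0.0],  # nothing matches
    "doc_c": [0.0, 0.7, 0.0],  # only the description clause matches
}

all_docs = sltr_with_min_match(docs, toy_model, minimum_should_match=0)
at_least_one = sltr_with_min_match(docs, toy_model, minimum_should_match=1)
print([doc_id for doc_id, _ in all_docs])
print([doc_id for doc_id, _ in at_least_one])
```

With `minimum_should_match=0` the document that matches no feature is still returned with a model score, which is why sltr is currently best suited to rescoring an existing result set; with `minimum_should_match=1` it is excluded up front.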
