
Commit

Changing link to point to the main repo.
afoucret committed Feb 1, 2024
1 parent ee08f5b commit 229f604
Showing 1 changed file with 6 additions and 61 deletions.
67 changes: 6 additions & 61 deletions notebooks/search/08-learning-to-rank.ipynb
@@ -10,7 +10,7 @@
"\n",
"TODO: udpate the link to elastic/elasticsearch-labs instead of my fork before merging.\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/afoucret/elasticsearch-labs/blob/ltr-notebook/notebooks/search/08-learning-to-rank.ipynb)\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/ltr-notebook/notebooks/search/08-learning-to-rank.ipynb)\n",
"\n",
"In this notebook we will see an example on how to train a Learning To Rank model using [XGBoost](https://xgboost.ai/) and how to deploy it to be used as a rescorer in Elasticsearch.\n",
"\n",
@@ -136,9 +136,7 @@
"source": [
"from urllib.parse import urljoin\n",
"\n",
"# TODO: use elastic/elasticsearch-labs instead of afoucret/elasticsearch-labs before merging the PR.\n",
"\n",
"DATASET_BASE_URL = \"https://raw.githubusercontent.com/afoucret/elasticsearch-labs/ltr-notebook/notebooks/search/sample_data/learning-to-rank/\"\n",
"DATASET_BASE_URL = \"https://raw.githubusercontent.com/elastic/elasticsearch-labs/ltr-notebook/notebooks/search/sample_data/learning-to-rank/\"\n",
"\n",
"CORPUS_URL = urljoin(DATASET_BASE_URL, \"movies-corpus.jsonl.gz\")\n",
"JUDGEMENTS_FILE_URL = urljoin(DATASET_BASE_URL, \"movies-judgments.tsv.gz\")\n",
@@ -177,7 +175,7 @@
},
{
"cell_type": "code",
"execution_count": 33,
"execution_count": 8,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
@@ -192,7 +190,7 @@
"text": [
"Deleting index if it already exists: movies\n",
"Creating index: movies\n",
"Loading the corpus from https://raw.githubusercontent.com/afoucret/elasticsearch-labs/ltr-notebook/notebooks/search/sample_data/learning-to-rank/movies-corpus.jsonl.gz\n",
"Loading the corpus from https://raw.githubusercontent.com/elastic/elasticsearch-labs/ltr-notebook/notebooks/search/sample_data/learning-to-rank/movies-corpus.jsonl.gz\n",
"Indexing the corpus into movies ...\n",
"Indexed 9750 documents into movies\n"
]
@@ -1037,12 +1035,12 @@
"Once the model is uploaded to Elasticsearch, you will be able to use it as a rescorer in the _search API, as shown in this example:\n",
"\n",
"```\n",
"POST /_search\n",
"GET /movies/_search\n",
"{\n",
" \"query\" : {\n",
" \"multi_match\" : {\n",
" \"query\": \"star wars\",\n",
" \"field\": [\"title\", \"overview\", \"actors\", \"director\", \"tags\", \"characters\"]\n",
" \"fields\": [\"title\", \"overview\", \"actors\", \"director\", \"tags\", \"characters\"]\n",
" }\n",
" },\n",
" \"rescore\" : {\n",
@@ -1154,59 +1152,6 @@
"source": [
"We saw above that the title and popularity fields are important ranking feature in our model. Here we can see that now all results contain the query terms in the title. Moreover, more popular movies rank higher, for example `Star Wars: Episode I - The Phantom Menace` is now in third position."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style>#sk-container-id-6 {color: black;}#sk-container-id-6 pre{padding: 0;}#sk-container-id-6 div.sk-toggleable {background-color: white;}#sk-container-id-6 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-6 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-6 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-6 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-6 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-6 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-6 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-6 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-6 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-6 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-6 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-6 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-6 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-6 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-6 div.sk-label:hover label.sk-toggleable__label 
{background-color: #d4ebff;}#sk-container-id-6 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-6 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-6 div.sk-item {position: relative;z-index: 1;}#sk-container-id-6 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-6 div.sk-item::before, #sk-container-id-6 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-6 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-6 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-6 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-6 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-6 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-6 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-6 div.sk-label-container {text-align: center;}#sk-container-id-6 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. 
See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-6 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-container-id-6\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>XGBRanker(base_score=None, booster=None, callbacks=None, colsample_bylevel=None,\n",
" colsample_bynode=None, colsample_bytree=None, device=None,\n",
" early_stopping_rounds=20, enable_categorical=False,\n",
" eval_metric=[&#x27;ndcg@10&#x27;], feature_types=None, gamma=None,\n",
" grow_policy=None, importance_type=None, interaction_constraints=None,\n",
" learning_rate=None, max_bin=None, max_cat_threshold=None,\n",
" max_cat_to_onehot=None, max_delta_step=None, max_depth=None,\n",
" max_leaves=None, min_child_weight=None, missing=nan,\n",
" monotone_constraints=None, multi_strategy=None, n_estimators=None,\n",
" n_jobs=None, num_parallel_tree=None, random_state=None, ...)</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-6\" type=\"checkbox\" checked><label for=\"sk-estimator-id-6\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">XGBRanker</label><div class=\"sk-toggleable__content\"><pre>XGBRanker(base_score=None, booster=None, callbacks=None, colsample_bylevel=None,\n",
" colsample_bynode=None, colsample_bytree=None, device=None,\n",
" early_stopping_rounds=20, enable_categorical=False,\n",
" eval_metric=[&#x27;ndcg@10&#x27;], feature_types=None, gamma=None,\n",
" grow_policy=None, importance_type=None, interaction_constraints=None,\n",
" learning_rate=None, max_bin=None, max_cat_threshold=None,\n",
" max_cat_to_onehot=None, max_delta_step=None, max_depth=None,\n",
" max_leaves=None, min_child_weight=None, missing=nan,\n",
" monotone_constraints=None, multi_strategy=None, n_estimators=None,\n",
" n_jobs=None, num_parallel_tree=None, random_state=None, ...)</pre></div></div></div></div></div>"
],
"text/plain": [
"XGBRanker(base_score=None, booster=None, callbacks=None, colsample_bylevel=None,\n",
" colsample_bynode=None, colsample_bytree=None, device=None,\n",
" early_stopping_rounds=20, enable_categorical=False,\n",
" eval_metric=['ndcg@10'], feature_types=None, gamma=None,\n",
" grow_policy=None, importance_type=None, interaction_constraints=None,\n",
" learning_rate=None, max_bin=None, max_cat_threshold=None,\n",
" max_cat_to_onehot=None, max_delta_step=None, max_depth=None,\n",
" max_leaves=None, min_child_weight=None, missing=nan,\n",
" monotone_constraints=None, multi_strategy=None, n_estimators=None,\n",
" n_jobs=None, num_parallel_tree=None, random_state=None, ...)"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.metrics import get_scorer_names\n",
"get_scorer_names()\n",
"\n",
"ranker"
]
}
],
"metadata": {
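
A note on the `DATASET_BASE_URL` change in this commit: the notebook builds each file URL with `urljoin`, which appends the file name only when the base ends with a slash. The sketch below is illustrative and not part of the commit. It reuses the new base URL and the corpus file name from the diff; `corpus_url` and `without_slash` are names introduced here for the example.

```python
from urllib.parse import urljoin

# New base URL from this commit; note the trailing slash.
DATASET_BASE_URL = (
    "https://raw.githubusercontent.com/elastic/elasticsearch-labs/"
    "ltr-notebook/notebooks/search/sample_data/learning-to-rank/"
)

# With the trailing slash, the file name is appended to the base path.
corpus_url = urljoin(DATASET_BASE_URL, "movies-corpus.jsonl.gz")

# Without it, urljoin replaces the last path segment ("learning-to-rank"),
# so the resulting URL would point at the wrong directory.
without_slash = urljoin(DATASET_BASE_URL.rstrip("/"), "movies-corpus.jsonl.gz")

print(corpus_url)
print(without_slash)
```

Keeping the trailing slash when editing the base URL, as this commit does, keeps `CORPUS_URL` and `JUDGEMENTS_FILE_URL` pointing inside the `learning-to-rank/` directory.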

0 comments on commit 229f604
