Fixes 17 Elastic style guide violations across 5 markdown files in
`solutions/search/vector/`.
- **Accessibility**: Replace directional language ("example above" →
"preceding example", "steps below" → "following steps")
- **Formatting**: Capitalize after colons in headings (`knn.md`,
`bring-own-vectors.md`)
- **Word choice**: `prior to` → `before`, `hit` → `result`, `simple` →
`basic`, `may` → `might`; remove `simply`
- **Voice/tone**: Remove exclamation point in
`dense-versus-sparse-ingest-pipelines.md`
- **Grammar**: Replace semicolon with period to split into two sentences
(`knn.md`)
**Skipped 4 findings as false positives:**
- `"metadata.value".` — code field name; moving period inside quotes
would alter meaning
- `"and higher"` — refers to compression levels, not a version
- `"my"` in two locations — UI button label and example question text
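As a rough illustration of the word-choice fixes and of why inline code was left alone, here is a minimal Python sketch. It is not the tooling used for this change: `WORD_CHOICE` and `apply_word_choice` are hypothetical names, and the mapping is only an assumed subset of the substitutions listed above.

```python
import re

# Hypothetical mapping, assumed from the word-choice fixes listed above.
WORD_CHOICE = {
    "prior to": "before",
    "may": "might",
    "hit": "result",
    "simple": "basic",
}

# Capture inline code spans so identifiers and field names are never rewritten.
CODE_SPAN = re.compile(r"(`[^`]*`)")


def apply_word_choice(line: str) -> str:
    """Apply whole-word substitutions to prose, skipping inline code spans."""
    parts = CODE_SPAN.split(line)
    for i, part in enumerate(parts):
        if part.startswith("`"):
            continue  # leave inline code untouched
        for old, new in WORD_CHOICE.items():
            part = re.sub(rf"\b{re.escape(old)}\b", new, part)
        parts[i] = part
    return "".join(parts)


print(apply_word_choice("If you created an index prior to 8.0 with `dense_vector` fields"))
```

A purely mechanical pass like this would still flag cases such as UI labels or quoted example text, which is why findings like the ones listed above need manual review.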
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Liam Thompson <leemthompo@gmail.com>
solutions/search/vector/bring-own-vectors.md: 3 additions & 3 deletions
@@ -38,7 +38,7 @@ This advanced use case uses the `dense_vector` field type for direct control ove
 :::::{stepper}
 ::::{step} Create an index with dense vector field mappings

-Each document in our simple data set will have:
+Each document in our basic data set will have:

 * A review: stored in a `review_text` field
 * An embedding of that review: stored in a `review_vector` field, which is defined as a [`dense_vector`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md) data type.
@@ -164,7 +164,7 @@ POST /amazon-reviews/_search
 2. The `k` parameter specifies the number of results to return.
 3. The `num_candidates` parameter is optional. It limits the number of candidates returned by the search node. This can improve performance and reduce costs.

-## Next steps: implementing vector search
+## Next steps: Implementing vector search

 If you want to try a similar workflow from an {{es}} client, use the following guided index workflow in {{es}} Serverless, {{ech}}, or a self-managed cluster:

@@ -180,7 +180,7 @@ DELETE /amazon-reviews

 ## Learn more about vector search [bring-your-own-vectors-learn-more]

-In these simple examples, we send a raw vector for the query text. In a real-world scenario, you won’t know the query text ahead of time. You’ll generate query vectors on the fly using the same embedding model that produced the document vectors. For this, deploy a text embedding model in {{es}} and use the[`query_vector_builder` parameter](elasticsearch://reference/query-languages/query-dsl/query-dsl-knn-query.md#knn-query-top-level-parameters). Alternatively, you can generate vectors client-side and send them directly with the search request.
+In these basic examples, we send a raw vector for the query text. In a real-world scenario, you won't know the query text ahead of time. You'll generate query vectors on the fly using the same embedding model that produced the document vectors. For this, deploy a text embedding model in {{es}} and use the[`query_vector_builder` parameter](elasticsearch://reference/query-languages/query-dsl/query-dsl-knn-query.md#knn-query-top-level-parameters). Alternatively, you can generate vectors client-side and send them directly with the search request.

 For an example of using pipelines to generate text embeddings, check out [](/solutions/search/vector/dense-versus-sparse-ingest-pipelines.md).
solutions/search/vector/dense-versus-sparse-ingest-pipelines.md: 2 additions & 2 deletions
@@ -204,7 +204,7 @@ To ingest data through the pipeline to generate text embeddings with your chosen
 ::::::

 :::::::
-Now it is time to perform semantic search!
+Now it is time to perform semantic search.

 ## Search the data with vector search [deployed-search]

@@ -256,7 +256,7 @@ GET my-index/_search

 ## Beyond semantic search with hybrid search [deployed-hybrid-search]

-In some situations, lexical search may perform better than semantic search. For example, when searching for single words or IDs, like product numbers.
+In some situations, lexical search might perform better than semantic search. For example, when searching for single words or IDs, like product numbers.

 Combining semantic and lexical search into one hybrid search request using [reciprocal rank fusion](elasticsearch://reference/elasticsearch/rest-apis/reciprocal-rank-fusion.md) provides the best of both worlds. Not only that, but hybrid search using reciprocal rank fusion [has been shown to perform better in general](https://www.elastic.co/blog/improving-information-retrieval-elastic-stack-hybrid).
solutions/search/vector/knn.md: 7 additions & 7 deletions
@@ -49,7 +49,7 @@ The default type of {{es-serverless}} project is suitable for this use case unle
 Refer to [](dense-vector.md#vector-profiles).
 :::

-## kNN search methods: approximate and exact kNN [knn-methods]
+## kNN search methods: Approximate and exact kNN [knn-methods]

 {{es}} supports two methods for kNN search:

@@ -127,7 +127,7 @@ To run an approximate kNN search:
 The document `_score` is a positive 32-bit floating-point number that ranks result relevance. In {{es}} kNN search, `_score` is derived from the chosen vector similarity metric between the query and document vectors. Refer to [`similarity`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-similarity) for details on how kNN scores are computed.

 ::::{note}
-Support for approximate kNN search was added in version 8.0. Before 8.0, `dense_vector` fields did not support enabling `index` in the mapping. If you created an index prior to 8.0 with `dense_vector` fields, reindex using a new mapping with `index: true` (which is the default value) to use approximate kNN.
+Support for approximate kNN search was added in version 8.0. Before 8.0, `dense_vector` fields did not support enabling `index` in the mapping. If you created an index before 8.0 with `dense_vector` fields, reindex using a new mapping with `index: true` (which is the default value) to use approximate kNN.
 ::::

 ### Indexing considerations for approximate kNN search [knn-indexing-considerations]
@@ -418,7 +418,7 @@ POST image-index/_search

 This search finds the global top `k = 5` vector matches, combines them with the matches from the `match` query, and finally returns the 10 top-scoring results. The `knn` and `query` matches are combined through a disjunction, as if you took a boolean *or* between them. The top `k` vector results represent the global nearest neighbors across all index shards.

-The score of each hit is the sum of the `knn` and `query` scores. You can specify a `boost` value to give a weight to each score in the sum. In the example above, the scores will be calculated as
+The score of each result is the sum of the `knn` and `query` scores. You can specify a `boost` value to give a weight to each score in the sum. In the preceding example, the scores will be calculated as

 ```
 score = 0.9 * match_score + 0.1 * knn_score
@@ -568,7 +568,7 @@ In this data set, the only document with `file-type = png` has the vector `[42,
 When text exceeds a model’s token limit, chunking must be performed before generating embeddings for each chunk. By combining [`nested`](elasticsearch://reference/elasticsearch/mapping-reference/nested.md) fields with [`dense_vector`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md), you can perform nearest passage retrieval without copying top-level document metadata.
 Note that nested kNN queries only support [score_mode](elasticsearch://reference/query-languages/query-dsl/query-dsl-nested-query.md#nested-top-level-params)=`max`.

-Here is a simple passage vectors index that stores vectors and some top-level metadata for filtering.
+Here is a basic passage vectors index that stores vectors and some top-level metadata for filtering.

 ```console
 PUT passage_vectors
@@ -644,7 +644,7 @@ POST passage_vectors/_search
 }
 ```

-Note that even with 4 total nested vectors, the response still returns two documents. kNN search over nested dense vectors will always diversify the top results over the top-level document;`"k"` top-level documents will be returned, scored by their nearest passage vector (for example, `"paragraph.vector"`).
+Note that even with 4 total nested vectors, the response still returns two documents. kNN search over nested dense vectors will always diversify the top results over the top-level document.`"k"` top-level documents will be returned, scored by their nearest passage vector (for example, `"paragraph.vector"`).

 ```console-result
 {
@@ -1221,7 +1221,7 @@ All quantization introduces some accuracy loss, and higher compression generally

 *`int8` typically needs little to no rescoring.
 *`int4` often benefits from rescoring for higher accuracy or recall; 1.5×–2× oversampling usually recovers most loss.
-*`bbq` commonly requires rescoring except on very large indices or models specifically designed for quantization; 3×–5× oversampling is generally sufficient, but higher may be needed for low-dimension vectors or embeddings that quantize poorly.
+*`bbq` commonly requires rescoring except on very large indices or models specifically designed for quantization; 3×–5× oversampling is generally sufficient, but higher might be needed for low-dimension vectors or embeddings that quantize poorly.

 #### The `rescore_vector` option
 ```{applies_to}
@@ -1318,7 +1318,7 @@ POST /my-index/_search
 2. The number of results to return from the KNN search. This will do an approximate KNN search with 50 candidates per HNSW graph and use the quantized vectors, returning the 20 most similar vectors according to the quantized score. Additionally, because this is the top-level `knn` object, the global top 20 results from all shards will be gathered before rescoring. Combining with `rescore`, this is oversampling by `2x`, meaning gathering 20 nearest neighbors according to quantized scoring and rescoring with higher fidelity float vectors.
 3. The number of results to rescore, if you want to rescore all results, set this to the same value as `k`
 4. The script to rescore the results. Script score will interact directly with the originally provided float32 vector.
-5. The weight of the original query, here we simply throw away the original score
+5. The weight of the original query, here we throw away the original score
 6. The weight of the rescore query, here we only use the rescore query

 ##### Use a `script_score` query to rescore per shard [dense-vector-knn-search-rescoring-script-score]
solutions/search/vector/vector-search-use-cases.md: 3 additions & 3 deletions
@@ -44,7 +44,7 @@ Read how retrieval, chunking, and orchestration fit together.

 :::::{step} Set up search for your documents

-Split long documents into smaller chunks so each search hit is a useful passage. Refer to [How to implement retrieval](#how-to-implement-retrieval) to choose your embedding approach, query interface, and search strategy.
+Split long documents into smaller chunks so each search result is a useful passage. Refer to [How to implement retrieval](#how-to-implement-retrieval) to choose your embedding approach, query interface, and search strategy.

 :::::

@@ -59,7 +59,7 @@ Send the top search hits and their text fields to your model or orchestration la

 ## Discovery and recommendations

-Find related products, articles, videos, or other items when keywords alone do not match well. Examples include "similar products," "you may also like," and matching users or players in an app.
+Find related products, articles, videos, or other items when keywords alone do not match well. Examples include "similar products," "you might also like," and matching users or players in an app.

 ::::::{stepper}
 :::::{step} Store embeddings for each item
@@ -99,7 +99,7 @@ The closest vectors are not always the best final ranking. You can boost by popu

 Search images, audio, video, or text when your content uses more than one type. For example, search with text to find images, or search with an image to find similar images.

-The steps below use the [Inference API](../semantic-search/semantic-search-inference.md) to embed multimodal content. Refer to [How to implement retrieval](#how-to-implement-retrieval) for other embedding approaches.
+The following steps use the [Inference API](../semantic-search/semantic-search-inference.md) to embed multimodal content. Refer to [How to implement retrieval](#how-to-implement-retrieval) for other embedding approaches.
solutions/search/vector/vector-storage-for-semantic-search.md: 1 addition & 1 deletion
@@ -335,7 +335,7 @@ The response includes the `index_options` you configured under the `content` fie
 }
 ```

-1. The `index_options` block confirms your quantization strategy is applied. After indexing data, the mapping may also include auto-detected `model_settings` such as dimensions and similarity metric.
+1. The `index_options` block confirms your quantization strategy is applied. After indexing data, the mapping might also include auto-detected `model_settings` such as dimensions and similarity metric.