Commit 1d24599

Copilot and leemthompo authored
Docs fix — style: path /solutions/search/vector — 7 pages (#6922)
Fixes 17 Elastic style guide violations across 5 markdown files in `solutions/search/vector/`.

- **Accessibility**: Replace directional language ("example above" → "preceding example", "steps below" → "following steps")
- **Formatting**: Capitalize after colons in headings (`knn.md`, `bring-own-vectors.md`)
- **Word choice**: `prior to` → `before`, `hit` → `result`, `simple` → `basic`, `may` → `might`, `simply` → `efficiently`
- **Voice/tone**: Remove exclamation point in `dense-versus-sparse-ingest-pipelines.md`
- **Grammar**: Replace semicolon with period to split into two sentences (`knn.md`)

**Skipped 4 findings as false positives:**

- `"metadata.value".` — code field name; moving period inside quotes would alter meaning
- `"and higher"` — refers to compression levels, not a version
- `"my"` in two locations — UI button label and example question text

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Liam Thompson <leemthompo@gmail.com>
1 parent a72a9dd commit 1d24599
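
The word-choice and directional-language rules in the commit message lend themselves to a mechanical first pass. The following is a minimal sketch, not the tooling used for this commit: the `REPLACEMENTS` table and the `suggest` function are hypothetical, and they cover only some of the listed rules. The sketch strips inline code spans before matching, which mirrors the reasoning behind skipping `"metadata.value".` as a false positive.

```python
import re

# Hypothetical rule table drawn from the commit message; not an official
# Elastic style-guide ruleset.
REPLACEMENTS = {
    "prior to": "before",
    "example above": "preceding example",
    "steps below": "following steps",
    "may": "might",
    "simple": "basic",
}

# Inline code spans such as `may` are identifiers, not prose, so skip them.
CODE_SPAN = re.compile(r"`[^`]*`")


def suggest(line: str) -> list[tuple[str, str]]:
    """Return (found, suggestion) pairs for each rule that matches the line."""
    prose = CODE_SPAN.sub("", line)
    findings = []
    for found, suggestion in REPLACEMENTS.items():
        # \b keeps "may" from matching inside longer words such as "mayor".
        if re.search(rf"\b{re.escape(found)}\b", prose, flags=re.IGNORECASE):
            findings.append((found, suggestion))
    return findings


print(suggest("If you created an index prior to 8.0, lexical search may win."))
# [('prior to', 'before'), ('may', 'might')]
```

Context-dependent rules such as `hit` → `result` are left out on purpose: whether "hit" means a search result depends on the surrounding sentence, so a reviewer still needs to confirm each finding.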

5 files changed

Lines changed: 16 additions & 16 deletions

solutions/search/vector/bring-own-vectors.md

Lines changed: 3 additions & 3 deletions

````diff
@@ -38,7 +38,7 @@ This advanced use case uses the `dense_vector` field type for direct control ove
 :::::{stepper}
 ::::{step} Create an index with dense vector field mappings
 
-Each document in our simple data set will have:
+Each document in our basic data set will have:
 
 * A review: stored in a `review_text` field
 * An embedding of that review: stored in a `review_vector` field, which is defined as a [`dense_vector`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md) data type.
````

````diff
@@ -164,7 +164,7 @@ POST /amazon-reviews/_search
 2. The `k` parameter specifies the number of results to return.
 3. The `num_candidates` parameter is optional. It limits the number of candidates returned by the search node. This can improve performance and reduce costs.
 
-## Next steps: implementing vector search
+## Next steps: Implementing vector search
 
 If you want to try a similar workflow from an {{es}} client, use the following guided index workflow in {{es}} Serverless, {{ech}}, or a self-managed cluster:
 
````

````diff
@@ -180,7 +180,7 @@ DELETE /amazon-reviews
 
 ## Learn more about vector search [bring-your-own-vectors-learn-more]
 
-In these simple examples, we send a raw vector for the query text. In a real-world scenario, you wont know the query text ahead of time. Youll generate query vectors on the fly using the same embedding model that produced the document vectors. For this, deploy a text embedding model in {{es}} and use the[`query_vector_builder` parameter](elasticsearch://reference/query-languages/query-dsl/query-dsl-knn-query.md#knn-query-top-level-parameters). Alternatively, you can generate vectors client-side and send them directly with the search request.
+In these basic examples, we send a raw vector for the query text. In a real-world scenario, you won't know the query text ahead of time. You'll generate query vectors on the fly using the same embedding model that produced the document vectors. For this, deploy a text embedding model in {{es}} and use the[`query_vector_builder` parameter](elasticsearch://reference/query-languages/query-dsl/query-dsl-knn-query.md#knn-query-top-level-parameters). Alternatively, you can generate vectors client-side and send them directly with the search request.
 
 For an example of using pipelines to generate text embeddings, check out [](/solutions/search/vector/dense-versus-sparse-ingest-pipelines.md).
 
````

solutions/search/vector/dense-versus-sparse-ingest-pipelines.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -204,7 +204,7 @@ To ingest data through the pipeline to generate text embeddings with your chosen
 ::::::
 
 :::::::
-Now it is time to perform semantic search!
+Now it is time to perform semantic search.
 
 ## Search the data with vector search [deployed-search]
 
````

````diff
@@ -256,7 +256,7 @@ GET my-index/_search
 
 ## Beyond semantic search with hybrid search [deployed-hybrid-search]
 
-In some situations, lexical search may perform better than semantic search. For example, when searching for single words or IDs, like product numbers.
+In some situations, lexical search might perform better than semantic search. For example, when searching for single words or IDs, like product numbers.
 
 Combining semantic and lexical search into one hybrid search request using [reciprocal rank fusion](elasticsearch://reference/elasticsearch/rest-apis/reciprocal-rank-fusion.md) provides the best of both worlds. Not only that, but hybrid search using reciprocal rank fusion [has been shown to perform better in general](https://www.elastic.co/blog/improving-information-retrieval-elastic-stack-hybrid).
 
````
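
The hybrid-search paragraph in `dense-versus-sparse-ingest-pipelines.md` relies on reciprocal rank fusion (RRF) to merge lexical and semantic results. As a rough illustration of the standard RRF formula, where each document's fused score is the sum of `1 / (rank_constant + rank)` across result lists, here is a minimal sketch. It is not the Elasticsearch implementation, the document IDs are made up, and `rank_constant=60` is assumed to match the Elasticsearch default; check the linked RRF reference before relying on that value.

```python
def rrf(result_lists, rank_constant=60):
    """Fuse ranked lists of document IDs with reciprocal rank fusion.

    Each list is ordered best-first; ranks start at 1.
    """
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists: lexical search ranks an exact product number first,
# semantic search ranks a conceptually related document first.
lexical = ["sku-123", "doc-a", "doc-b"]
semantic = ["doc-a", "doc-c", "sku-123"]
for doc_id, score in rrf([lexical, semantic]):
    print(f"{doc_id}: {score:.5f}")
```

Because RRF uses only ranks, the two retrievers' raw scores never need to be on the same scale, and a document that appears high in both lists (`doc-a` here) rises above one that tops only a single list.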

solutions/search/vector/knn.md

Lines changed: 7 additions & 7 deletions

````diff
@@ -49,7 +49,7 @@ The default type of {{es-serverless}} project is suitable for this use case unle
 Refer to [](dense-vector.md#vector-profiles).
 :::
 
-## kNN search methods: approximate and exact kNN [knn-methods]
+## kNN search methods: Approximate and exact kNN [knn-methods]
 
 {{es}} supports two methods for kNN search:
 
````

````diff
@@ -127,7 +127,7 @@ To run an approximate kNN search:
 The document `_score` is a positive 32-bit floating-point number that ranks result relevance. In {{es}} kNN search, `_score` is derived from the chosen vector similarity metric between the query and document vectors. Refer to [`similarity`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-similarity) for details on how kNN scores are computed.
 
 ::::{note}
-Support for approximate kNN search was added in version 8.0. Before 8.0, `dense_vector` fields did not support enabling `index` in the mapping. If you created an index prior to 8.0 with `dense_vector` fields, reindex using a new mapping with `index: true` (which is the default value) to use approximate kNN.
+Support for approximate kNN search was added in version 8.0. Before 8.0, `dense_vector` fields did not support enabling `index` in the mapping. If you created an index before 8.0 with `dense_vector` fields, reindex using a new mapping with `index: true` (which is the default value) to use approximate kNN.
 ::::
 
 ### Indexing considerations for approximate kNN search [knn-indexing-considerations]
````
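
The `knn.md` hunks contrast approximate and exact kNN and note that `_score` is derived from the vector similarity metric. The following minimal sketch shows exact (brute-force) kNN: it compares the query against every vector and keeps the top `k`. The `(1 + cosine) / 2` conversion is assumed to match how the linked `similarity` reference describes `_score` for `cosine`; confirm it against that page. The vectors are hypothetical, and the sketch assumes no zero-length vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity; assumes neither vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def exact_knn(query, vectors, k):
    """Brute-force kNN: score every vector, then keep the k best."""
    scored = [
        # Map cosine similarity in [-1, 1] onto a non-negative score in [0, 1].
        (doc_id, (1.0 + cosine(query, vector)) / 2.0)
        for doc_id, vector in vectors.items()
    ]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


vectors = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [-1.0, 0.0]}
print(exact_knn([1.0, 0.0], vectors, k=2))  # [('x', 1.0), ('y', 0.5)]
```

The cost grows linearly with the number of stored vectors, which is the trade-off approximate kNN avoids by searching an HNSW graph instead of scoring every document.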

````diff
@@ -418,7 +418,7 @@ POST image-index/_search
 
 This search finds the global top `k = 5` vector matches, combines them with the matches from the `match` query, and finally returns the 10 top-scoring results. The `knn` and `query` matches are combined through a disjunction, as if you took a boolean *or* between them. The top `k` vector results represent the global nearest neighbors across all index shards.
 
-The score of each hit is the sum of the `knn` and `query` scores. You can specify a `boost` value to give a weight to each score in the sum. In the example above, the scores will be calculated as
+The score of each result is the sum of the `knn` and `query` scores. You can specify a `boost` value to give a weight to each score in the sum. In the preceding example, the scores will be calculated as
 
 ```
 score = 0.9 * match_score + 0.1 * knn_score
````

````diff
@@ -568,7 +568,7 @@ In this data set, the only document with `file-type = png` has the vector `[42,
 When text exceeds a model’s token limit, chunking must be performed before generating embeddings for each chunk. By combining [`nested`](elasticsearch://reference/elasticsearch/mapping-reference/nested.md) fields with [`dense_vector`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md), you can perform nearest passage retrieval without copying top-level document metadata.
 Note that nested kNN queries only support [score_mode](elasticsearch://reference/query-languages/query-dsl/query-dsl-nested-query.md#nested-top-level-params)=`max`.
 
-Here is a simple passage vectors index that stores vectors and some top-level metadata for filtering.
+Here is a basic passage vectors index that stores vectors and some top-level metadata for filtering.
 
 ```console
 PUT passage_vectors
````

````diff
@@ -644,7 +644,7 @@ POST passage_vectors/_search
 }
 ```
 
-Note that even with 4 total nested vectors, the response still returns two documents. kNN search over nested dense vectors will always diversify the top results over the top-level document; `"k"` top-level documents will be returned, scored by their nearest passage vector (for example, `"paragraph.vector"`).
+Note that even with 4 total nested vectors, the response still returns two documents. kNN search over nested dense vectors will always diversify the top results over the top-level document. `"k"` top-level documents will be returned, scored by their nearest passage vector (for example, `"paragraph.vector"`).
 
 ```console-result
 {
````

````diff
@@ -1221,7 +1221,7 @@ All quantization introduces some accuracy loss, and higher compression generally
 
 * `int8` typically needs little to no rescoring.
 * `int4` often benefits from rescoring for higher accuracy or recall; 1.5×–2× oversampling usually recovers most loss.
-* `bbq` commonly requires rescoring except on very large indices or models specifically designed for quantization; 3×–5× oversampling is generally sufficient, but higher may be needed for low-dimension vectors or embeddings that quantize poorly.
+* `bbq` commonly requires rescoring except on very large indices or models specifically designed for quantization; 3×–5× oversampling is generally sufficient, but higher might be needed for low-dimension vectors or embeddings that quantize poorly.
 
 #### The `rescore_vector` option
 ```{applies_to}
````

````diff
@@ -1318,7 +1318,7 @@ POST /my-index/_search
 2. The number of results to return from the KNN search. This will do an approximate KNN search with 50 candidates per HNSW graph and use the quantized vectors, returning the 20 most similar vectors according to the quantized score. Additionally, because this is the top-level `knn` object, the global top 20 results from all shards will be gathered before rescoring. Combining with `rescore`, this is oversampling by `2x`, meaning gathering 20 nearest neighbors according to quantized scoring and rescoring with higher fidelity float vectors.
 3. The number of results to rescore, if you want to rescore all results, set this to the same value as `k`
 4. The script to rescore the results. Script score will interact directly with the originally provided float32 vector.
-5. The weight of the original query, here we simply throw away the original score
+5. The weight of the original query, here we throw away the original score
 6. The weight of the rescore query, here we only use the rescore query
 
 ##### Use a `script_score` query to rescore per shard [dense-vector-knn-search-rescoring-script-score]
````

solutions/search/vector/vector-search-use-cases.md

Lines changed: 3 additions & 3 deletions

````diff
@@ -44,7 +44,7 @@ Read how retrieval, chunking, and orchestration fit together.
 
 :::::{step} Set up search for your documents
 
-Split long documents into smaller chunks so each search hit is a useful passage. Refer to [How to implement retrieval](#how-to-implement-retrieval) to choose your embedding approach, query interface, and search strategy.
+Split long documents into smaller chunks so each search result is a useful passage. Refer to [How to implement retrieval](#how-to-implement-retrieval) to choose your embedding approach, query interface, and search strategy.
 
 :::::
 
````

````diff
@@ -59,7 +59,7 @@ Send the top search hits and their text fields to your model or orchestration la
 
 ## Discovery and recommendations
 
-Find related products, articles, videos, or other items when keywords alone do not match well. Examples include "similar products," "you may also like," and matching users or players in an app.
+Find related products, articles, videos, or other items when keywords alone do not match well. Examples include "similar products," "you might also like," and matching users or players in an app.
 
 ::::::{stepper}
 :::::{step} Store embeddings for each item
````

````diff
@@ -99,7 +99,7 @@ The closest vectors are not always the best final ranking. You can boost by popu
 
 Search images, audio, video, or text when your content uses more than one type. For example, search with text to find images, or search with an image to find similar images.
 
-The steps below use the [Inference API](../semantic-search/semantic-search-inference.md) to embed multimodal content. Refer to [How to implement retrieval](#how-to-implement-retrieval) for other embedding approaches.
+The following steps use the [Inference API](../semantic-search/semantic-search-inference.md) to embed multimodal content. Refer to [How to implement retrieval](#how-to-implement-retrieval) for other embedding approaches.
 
 ::::::{stepper}
 :::::{step} Create an inference endpoint
````

solutions/search/vector/vector-storage-for-semantic-search.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -335,7 +335,7 @@ The response includes the `index_options` you configured under the `content` fie
 }
 ```
 
-1. The `index_options` block confirms your quantization strategy is applied. After indexing data, the mapping may also include auto-detected `model_settings` such as dimensions and similarity metric.
+1. The `index_options` block confirms your quantization strategy is applied. After indexing data, the mapping might also include auto-detected `model_settings` such as dimensions and similarity metric.
 
 :::
 
````
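
The `knn.md` hunks on nested dense vectors state that nested kNN queries support only `score_mode` `max` and return `k` top-level documents, each scored by its nearest passage vector. The following minimal sketch reproduces that grouping with an exact, brute-force search. It is not how Elasticsearch executes the query (which searches an HNSW graph), it ranks by raw dot product rather than `_score`, and it assumes unit-length vectors so that dot product equals cosine similarity. The passage data is hypothetical.

```python
def nested_knn(query, docs, k):
    """Return the top k parent documents, each scored by its best passage.

    `docs` maps a top-level document ID to a list of passage vectors.
    Vectors are assumed to be unit length, so dot product == cosine.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    best_per_doc = {
        doc_id: max(dot(query, passage) for passage in passages)  # score_mode=max
        for doc_id, passages in docs.items()
        if passages
    }
    ranked = sorted(best_per_doc.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Four passage vectors spread over two hypothetical documents.
docs = {
    "1": [[1.0, 0.0], [0.0, 1.0]],
    "2": [[0.6, 0.8], [-1.0, 0.0]],
}
print(nested_knn([1.0, 0.0], docs, k=3))  # [('1', 1.0), ('2', 0.6)]
```

Even with `k=3` and four passage vectors, only two results come back, because each top-level document appears at most once no matter how many of its passages are close to the query.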

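The `knn.md` quantization hunks recommend oversampling: gather more candidates than `k` using quantized scores, then rescore those candidates with the original float vectors (the hunk calls gathering 20 quantized neighbors and rescoring them `2x` oversampling). The following minimal sketch shows why that helps. The sign-based `sign_quantize` function is a hypothetical 1-bit quantizer chosen for brevity; it is not the `int8`, `int4`, or `bbq` scheme that Elasticsearch uses, and the vectors are made up.

```python
import math


def sign_quantize(vector):
    """Hypothetical 1-bit quantizer: keep only the sign of each component."""
    return [1.0 if x >= 0 else -1.0 for x in vector]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def oversample_and_rescore(query, vectors, k, oversample, quantize=sign_quantize):
    """Rank by quantized similarity, keep ceil(k * oversample) candidates,
    then rescore those candidates with the original float vectors."""
    q_query = quantize(query)
    num_candidates = math.ceil(k * oversample)
    candidates = sorted(
        vectors, key=lambda doc_id: (-dot(q_query, quantize(vectors[doc_id])), doc_id)
    )[:num_candidates]
    rescored = sorted(candidates, key=lambda doc_id: (-dot(query, vectors[doc_id]), doc_id))
    return rescored[:k]


vectors = {
    "a": [0.1, 0.9],
    "b": [1.0, -0.05],  # best float match, but its small negative component flips sign
    "c": [0.5, 0.5],
    "d": [-0.5, 0.2],
}
query = [0.9, 0.1]
print(oversample_and_rescore(query, vectors, k=1, oversample=1))  # ['a']
print(oversample_and_rescore(query, vectors, k=1, oversample=3))  # ['b']
```

Without oversampling, the quantized ranking alone decides and returns `a`. Widening the candidate pool to three lets the float rescoring recover `b`, the true nearest neighbor, which is the same reason coarser quantization calls for more oversampling.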