Commit d8f49f0

Merge pull request #287 from tharropoulos/v28.0
Update documentation according to `v28` changelog
2 parents 4f84cf5 + 98d63d0 commit d8f49f0

10 files changed: +911 -39 lines changed

docs-site/content/.vuepress/config.js

+1
@@ -315,6 +315,7 @@ let config = {
  ['/28.0/api/curation', 'Curation'],
  ['/28.0/api/collection-alias', 'Collection Alias'],
  ['/28.0/api/synonyms', 'Synonyms'],
+ ['/28.0/api/stemming', 'Stemming'],
  ['/28.0/api/stopwords', 'Stopwords'],
  ['/28.0/api/cluster-operations', 'Cluster Operations'],
  ],

docs-site/content/28.0/api/collections.md

+154-25
Large diffs are not rendered by default.

docs-site/content/28.0/api/geosearch.md

+93-1
@@ -425,6 +425,98 @@ You want to specify the geo-points of the polygon as lat, lng pairs.
'filter_by' : 'location:(48.8662, 2.3255, 48.8581, 2.3209, 48.8561, 2.3448, 48.8641, 2.3469)'
```

## Geographic Polygons

You can also store polygonal geographic areas using the `geopolygon` field type and then check whether points fall within these areas.

### Creating a Collection with Geopolygons

Let's create a collection with a field to store polygon areas:

<Tabs :tabs="['Shell']">
<template v-slot:Shell>

```bash
curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
     -H "Content-Type: application/json" \
     "http://localhost:8108/collections" -X POST \
     -d '{
       "name": "territories",
       "fields": [
         {"name": "name", "type": "string"},
         {"name": "area", "type": "geopolygon"}
       ]
     }'
```

</template>
</Tabs>

### Adding Polygon Areas

Add documents containing polygon areas by specifying the coordinates in counter-clockwise (CCW) or clockwise (CW) order:

<Tabs :tabs="['Shell']">
<template v-slot:Shell>

```bash
curl "http://localhost:8108/collections/territories/documents" -X POST \
     -H "Content-Type: application/json" \
     -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
     -d '{
       "name": "square",
       "area": "0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0"
     }'
```

</template>
</Tabs>

:::warning NOTE
Coordinates must be specified in proper CCW or CW order to form a valid polygon. Incorrect ordering will result in an error.
:::
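
One way to sanity-check vertex order on the client before indexing is the shoelace formula, where the sign of the signed area gives the winding direction. The sketch below is not part of Typesense: the helper name `polygon_orientation` is hypothetical, and it assumes a flat plane with the first value of each pair as the horizontal axis and the second as the vertical axis. A result of `degenerate` catches collapsed shapes such as a symmetric bow-tie, but not every self-intersecting polygon.

```bash
# Hypothetical helper: report the winding of a flat "a, b, a, b, ..." vertex list.
polygon_orientation() {
  printf '%s\n' "$1" | awk -F',' '{
    n = NF / 2
    if (NF % 2 != 0 || n < 3) { print "invalid"; exit }
    sum = 0
    for (i = 0; i < n; i++) {
      j = (i + 1) % n                      # next vertex, wrapping around
      a1 = $(2*i + 1); b1 = $(2*i + 2)
      a2 = $(2*j + 1); b2 = $(2*j + 2)
      sum += a1 * b2 - a2 * b1             # shoelace cross term (twice the area)
    }
    if (sum > 0) print "ccw"
    else if (sum < 0) print "cw"
    else print "degenerate"
  }'
}

polygon_orientation "0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0"
polygon_orientation "0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0"
```

The first call uses the square from the example above; the second visits the same four corners in a criss-cross order, which collapses the signed area to zero.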

### Searching Points in Polygons

You can search for documents whose polygon areas contain a specific point:

<Tabs :tabs="['Shell']">
<template v-slot:Shell>

```bash
curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
"http://localhost:8108/collections/territories/documents/search\
?q=*&filter_by=area:(0.5,0.5)"
```

</template>
</Tabs>

This will return all polygons that contain the point (0.5, 0.5).

**Sample Response**

<Tabs :tabs="['JSON']">
<template v-slot:JSON>

```json
{
  "found": 1,
  "hits": [
    {
      "document": {
        "area": [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0],
        "id": "0",
        "name": "square"
      }
    }
  ]
}
```

</template>
</Tabs>
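
To see why the point (0.5, 0.5) matches the `square` document, here is a minimal ray-casting sketch of a point-in-polygon test. It is an illustration only and makes no claim about how Typesense evaluates the filter internally: the helper name `point_in_polygon` is hypothetical, and the test is planar, so it ignores the curvature that matters for large real-world areas.

```bash
# Hypothetical helper: $1 = "a, b" point, $2 = flat "a, b, a, b, ..." vertex list.
point_in_polygon() {
  awk -v pt="$1" -v poly="$2" 'BEGIN {
    split(pt, p, ",")
    n = split(poly, v, ",") / 2
    px = p[1] + 0; py = p[2] + 0
    inside = 0
    for (i = 0; i < n; i++) {
      j = (i + n - 1) % n                  # previous vertex, wrapping around
      xi = v[2*i + 1] + 0; yi = v[2*i + 2] + 0
      xj = v[2*j + 1] + 0; yj = v[2*j + 2] + 0
      # Toggle when a ray cast from the point crosses the edge (i, j).
      if ((yi > py) != (yj > py) && px < (xj - xi) * (py - yi) / (yj - yi) + xi)
        inside = !inside
    }
    print (inside ? "inside" : "outside")
  }'
}

point_in_polygon "0.5, 0.5" "0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0"
point_in_polygon "2.0, 0.5" "0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0"
```

An odd number of edge crossings means the point lies inside the polygon, which is the case for the first call and not for the second.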

## Sorting by Additional Attributes within a Radius

### exclude_radius
@@ -448,4 +540,4 @@ Similarly, you can bucket all geo points into "groups" using the `precision` par
'sort_by' : 'location(48.853, 2.344, precision: 2mi):asc, popularity:desc'
```

-This will bucket the results into 2-mile groups and force records within each bucket into a tie for "geo score", so that the popularity metric can be used to tie-break and sort results within each bucket.
+This will bucket the results into 2-mile groups and force records within each bucket into a tie for "geo score", so that the popularity metric can be used to tie-break and sort results within each bucket.
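
The bucketing idea can be sketched outside Typesense with standard tools. The helper name `bucket_and_sort` and the sample records are hypothetical, and flooring the distance by the bucket width is an assumption about how groups are formed, not a description of the server's implementation.

```bash
# Hypothetical helper. stdin: "name distance_in_miles popularity" per line.
# $1 = bucket width in miles.
bucket_and_sort() {
  awk -v width="$1" '{ print int($2 / width), $3, $1 }' |  # bucket, popularity, name
    sort -k1,1n -k2,2nr |                                  # bucket asc, popularity desc
    awk '{ print $3 }'
}

printf 'a 0.5 10\nb 1.9 50\nc 2.1 99\nd 3.0 5\n' | bucket_and_sort 2
```

Records `a` and `b` share the first 2-mile bucket, so `b` ranks first on popularity even though `a` is closer; `c` and `d` follow in the second bucket.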

docs-site/content/28.0/api/search.md

+208-3
Large diffs are not rendered by default.
+172
@@ -0,0 +1,172 @@
---
sidebarDepth: 1
sitemap:
  priority: 0.7
---

# Stemming

Stemming is a technique that helps handle variations of words during search. When stemming is enabled, a search for one form of a word will also match other grammatical forms of that word. For example:

- Searching for "run" would match "running", "runs", "ran"
- Searching for "walk" would match "walking", "walked", "walks"
- Searching for "company" would match "companies"

Typesense provides two approaches to handle word variations:

## Basic Stemming

Basic stemming uses the [Snowball stemmer](https://snowballstem.org/) algorithm to automatically detect and handle word variations. Because it is rule-based, it works well for common word patterns in the configured language, but it can produce unintended matches on brand names, proper nouns, and locations. The rules are designed primarily for common nouns, so applying them to specialized content such as company names or place names can degrade search relevance.

To enable basic stemming for a field, set `"stem": true` in your collection schema:

<Tabs :tabs="['Shell']">
<template v-slot:Shell>

```bash
curl "http://localhost:8108/collections" -X POST \
     -H "Content-Type: application/json" \
     -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -d '{
       "name": "companies",
       "fields": [
         {"name": "description", "type": "string", "stem": true}
       ]
     }'
```

</template>
</Tabs>

The language used for stemming is automatically determined from the `locale` parameter of the field. For example, setting `"locale": "fr"` will use French-specific stemming rules.
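
As an illustration of combining `stem` with `locale`, the sketch below writes a schema file for a French-language field and prints it. The collection name `produits`, the field name, and the file name `schema_fr.json` are hypothetical; only the `stem` and `locale` parameters come from the text above.

```bash
# Hypothetical schema: a French description field with basic stemming enabled.
cat > schema_fr.json <<'EOF'
{
  "name": "produits",
  "fields": [
    {"name": "description", "type": "string", "locale": "fr", "stem": true}
  ]
}
EOF

cat schema_fr.json

# To create the collection, pass the file as the request body, for example:
#   curl "http://localhost:8108/collections" -X POST \
#        -H "Content-Type: application/json" \
#        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -d @schema_fr.json
```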

## Custom Stemming Dictionaries

For cases where you need more precise control over word variations, or when dealing with irregular forms that algorithmic stemming can't handle well, you can use stemming dictionaries. These allow you to define exact mappings between words and their root forms.

### Pre-made Dictionaries

Typesense provides a pre-made English plurals dictionary that handles common singular/plural variations. You can [download it here](https://dl.typesense.org/data/stemming/plurals_en_v1.jsonl).

This dictionary is particularly useful when you need reliable handling of English plural forms without the potential side effects of algorithmic stemming.

### Creating a Stemming Dictionary

First, create a JSONL file with your word mappings:

```json
{"word": "people", "root": "person"}
{"word": "children", "root": "child"}
{"word": "geese", "root": "goose"}
```
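
The file above can be written and sanity-checked from the shell. This is a minimal sketch: the file name `dictionary.jsonl` is an assumption, and the `grep` check is deliberately crude (it expects the `word` key before the `root` key on each line), so a real JSON parser such as `jq` would be a sturdier choice.

```bash
# Write one JSON object per line (JSONL).
cat > dictionary.jsonl <<'EOF'
{"word": "people", "root": "person"}
{"word": "children", "root": "child"}
{"word": "geese", "root": "goose"}
EOF

# Count lines that lack a "word"/"root" pair; 0 means every line looks well-formed.
bad_lines=$(grep -cv '"word": *"[^"]*".*"root": *"[^"]*"' dictionary.jsonl || true)
echo "lines: $(wc -l < dictionary.jsonl | tr -d ' '), malformed: ${bad_lines}"
```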

Then upload it using the stemming dictionary API:

<Tabs :tabs="['Shell']">
<template v-slot:Shell>

```bash
curl "http://localhost:8108/stemming/dictionary/import?id=irregular-plurals" \
     -X POST \
     -H "Content-Type: application/json" \
     -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
     --data-binary @dictionary.jsonl
```

</template>
</Tabs>

#### Sample Response

<Tabs :tabs="['JSON']">
<template v-slot:JSON>

```json
{
  "id": "irregular-plurals",
  "words": [
    {"root": "person", "word": "people"},
    {"root": "child", "word": "children"},
    {"root": "goose", "word": "geese"}
  ]
}
```

</template>
</Tabs>

### Using a Stemming Dictionary

To use a stemming dictionary, specify it in your collection schema using the `stem_dictionary` parameter:

<Tabs :tabs="['Shell']">
<template v-slot:Shell>

```bash
curl "http://localhost:8108/collections" -X POST \
     -H "Content-Type: application/json" \
     -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -d '{
       "name": "companies",
       "fields": [
         {"name": "title", "type": "string", "stem_dictionary": "irregular-plurals"}
       ]
     }'
```

</template>
</Tabs>

:::tip Combining Both Approaches
You can use both basic stemming (`"stem": true`) and dictionary stemming (`"stem_dictionary": "dictionary_name"`) on the same field. When both are enabled, dictionary stemming takes precedence for words that exist in the dictionary.
:::
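
The precedence rule can be pictured as a lookup with a fallback. The sketch below is a toy model, not Typesense's implementation: the helper name `stem_word` is hypothetical, and the fallback simply strips a trailing "s" as a stand-in for a real algorithmic stemmer (it is not Snowball). It also assumes the word contains no characters that are special in a `sed` pattern.

```bash
# Sample dictionary in the same JSONL shape as above.
cat > dictionary.jsonl <<'EOF'
{"word": "people", "root": "person"}
{"word": "children", "root": "child"}
{"word": "geese", "root": "goose"}
EOF

# Hypothetical helper: $1 = word, $2 = path to the JSONL dictionary.
stem_word() {
  root=$(sed -n 's/^{"word": *"'"$1"'", *"root": *"\([^"]*\)"}$/\1/p' "$2" | head -n 1)
  if [ -n "$root" ]; then
    printf '%s\n' "$root"        # dictionary entry wins
  else
    printf '%s\n' "${1%s}"       # toy fallback: drop one trailing "s"
  fi
}

stem_word people dictionary.jsonl
stem_word walks dictionary.jsonl
```

Words present in the dictionary resolve to their listed root; everything else falls through to the rule-based path.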

### Managing Dictionaries

#### Retrieve a Dictionary

<Tabs :tabs="['Shell']">
<template v-slot:Shell>

```bash
curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
     "http://localhost:8108/stemming/dictionary/irregular-plurals"
```

</template>
</Tabs>

#### List All Dictionaries

<Tabs :tabs="['Shell']">
<template v-slot:Shell>

```bash
curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
     "http://localhost:8108/stemming/dictionaries"
```

</template>
</Tabs>

#### Sample Response

<Tabs :tabs="['JSON']">
<template v-slot:JSON>

```json
{
  "dictionaries": ["irregular-plurals", "company-terms"]
}
```

</template>
</Tabs>

## Best Practices

1. **Start with Basic Stemming**: For most use cases, basic stemming with the appropriate locale setting will handle common word variations well.

2. **Use Dictionaries for Exceptions**: Add stemming dictionaries when you need to handle:
   - Domain-specific variations
   - Cases where basic stemming doesn't give desired results

3. **Language-Specific Considerations**: Remember that basic stemming behavior changes based on the `locale` parameter. Set this appropriately for your content's language.
