Update documentation according to v28 changelog #287

Merged
merged 17 commits into from
Jan 31, 2025
1 change: 1 addition & 0 deletions docs-site/content/.vuepress/config.js
@@ -315,6 +315,7 @@ let config = {
['/28.0/api/curation', 'Curation'],
['/28.0/api/collection-alias', 'Collection Alias'],
['/28.0/api/synonyms', 'Synonyms'],
['/28.0/api/stemming', 'Stemming'],
['/28.0/api/stopwords', 'Stopwords'],
['/28.0/api/cluster-operations', 'Cluster Operations'],
],
179 changes: 154 additions & 25 deletions docs-site/content/28.0/api/collections.md

Large diffs are not rendered by default.

94 changes: 93 additions & 1 deletion docs-site/content/28.0/api/geosearch.md
@@ -425,6 +425,98 @@ You want to specify the geo-points of the polygon as lat, lng pairs.
'filter_by' : 'location:(48.8662, 2.3255, 48.8581, 2.3209, 48.8561, 2.3448, 48.8641, 2.3469)'
```
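
As a rough illustration of how the polygon filter string above is put together, the following sketch flattens a list of lat, lng pairs into a `filter_by` clause. The helper name `polygon_filter` is hypothetical and not part of any Typesense client library.

```python
def polygon_filter(field, vertices):
    """Build a filter_by clause from a list of (lat, lng) vertex pairs."""
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    # Flatten [(lat, lng), ...] into "lat, lng, lat, lng, ..."
    flat = ", ".join(f"{lat}, {lng}" for lat, lng in vertices)
    return f"{field}:({flat})"


# Same four vertices as the example above
paris_area = [
    (48.8662, 2.3255),
    (48.8581, 2.3209),
    (48.8561, 2.3448),
    (48.8641, 2.3469),
]
filter_by = polygon_filter("location", paris_area)
```

The resulting string can be passed as the `filter_by` search parameter.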

## Geographic Polygons

You can also store polygonal geographic areas using the `geopolygon` field type and then check if points fall within these areas.

### Creating a Collection with Geopolygons

Let's create a collection with a field to store polygon areas:

<Tabs :tabs="['Shell']">
<template v-slot:Shell>

```bash
curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-H "Content-Type: application/json" \
"http://localhost:8108/collections" -X POST \
-d '{
"name": "territories",
"fields": [
{"name": "name", "type": "string"},
{"name": "area", "type": "geopolygon"}
]
}'
```

</template>
</Tabs>

### Adding Polygon Areas

Add documents containing polygon areas by specifying the coordinates in counter-clockwise (CCW) or clockwise (CW) order:

<Tabs :tabs="['Shell']">
<template v-slot:Shell>

```bash
curl "http://localhost:8108/collections/territories/documents" -X POST \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-d '{
  "name": "square",
  "area": [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
}'
```

</template>
</Tabs>

:::warning NOTE
Coordinates must trace the polygon's boundary in a consistent counter-clockwise (CCW) or clockwise (CW) order. Vertices listed out of order do not form a valid polygon and will result in an error.
:::
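
If you build polygons programmatically, a client-side sanity check can catch ordering mistakes before indexing. The sketch below is an assumption about what such a check might look like, not Typesense's own validation: it pairs up the flat coordinate list and computes the signed area with the shoelace formula. The sign flips with the winding direction, and a zero area indicates a degenerate shape. The helper names are hypothetical.

```python
def to_vertices(flat_coords):
    """Group a flat [a0, b0, a1, b1, ...] list into (a, b) pairs."""
    if len(flat_coords) % 2 != 0:
        raise ValueError("coordinates must come in pairs")
    pairs = list(zip(flat_coords[0::2], flat_coords[1::2]))
    if len(pairs) < 3:
        raise ValueError("a polygon needs at least three vertices")
    return pairs


def signed_area(vertices):
    """Shoelace formula: the sign depends on winding, zero means degenerate."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return total / 2.0


square = to_vertices([0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0])
reversed_square = list(reversed(square))
# Same corners, but listed so that the edges cross each other
crossed = to_vertices([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0])

area_square = signed_area(square)
area_reversed = signed_area(reversed_square)
area_crossed = signed_area(crossed)
```

For the unit square, the two winding directions give areas of equal size and opposite sign, while the crossed ordering collapses to zero. A zero result is only a hint: not every self-intersecting ordering produces a zero area.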

### Searching Points in Polygons

You can search for documents whose polygon areas contain a specific point:

<Tabs :tabs="['Shell']">
<template v-slot:Shell>

```bash
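# -G appends the URL-encoded --data-urlencode values to the query string
curl -G -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
"http://localhost:8108/collections/territories/documents/search" \
--data-urlencode "q=*" \
--data-urlencode "filter_by=area:(0.5, 0.5)"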
```

</template>
</Tabs>

This will return all polygons that contain the point (0.5, 0.5).

**Sample Response**

<Tabs :tabs="['JSON']">
<template v-slot:JSON>

```json
{
  "found": 1,
  "hits": [
    {
      "document": {
        "area": [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0],
        "id": "0",
        "name": "square"
      }
    }
  ]
}
```

</template>
</Tabs>
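
To build intuition for what "contains the point" means here, the sketch below applies the standard ray-casting test to the square from the sample response. It treats coordinates as points on a flat plane and says nothing about how Typesense evaluates the filter internally. The function name `point_in_polygon` is hypothetical.

```python
def point_in_polygon(point, vertices):
    """Ray casting: count how many edges a ray from the point crosses."""
    px, py = point
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        # Does this edge straddle the horizontal line through the point?
        if (y1 > py) != (y2 > py):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside  # each crossing toggles the state
    return inside


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
center_inside = point_in_polygon((0.5, 0.5), square)
far_point_inside = point_in_polygon((2.0, 2.0), square)
```

An odd number of crossings means the point is inside, which is why (0.5, 0.5) matches the `square` document while a point such as (2.0, 2.0) would not.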

## Sorting by Additional Attributes within a Radius

### exclude_radius
@@ -448,4 +540,4 @@ Similarly, you can bucket all geo points into "groups" using the `precision` par
'sort_by' : 'location(48.853, 2.344, precision: 2mi):asc, popularity:desc'
```

This will bucket the results into 2-mile groups and force records within each bucket into a tie for "geo score", so that the popularity metric can be used to tie-break and sort results within each bucket.
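
The bucketing idea can be sketched in a few lines. This is a minimal illustration under stated assumptions: distances are computed with the haversine formula, buckets are formed by integer division of the distance by the precision, and the sample records are made up. Typesense's exact bucketing rule may differ.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lng) points."""
    lat1, lng1, lat2, lng2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(h))


def sort_with_precision(records, origin, precision_mi):
    """Sort by distance bucket ascending, then popularity descending."""
    def key(rec):
        # Assumed rule: records in the same bucket tie on geo score
        bucket = int(haversine_miles(origin, rec["location"]) // precision_mi)
        return (bucket, -rec["popularity"])
    return sorted(records, key=key)


origin = (48.853, 2.344)
records = [
    {"name": "a", "location": (48.853, 2.344), "popularity": 10},  # at the origin
    {"name": "b", "location": (48.860, 2.344), "popularity": 50},  # about half a mile away
    {"name": "c", "location": (48.953, 2.344), "popularity": 99},  # several miles away
]
ordered = [r["name"] for r in sort_with_precision(records, origin, 2.0)]
```

Records `a` and `b` land in the same 2-mile bucket, so the more popular `b` ranks first even though `a` is closer. Record `c` falls in a later bucket and ranks last despite having the highest popularity.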
211 changes: 208 additions & 3 deletions docs-site/content/28.0/api/search.md

Large diffs are not rendered by default.

172 changes: 172 additions & 0 deletions docs-site/content/28.0/api/stemming.md
@@ -0,0 +1,172 @@
---
sidebarDepth: 1
sitemap:
priority: 0.7
---

# Stemming

Stemming is a technique that helps handle variations of words during search. When stemming is enabled, a search for one form of a word will also match other grammatical forms of that word. For example:

- Searching for "run" would match "running", "runs", "ran"
- Searching for "walk" would match "walking", "walked", "walks"
- Searching for "company" would match "companies"

Typesense provides two approaches to handle word variations: basic stemming and custom stemming dictionaries.

## Basic Stemming

Basic stemming uses the [Snowball stemmer](https://snowballstem.org/) algorithm to automatically detect and handle word variations. Being rules-based, it works well for common word patterns in the configured language, but may produce unintended side effects with brand names, proper nouns, and locations. Since these rules are designed primarily for common nouns, applying them to specialized content like company names or locations can sometimes degrade search relevance.

To enable basic stemming for a field, set `"stem": true` in your collection schema:

<Tabs :tabs="['Shell']">
<template v-slot:Shell>

```bash
curl "http://localhost:8108/collections" -X POST \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -d '{
"name": "companies",
"fields": [
{"name": "description", "type": "string", "stem": true}
]
}'
```

</template>
</Tabs>

The language used for stemming is automatically determined from the `locale` parameter of the field. For example, setting `"locale": "fr"` will use French-specific stemming rules.
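
As a small sketch of how the two settings combine, the snippet below builds a schema payload with both `stem` and `locale` on the same field and serializes it to JSON. It only constructs the request body. Sending it to the collections endpoint is left out, and the choice of French is just an example.

```python
import json

# Schema for a field stemmed with French rules
schema = {
    "name": "companies",
    "fields": [
        {"name": "description", "type": "string", "stem": True, "locale": "fr"}
    ],
}

# JSON body for a POST to /collections
payload = json.dumps(schema)
```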

## Custom Stemming Dictionaries

For cases where you need more precise control over word variations, or when dealing with irregular forms that algorithmic stemming can't handle well, you can use stemming dictionaries. These allow you to define exact mappings between words and their root forms.

### Pre-made Dictionaries

Typesense provides a pre-made English plurals dictionary that handles common singular/plural variations. You can download it [here](https://dl.typesense.org/data/stemming/plurals_en_v1.jsonl).

This dictionary is particularly useful when you need reliable handling of English plural forms without the potential side effects of algorithmic stemming.

### Creating a Stemming Dictionary

First, create a JSONL file with your word mappings:

```json
{"word": "people", "root": "person"}
{"word": "children", "root": "child"}
{"word": "geese", "root": "goose"}
```
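
If your mappings live in code or a spreadsheet export, you may prefer to generate the file. The sketch below is one way to do that, assuming a plain word-to-root mapping as input. The helper name `to_jsonl` is hypothetical.

```python
import json


def to_jsonl(mappings):
    """Serialize {word: root} pairs as one JSON object per line."""
    lines = [json.dumps({"word": word, "root": root}) for word, root in mappings.items()]
    return "\n".join(lines) + "\n"


irregular_plurals = {
    "people": "person",
    "children": "child",
    "geese": "goose",
}
jsonl_text = to_jsonl(irregular_plurals)
```

Writing `jsonl_text` to `dictionary.jsonl` produces the same three lines shown above.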

Then upload it using the stemming dictionary API:

<Tabs :tabs="['Shell']">
<template v-slot:Shell>

```bash
curl "http://localhost:8108/stemming/dictionary/import?id=irregular-plurals" \
-X POST \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
--data-binary @dictionary.jsonl
```

</template>
</Tabs>

#### Sample Response

<Tabs :tabs="['JSON']">
<template v-slot:JSON>

```json
{
"id": "irregular-plurals",
"words": [
{"root": "person", "word": "people"},
{"root": "child", "word": "children"},
{"root": "goose", "word": "geese"}
]
}
```

</template>
</Tabs>

### Using a Stemming Dictionary

To use a stemming dictionary, specify it in your collection schema using the `stem_dictionary` parameter:

<Tabs :tabs="['Shell']">
<template v-slot:Shell>

```bash
curl "http://localhost:8108/collections" -X POST \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -d '{
"name": "companies",
"fields": [
{"name": "title", "type": "string", "stem_dictionary": "irregular-plurals"}
]
}'
```

</template>
</Tabs>

:::tip Combining Both Approaches
You can use both basic stemming (`"stem": true`) and dictionary stemming (`"stem_dictionary": "dictionary_name"`) on the same field. When both are enabled, dictionary stemming takes precedence for words that exist in the dictionary.
:::
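
The precedence rule can be pictured as a dictionary lookup with an algorithmic fallback. The sketch below is purely conceptual: `toy_suffix_stem` is a crude stand-in written for this example and is not the Snowball algorithm, and `resolve_root` is a hypothetical name rather than anything Typesense exposes.

```python
def toy_suffix_stem(word):
    """Crude stand-in for an algorithmic stemmer (NOT Snowball)."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when a reasonably long stem remains
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def resolve_root(word, dictionary, stem=True):
    """Dictionary entries win; otherwise fall back to algorithmic stemming."""
    word = word.lower()
    if word in dictionary:
        return dictionary[word]
    return toy_suffix_stem(word) if stem else word


irregular_plurals = {"people": "person", "children": "child", "geese": "goose"}

root_geese = resolve_root("geese", irregular_plurals)      # found in the dictionary
root_walking = resolve_root("walking", irregular_plurals)  # falls back to the toy stemmer
root_unstemmed = resolve_root("walking", irregular_plurals, stem=False)
```

Irregular forms such as "geese" resolve through the dictionary, while regular forms such as "walking" are still handled by the rules-based fallback when stemming is enabled.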

### Managing Dictionaries

#### Retrieve a Dictionary

<Tabs :tabs="['Shell']">
<template v-slot:Shell>

```bash
curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
"http://localhost:8108/stemming/dictionary/irregular-plurals"
```

</template>
</Tabs>

#### List All Dictionaries

<Tabs :tabs="['Shell']">
<template v-slot:Shell>

```bash
curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
"http://localhost:8108/stemming/dictionaries"
```

</template>
</Tabs>

#### Sample Response

<Tabs :tabs="['JSON']">
<template v-slot:JSON>

```json
{
"dictionaries": ["irregular-plurals", "company-terms"]
}
```

</template>
</Tabs>

## Best Practices

1. **Start with Basic Stemming**: For most use cases, basic stemming with the appropriate locale setting will handle common word variations well.

2. **Use Dictionaries for Exceptions**: Add stemming dictionaries when you need to handle:
   - Domain-specific variations
   - Cases where basic stemming doesn't give the desired results

3. **Language-Specific Considerations**: Remember that basic stemming behavior changes based on the `locale` parameter. Set this appropriately for your content's language.