Commit 0f5c835

Restructure tests
1 parent 3c40788 commit 0f5c835

9 files changed: +1334 -7 lines changed

.github/workflows/ci.yml

Lines changed: 6 additions & 1 deletion
@@ -23,7 +23,7 @@ jobs:
     # Additional services can be defined here if required.
     services:
       db:
-        image: postgres:15
+        image: postgres:17
         ports:
           - 5432/tcp
         env:
@@ -86,6 +86,11 @@ jobs:
       - name: Install dependencies
         run: mix deps.get --check-locked

+      - name: Install pg_textsearch extension
+        run: |
+          docker exec ${{ job.services.db.id }} bash -lc "apt-get update && apt-get install -y build-essential postgresql-server-dev-17"
+          docker cp ./pg_textsearch ${{ job.services.db.id }}:/tmp/pg_textsearch
+          docker exec ${{ job.services.db.id }} bash -lc "cd /tmp/pg_textsearch && make && make install"
       # TODO These steps can be moved to `mix.exs`

       # Step: Compile the project treating any warnings as errors.

CHANGELOG.md

Lines changed: 27 additions & 0 deletions
@@ -1,3 +1,30 @@
+# v0.6.0
+
+## New 🔥
+
+**BM25 Full-Text Search** is now available via the new `Torus.bm25/5` macro!
+
+[BM25](https://en.wikipedia.org/wiki/Okapi_BM25) is a modern ranking algorithm that generally provides superior relevance scoring compared to traditional TF-IDF (used by `full_text/5`). This integration uses the [pg_textsearch](https://github.com/timescale/pg_textsearch) extension by Timescale.
+
+Key features:
+- State-of-the-art BM25 ranking with configurable parameters (k1, b)
+- Fast top-k queries via Block-Max WAND optimization
+- Simple syntax: `Post |> Torus.bm25([p], p.body, "search term") |> limit(10)`
+- Score selection and filtering capabilities
+- Pre-filtering with `<@>>` operator
+- Language support through PostgreSQL text search configurations
+
+Requirements:
+- PostgreSQL 17+
+- pg_textsearch extension installed
+- BM25 index on the search column
+
+See `Torus.bm25/5` documentation for complete details and migration guide.
+
+**When to use BM25 vs full_text:**
+- Use `bm25/5` for superior single-column search with modern relevance ranking
+- Use `full_text/5` for multi-column search with weights or when using stored tsvector columns
+
 # v0.5.3

 ## Fixes
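
Reviewer note: as a reading aid for the `k1` and `b` parameters listed in the changelog entry above, here is a minimal sketch of the standard Okapi BM25 formula in plain Elixir. The module name `BM25Sketch`, its functions, the naive tokenizer, and the defaults `k1 = 1.2` and `b = 0.75` are assumptions made for illustration. None of this is Torus or pg_textsearch code, and the extension's index-based scoring may differ in its details.

```elixir
defmodule BM25Sketch do
  @moduledoc "Illustrative Okapi BM25 scorer (hypothetical; not part of Torus)."

  # Lowercase and split on non-alphanumeric characters. Real text search
  # configurations also stem words and drop stop words; this sketch does not.
  def tokenize(text) do
    text
    |> String.downcase()
    |> String.split(~r/[^a-z0-9]+/, trim: true)
  end

  # Returns one score per document, in the same order as `docs`.
  # `k1` controls term-frequency saturation; `b` controls length normalization.
  # Assumes `docs` is non-empty and contains at least one token overall.
  def scores(docs, query, k1 \\ 1.2, b \\ 0.75) do
    tokenized = Enum.map(docs, &tokenize/1)
    n_docs = length(tokenized)
    avg_len = Enum.sum(Enum.map(tokenized, &length/1)) / n_docs
    terms = query |> tokenize() |> Enum.uniq()

    Enum.map(tokenized, fn tokens ->
      doc_len = length(tokens)
      freqs = Enum.frequencies(tokens)

      terms
      |> Enum.map(fn term ->
        # Term frequency in this document and document frequency in the corpus.
        tf = Map.get(freqs, term, 0)
        df = Enum.count(tokenized, &(term in &1))
        idf = :math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
        idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
      end)
      |> Enum.sum()
    end)
  end
end

docs = [
  "PostgreSQL is a powerful relational database system",
  "BM25 is a ranking function used in search engines",
  "Full text search enables finding relevant documents"
]

BM25Sketch.scores(docs, "database system")
```

With the three sample bodies from the README doctest, only the first document contains either query term, so it is the only one that scores above zero. Raising `b` toward 1 penalizes long documents more strongly, and raising `k1` lets repeated occurrences of a term keep adding to the score for longer before it saturates.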

README.md

Lines changed: 21 additions & 3 deletions
@@ -38,7 +38,7 @@ Post

 See [`full_text/5`](https://hexdocs.pm/torus/Torus.html#full_text/5) for more details.

-## 6 types of search:
+## 7 types of search:

 1. **Pattern matching**: Searches for a specific pattern in a string.

@@ -84,10 +84,28 @@ See [`full_text/5`](https://hexdocs.pm/torus/Torus.html#full_text/5) for more de
 ["Diagon Bombshell"]
 ```

-Use it when you dont care about spelling, the documents are long, or if you need to order the results by rank.
+Use it when you don't care about spelling, the documents are long, you need multi-column search with weights, or if you need to order the results by rank.

 See [`full_text/5`](https://hexdocs.pm/torus/Torus.html#full_text/5) for more details.

+1. **BM25 full text**: Modern BM25 ranking algorithm for superior relevance scoring using the [pg_textsearch](https://github.com/timescale/pg_textsearch) extension. BM25 generally provides better ranking than traditional built-in TF-IDF full text search and is optimized for top-k queries.
+
+```elixir
+iex> insert_post!(body: "PostgreSQL is a powerful relational database system")
+...> insert_post!(body: "BM25 is a ranking function used in search engines")
+...> insert_post!(body: "Full text search enables finding relevant documents")
+...> Post
+...> |> Torus.bm25([p], p.body, "database system")
+...> |> limit(10)
+...> |> select([p], p.body)
+...> |> Repo.all()
+["PostgreSQL is a powerful relational database system"]
+```
+
+Use it when you need state-of-the-art relevance ranking for single-column search, especially with LIMIT clauses. Requires PostgreSQL 17+.
+
+See [`bm25/5`](https://hexdocs.pm/torus/Torus.html#bm25/5) for more details.
+
 1. **Semantic Search**: Understands the contextual meaning of queries to match and retrieve related content utilizing natural language processing. Read more about semantic search in [Semantic search with Torus guide](/guides/semantic_search.md).

 ```elixir
@@ -131,7 +149,7 @@ Torus offers a few helpers to debug, explain, and analyze your queries before us

 ## Torus support

-For now, Torus supports pattern match, similarity, full-text, and semantic search, with plans to expand support further. These docs will be updated with more examples on which search type to choose and how to make them more performant (by adding indexes or using specific functions).
+For now, Torus supports pattern match, similarity, full-text (TF-IDF and BM25), and semantic search, with plans to expand support further. These docs will be updated with more examples on which search type to choose and how to make them more performant (by adding indexes or using specific functions).

 <!-- MDOC -->
