Commit 429a11f

fm1320 and prrao87 authored
Add Superlinked embedding integration page (#211)
* Add Superlinked embedding integration page

  Adds a docs page for Superlinked (SIE), a self-hosted inference engine for embedding, reranking, and extraction. The sie-lancedb package registers SIE as a first-class embedding function in LanceDB's registry ("sie" and "sie-multivector"), supports 85+ models, provides a SIEReranker for hybrid search, and an SIEExtractor for entity extraction via enrich_table() with native MultiVector/ColBERT support.

* Apply suggestions from code review

  Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>

---------

Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>
1 parent 0449f3a commit 429a11f

2 files changed

Lines changed: 138 additions & 1 deletion

docs/docs.json

Lines changed: 2 additions & 1 deletion
@@ -239,7 +239,8 @@
 "integrations/embedding/openai",
 "integrations/embedding/openclip",
 "integrations/embedding/sentence-transformers",
-"integrations/embedding/voyageai"
+"integrations/embedding/voyageai",
+"integrations/embedding/superlinked"
 ]
 },
 {
Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
---
title: Superlinked
sidebarTitle: Superlinked
---

[Superlinked](https://superlinked.com) is a self-hosted inference engine (SIE) for embedding, reranking, and extraction. The `sie-lancedb` package registers SIE as a first-class embedding function in LanceDB's embeddings registry, so embeddings are computed automatically on insert and search. You need a running SIE instance - see the [Superlinked quickstart](https://superlinked.com/docs) for deployment options.

## Installation

<CodeGroup>
```bash
pip install sie-lancedb
```

```bash
npm install @superlinked/sie-lancedb @lancedb/lancedb
```
</CodeGroup>

## Registered functions

Importing `sie_lancedb` registers two embedding functions in LanceDB's registry:

| Name | Purpose |
|---|---|
| `"sie"` | Dense text embeddings |
| `"sie-multivector"` | ColBERT-style late interaction with MaxSim scoring |

Supported parameters on `.create()`:

| Parameter | Type | Description |
|---|---|---|
| `model` | `str` | Any of 85+ SIE-supported models (e.g. `BAAI/bge-m3`, `NovaSearch/stella_en_400M_v5`, `jinaai/jina-colbert-v2`) |
| `base_url` | `str` | URL of the SIE endpoint (e.g. `http://localhost:8080`) |

## Usage

```python
import lancedb
from lancedb.embeddings import get_registry
from lancedb.pydantic import LanceModel, Vector
import sie_lancedb  # registers "sie" and "sie-multivector"

# Create a dense embedding function backed by the SIE endpoint
sie = get_registry().get("sie").create(
    model="BAAI/bge-m3",
    base_url="http://localhost:8080",
)

class Documents(LanceModel):
    text: str = sie.SourceField()                    # column that gets embedded
    vector: Vector(sie.ndims()) = sie.VectorField()  # column that stores the embedding

db = lancedb.connect("~/.lancedb")
table = db.create_table("docs", schema=Documents, mode="overwrite")

table.add([
    {"text": "Machine learning is a subset of AI."},
    {"text": "Neural networks use multiple layers."},
    {"text": "Python is popular for ML development."},
])

results = table.search("What is deep learning?").limit(3).to_list()
```

LanceDB handles embedding generation for both inserts and queries automatically, based on the `SourceField` / `VectorField` declarations on the schema.

## Hybrid search with reranker

`SIEReranker` plugs into LanceDB's hybrid search pipeline. It uses SIE's cross-encoder `score()` to rerank combined vector + full-text search results. You need a full-text search index on the column first:

```python
from sie_lancedb import SIEReranker

# `table` is the table created in the Usage example
# Create FTS index for hybrid search
table.create_fts_index("text", replace=True)

results = (
    table.search("What is deep learning?", query_type="hybrid")
    .rerank(SIEReranker(model="jinaai/jina-reranker-v2-base-multilingual"))
    .limit(5)
    .to_list()
)

for r in results:
    print(f"{r['_relevance_score']:.3f} {r['text']}")
```

The reranker also works with pure vector or pure FTS search via `.rerank()`.
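
As a rough illustration of what a rerank step does to hybrid results, the sketch below merges two candidate lists, drops duplicates, and re-sorts by a relevance score. It does not call SIE or LanceDB: `toy_score` is a hypothetical stand-in (query-token overlap) for a cross-encoder score, and the function names and sample strings are assumptions made for this example only.

```python
import re


def toy_score(query: str, text: str) -> float:
    """Hypothetical stand-in for a cross-encoder: share of query tokens found in the text."""
    query_tokens = set(re.findall(r"\w+", query.lower()))
    text_tokens = set(re.findall(r"\w+", text.lower()))
    if not query_tokens:
        return 0.0
    return len(query_tokens & text_tokens) / len(query_tokens)


def rerank(query, vector_hits, fts_hits, score_fn=toy_score, limit=5):
    """Merge both candidate lists, deduplicate, and sort by score (highest first)."""
    seen = set()
    candidates = []
    for text in [*vector_hits, *fts_hits]:
        if text not in seen:  # keep the first occurrence of each document
            seen.add(text)
            candidates.append(text)
    scored = [{"text": t, "_relevance_score": score_fn(query, t)} for t in candidates]
    scored.sort(key=lambda row: row["_relevance_score"], reverse=True)
    return scored[:limit]


vector_hits = ["Neural networks use multiple layers.", "Machine learning is a subset of AI."]
fts_hits = ["Deep learning stacks many neural layers.", "Neural networks use multiple layers."]
reranked = rerank("deep learning layers", vector_hits, fts_hits)

for row in reranked:
    print(f"{row['_relevance_score']:.3f} {row['text']}")
```

A real cross-encoder scores each query and document pair jointly with a model, so its ranking will generally differ from this token-overlap toy; only the merge, score, and sort shape is the point here.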

## ColBERT / multivector

`SIEMultiVectorEmbeddingFunction` (registered as `"sie-multivector"`) works with LanceDB's native `MultiVector` type and MaxSim scoring for ColBERT and ColPali models:

```python
from lancedb.embeddings import get_registry
from lancedb.pydantic import LanceModel, MultiVector

# `db` and the `sie_lancedb` import are reused from the Usage example
sie_colbert = get_registry().get("sie-multivector").create(
    model="jinaai/jina-colbert-v2",
    base_url="http://localhost:8080",
)

class ColBERTDocs(LanceModel):
    text: str = sie_colbert.SourceField()
    vector: MultiVector(sie_colbert.ndims()) = sie_colbert.VectorField()

table = db.create_table("colbert_docs", schema=ColBERTDocs, mode="overwrite")
table.add([{"text": "Machine learning is a subset of AI."}])

# MaxSim search - query and document multivectors compared token-by-token
results = table.search("What is ML?").limit(5).to_list()
```
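
For readers unfamiliar with MaxSim, here is a minimal pure-Python sketch of the scoring rule: for each query token vector, take its best similarity against any document token vector, then sum those maxima. This is an illustration only, not LanceDB's or SIE's implementation; the use of cosine similarity, the helper names, and the 2-dimensional toy vectors are assumptions for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def maxsim(query_vectors, doc_vectors):
    """Sum, over query tokens, of the highest similarity to any document token."""
    return sum(
        max(cosine(q, d) for d in doc_vectors)
        for q in query_vectors
    )


# Toy multivectors: one small vector per token (real models use far more dimensions)
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 2.0]]  # has a close match for each query token
doc_b = [[1.0, 1.0]]              # one token, halfway between both query tokens

score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
print(score_a > score_b)
```

Because every query token contributes its own best match, a document that covers each query token well (`doc_a`) outscores one that only partially matches all of them (`doc_b`).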

## Entity extraction

`SIEExtractor` adds entity extraction to LanceDB's data-enrichment workflows. Extract entities from a text column and merge the results back as a structured Arrow column - enabling filtered search on extracted entities:

```python
from sie_lancedb import SIEExtractor

extractor = SIEExtractor(
    base_url="http://localhost:8080",
    model="urchade/gliner_multi-v2.1",
)

extractor.enrich_table(
    table,
    source_column="text",        # column to extract entities from
    target_column="entities",    # column that receives the extracted entities
    labels=["person", "technology", "organization"],
    id_column="id",              # the table needs a column with this name
)
```

The `entities` column stores structured Arrow data (`list<struct<text, label, score, start, end, bbox>>`), so you can filter on extracted entities in queries.
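
To make the shape of that column concrete, the sketch below filters rows client-side by entity label and score. It assumes rows have already been fetched as Python dicts (for example via `.to_list()`) and that each entity is a dict with the struct fields named above; the `has_entity` helper and the sample rows, including their offsets and scores, are hypothetical and written for illustration only.

```python
def has_entity(row, label, min_score=0.5):
    """True if the row holds at least one entity with this label and a high enough score."""
    return any(
        ent["label"] == label and ent["score"] >= min_score
        for ent in (row.get("entities") or [])  # tolerate a missing or null column
    )


# Hypothetical rows shaped like list<struct<text, label, score, start, end, bbox>>
rows = [
    {
        "text": "Guido van Rossum created Python.",
        "entities": [
            {"text": "Guido van Rossum", "label": "person", "score": 0.97,
             "start": 0, "end": 16, "bbox": None},
            {"text": "Python", "label": "technology", "score": 0.91,
             "start": 25, "end": 31, "bbox": None},
        ],
    },
    {"text": "Neural networks use multiple layers.", "entities": []},
    {"text": "Row that was never enriched.", "entities": None},
]

people_rows = [row["text"] for row in rows if has_entity(row, "person")]
print(people_rows)
```

Filtering inside the query itself would depend on how LanceDB's filter expressions handle nested list columns, which this sketch does not attempt to show.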

## Links

- [`sie-lancedb` on PyPI](https://pypi.org/project/sie-lancedb/)
- [`@superlinked/sie-lancedb` on npm](https://www.npmjs.com/package/@superlinked/sie-lancedb)
- [Superlinked on GitHub](https://github.com/superlinked/sie)
- [Superlinked docs](https://superlinked.com/docs)
