Commit 5f23dbf

Merge branch 'lancedb:main' into feat/embedding-model-voyage-multimodal-3.5
2 parents 30628df + 5cb5f61 commit 5f23dbf

33 files changed

Lines changed: 552 additions & 6274 deletions

AGENTS.md

Lines changed: 20 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,20 @@
1+
# LanceDB Mintlify Documentation
2+
3+
This is a documentation site for [LanceDB](https://docs.lancedb.com).
4+
5+
## Languages used
6+
7+
- Code examples are primarily in three language SDKs: Python, TypeScript and Rust.
8+
- Best practices for linting, formatting and code complexity for each respective language apply.
9+
- Write idiomatic code as far as possible
10+
11+
## Running Python code
12+
13+
When running Python code, we have to cater to users of both pip and uv.
14+
15+
- Use 4 spaces to represent a tab (do not use tab characters)
16+
- Always attempt to first run *any* Python code via the local virtual environment
17+
- Look for a local virtual environment (typically in `.venv` or `venv`)
18+
- Activate the environment, so that you can run multiple code examples in the same environment
19+
- Avoid using `uv run` directly, as you have issues running it in your sandbox
20+
- Only fall back to the system `python3` to run code if the above steps don't work

CLAUDE.md

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
1+
# LanceDB Mintlify Documentation
2+
3+
This is a documentation site for [LanceDB](https://docs.lancedb.com).
4+
5+
## Languages used
6+
7+
- Code examples are primarily in three language SDKs: Python, TypeScript and Rust.
8+
- Best practices for linting, formatting and code complexity for each respective language apply.
9+
- Write idiomatic code as far as possible
10+
11+
## Running Python code
12+
13+
When running Python code, we have to cater to users of both pip and uv.
14+
15+
- Use 4 spaces to represent a tab (do not use tab characters)
16+
- Always attempt to first run *any* Python code via the local virtual environment
17+
- Look for a local virtual environment (typically in `.venv` or `venv`)
18+
- Activate the environment, so that you can run multiple code examples in the same environment
19+
- Avoid using `uv run` directly, as you have issues running it in your sandbox
20+
- Only fall back to the system `python3` to run code if the above steps don't work

README.md

Lines changed: 27 additions & 12 deletions
@@ -1,24 +1,36 @@
11
# LanceDB Mintlify Documentation
22

3-
Home of the new LanceDB documentation on Mintlify.
3+
Home of the [LanceDB](https://lancedb.com/) documentation. Built using [Mintlify](https://www.mintlify.com/).
44

55
## Development
66

77
Install the [Mintlify CLI](https://www.npmjs.com/package/mintlify) to preview the documentation changes locally. To install, use the following command
88

9-
```
9+
```bash
1010
npm i -g mintlify
1111
```
1212

1313
Run the following commands at the root of the documentation (`/docs/` in this repo, where `docs.json` is located).
1414

15-
```
15+
```bash
1616
cd docs
1717
mint dev
1818
```
1919

20+
Check broken links (applies to internal links within this docs site only):
21+
22+
```bash
23+
mint broken-links
24+
```
25+
2026
## Generate snippets
2127

28+
To generate snippets, use `uv` to sync your local Python environment so that you can run the Python script described below.
29+
30+
```bash
31+
uv sync
32+
```
33+
2234
The Python, TypeScript and Rust code snippets used in the documentation are tested prior to use in the docs. These tests are located in the `tests/` directory. Run the tests locally for each language
2335
when building the docs locally.
2436

@@ -38,17 +50,20 @@ make rs
3850
make snippets
3951
```
4052

53+
The generated snippets are placed in the appropriate file in the `/docs/snippets/` directory, making them
54+
available for importing in the corresponding file.
55+
4156
The following sequence of steps are run:
4257

43-
1. Run tests for py, ts, rs files to verify that the code works
44-
2. Generate MDX snippets
45-
3. Import MDX snippets in the corresponding documentation page
46-
4. Add the NDX snippet inside a `<CodeBlock>` component in Mintlify
58+
1. Run tests for py, ts, rs files that contain new code you added, and verify that the tests pass locally
59+
2. Generate MDX snippets via the `make snippets` command
60+
3. Import MDX snippets in the corresponding MDX docs page
61+
4. Include the MDX snippet as a parameter inside a `<CodeBlock>` JSX component in Mintlify
4762

48-
This ensures that the code in the docs is in line with the latest LanceDB API
63+
Creating and using snippets for code blocks in the MDX files helps ensure that we are placing
64+
code that's been tested (per recent LanceDB releases) in the hands of users.
4965

5066
> [!NOTE]
51-
> Do not add code snippets manually inside triple-backticks! Write the tests to the `tests` directory,
52-
> then generate the snippets programmatically via the Makefile commands. This helps ensure that
53-
> the documentation shows code that was actually run by a human, and by CI.
54-
67+
> As far as possible, do not add code snippets manually inside triple-backticks! Write the tests for
68+
> the required language in the `tests/*` directory, then generate the snippets programmatically via the Makefile
69+
> commands.

WRITING.md

Lines changed: 199 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,199 @@
1+
# Writing Guide
2+
3+
This is a documentation site built in [Mintlify](https://www.mintlify.com/docs). Writing in Mintlify is similar to `README.md`
4+
docs you may be used to writing in markdown -- the main difference is that Mintlify uses MDX (Markdown + JSX) files
5+
instead of regular markdown files.
6+
7+
It's worth going through the key sections in the Mintlify [docs](https://www.mintlify.com/docs) before you begin writing.
8+
To begin writing, create a new MDX file in the appropriate location and follow the steps below.
9+
10+
## 1. Create the MDX file
11+
12+
As far as possible, docs are organized at the top level by concept, in the `/docs/` directory. In certain cases,
13+
an additional level of nesting into an inner subdirectory is okay to avoid cluttering the sidebar. However, it's
14+
recommended to avoid nesting more than 2 levels deep as this affects the flow and discoverability from a user
15+
perspective.
16+
17+
## 2. Update `docs.json`
18+
19+
The sitemap is defined in JSON format in `docs.json`. You can organize new content into tab groups and pages, as per the
20+
structure shown in the existing `docs.json`. To view the new page in the sidebar and in the local preview, it must be
21+
referenced in `docs.json` in the appropriate location.
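
One way to confirm that a new page is referenced is to walk `docs.json` and collect every page path it lists. The sketch below is illustrative only: `collect_pages` is a hypothetical helper, and the `navigation`, `tabs` and `group` keys in the sample are assumptions about the file's layout rather than a copy of this repo's `docs.json`.

```python
import json


def collect_pages(node):
    """Recursively gather every page path listed under a "pages" key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "pages" and isinstance(value, list):
                for entry in value:
                    if isinstance(entry, str):
                        # A plain string is a page path.
                        found.append(entry)
                    else:
                        # A nested object (e.g. a group) may hold more pages.
                        found.extend(collect_pages(entry))
            else:
                found.extend(collect_pages(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_pages(item))
    return found


# Hypothetical, trimmed-down sitemap used only for this example.
SAMPLE = json.loads("""
{
  "navigation": {
    "tabs": [
      {
        "tab": "Tutorials",
        "pages": [
          {"group": "Agents", "pages": ["tutorials/agents/time-travel-rag/index"]},
          "tutorials/feature-engineering/index"
        ]
      }
    ]
  }
}
""")

pages = collect_pages(SAMPLE)
print("tutorials/feature-engineering/index" in pages)
```

To check the real file, load it with `json.load(open("docs/docs.json"))` instead of the inline sample and test for the path of your new page (without the `.mdx` extension).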
22+
23+
## 3. Add frontmatter
24+
25+
Frontmatter is written in YAML, and is compulsory for all MDX files that contain documentation. It's recommended to _always_
26+
have at least the first three keys (`title`, `sidebarTitle` and `description`) for readability and SEO on a given docs page.
27+
Specifying an `icon` helps readers associate a familiar image with the page title in the sidebar. For searchability within
28+
the docs, you can optionally specify the `keywords` field and pass in a list of keyword strings. When a user searches for
29+
those strings, the page is prioritized in the search box.
30+
31+
Here's an example:
32+
33+
```yml
34+
---
35+
title: "Lance format"
36+
sidebarTitle: "Lance format"
37+
description: "Open-source lakehouse format for multimodal AI."
38+
icon: "/static/assets/logo/lance-logo-gray.svg"
39+
keywords: ["lance"]
40+
---
41+
```
42+
43+
> [!NOTE]
44+
> The example above showed a custom SVG icon in the `/static/assets/` directory of this repo, but you can pick stock
45+
> icons from [fontawesome.com](https://fontawesome.com/icons) by searching for a high-level concept by name.
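
The recommendation to always include `title`, `sidebarTitle` and `description` can be checked mechanically. The following is a minimal sketch, not part of this repo's tooling: `frontmatter_keys` and `missing_keys` are hypothetical helpers, and the parsing is a simple line scan for top-level keys rather than a full YAML parser.

```python
REQUIRED_KEYS = ("title", "sidebarTitle", "description")


def frontmatter_keys(mdx_text):
    """Return the top-level keys found between the opening and closing '---'."""
    lines = mdx_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no frontmatter block at the top of the file
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # Only consider unindented "key: value" lines; skip comments and nesting.
        if ":" in line and not line.startswith((" ", "\t", "#")):
            keys.append(line.split(":", 1)[0].strip())
    return []  # the closing delimiter was never found


def missing_keys(mdx_text):
    """List the recommended keys that are absent from the frontmatter."""
    present = frontmatter_keys(mdx_text)
    return [key for key in REQUIRED_KEYS if key not in present]


EXAMPLE = '''---
title: "Lance format"
description: "Open-source lakehouse format for multimodal AI."
---
Body text.
'''

print(missing_keys(EXAMPLE))  # ['sidebarTitle']
```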
46+
47+
## 4. Begin writing
48+
49+
Writing in Mintlify is similar to conventional markdown, except that you have access to JSX-based (React) components that
50+
make it much simpler to add documentation-friendly functionality and aesthetics to the docs page. Components are a very powerful
51+
addition to the writing experience, and are covered in detail on the [Mintlify docs](https://www.mintlify.com/docs/components/accordions).
52+
53+
Below is an example of a `Card`, which emphasizes content, while providing a clickable URL out of the given page.
54+
```jsx
55+
<Card
56+
title="Quickstart"
57+
icon="rocket"
58+
href="/quickstart"
59+
>
60+
Get started with LanceDB in minutes.
61+
</Card>
62+
```
63+
64+
The best part about components is that they are composable. You can embed one component inside another and achieve the functionality of both. The example below shows a `Card` at the top level, with an `Accordion` inside it.
65+
66+
```mdx
67+
<Card
68+
title="Quickstart"
69+
icon="rocket"
70+
href="/quickstart"
71+
>
72+
Get started with LanceDB in minutes.
73+
74+
<Accordion>
75+
Collapsible text content here....
76+
</Accordion>
77+
78+
</Card>
79+
```
80+
81+
## 5. Mathematical equations
82+
83+
Math equations are supported via standard KaTeX plugins. You can write any LaTeX-style equation and get it rendered on the
84+
page by enclosing it in `$$` symbols.
85+
86+
```mdx
87+
$$
88+
E = mc^2
89+
$$
90+
```
91+
92+
## 6. Code snippets
93+
94+
Code snippets are where Mintlify probably differs the most from markdown. There are several ways to write code snippets, but
95+
this section describes how we do it specifically in these LanceDB docs.
96+
97+
### Option 1: `CodeGroup` components
98+
99+
The preferred way to include a code snippet is to enter it within `<CodeGroup>` tags, as follows:
100+
101+
```mdx
102+
<CodeGroup>
103+
```python Python icon="python"
104+
import lancedb
105+
```
106+
107+
```typescript TypeScript icon="square-js"
108+
import * as lancedb from "@lancedb/lancedb";
109+
```
110+
111+
```rust Rust icon="rust"
112+
use lancedb::connect;
113+
```
114+
</CodeGroup>
115+
116+
117+
This will allow you to include code snippets from multiple languages, grouped together on the docs page so that the user
118+
can click on their language of choice via tabs.
119+
120+
### Option 2: `CodeBlock` components within `CodeGroup`
121+
122+
As engineers, we may want to write a testable snippet in code in the `tests/py`, `tests/ts`, or `tests/rs` directory.
123+
These directories contain test files in each language that contain valid, tested code, which are fenced within comment markers
124+
so that they can be parsed by a [snippet generation script](./scripts/mdx_snippets_gen.py).
125+
126+
The snippet generation script is run to extract the relevant snippets from the file (based on the fenced comment markers
127+
indicating `start` and `end` in each test file).
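
To make the idea concrete, here is a minimal sketch of marker-based extraction. It is not the repo's `mdx_snippets_gen.py`: the `# snippet-start: <name>` and `# snippet-end: <name>` marker syntax and the `extract_snippets` function are hypothetical, chosen only to illustrate how fenced regions in a test file can be pulled out by name.

```python
import re
import textwrap

# Hypothetical marker syntax; the real script may use different comment markers.
START = re.compile(r"^\s*#\s*snippet-start:\s*(\w+)\s*$")
END = re.compile(r"^\s*#\s*snippet-end:\s*(\w+)\s*$")


def extract_snippets(source):
    """Map each snippet name to the dedented code between its start/end markers."""
    snippets = {}
    current = None  # name of the snippet being collected, if any
    buffer = []
    for line in source.splitlines():
        start = START.match(line)
        end = END.match(line)
        if start:
            current = start.group(1)
            buffer = []
        elif end and current == end.group(1):
            # Strip the indentation the snippet had inside the test function.
            snippets[current] = textwrap.dedent("\n".join(buffer))
            current = None
        elif current is not None:
            buffer.append(line)
    return snippets


# A stand-in for a file under tests/py; it is only parsed here, never executed.
TEST_FILE = '''
def test_connect():
    # snippet-start: connect
    import lancedb
    db = lancedb.connect("data/sample-lancedb")
    # snippet-end: connect
    assert db is not None
'''

print(extract_snippets(TEST_FILE)["connect"])
```

Only the lines between the markers end up in the snippet, so assertions and test scaffolding stay out of the rendered docs.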
128+
129+
Here's how you'd call the snippet into a code block in the MDX file:
130+
131+
```mdx
132+
import { PyConnect, TsConnect, RsConnect } from '/snippets/connection.mdx';
133+
134+
<CodeGroup >
135+
<CodeBlock filename="Python" language="Python" icon="python">
136+
{PyConnect}
137+
</CodeBlock>
138+
139+
<CodeBlock filename="TypeScript" language="TypeScript" icon="square-js">
140+
{TsConnect}
141+
</CodeBlock>
142+
143+
<CodeBlock filename="Rust" language="Rust" icon="rust">
144+
{RsConnect}
145+
</CodeBlock>
146+
</CodeGroup >
147+
```
148+
149+
### Option 3: Vanilla backticks
150+
151+
This is the least preferred approach, as it doesn't let you group together code snippets from multiple languages effectively.
152+
Note that Mintlify offers some additional features compared to traditional markdown even when using triple backticks.
153+
154+
In the example below, we may have a long code snippet that we want to collapse (to show a few lines in the rendered page).
155+
This is useful for example code or data snippets that are quite long.
156+
157+
```json camelot.json icon="brackets-curly" expandable=true
158+
[
159+
{
160+
"id": 1,
161+
"name": "King Arthur",
162+
"role": "King of Camelot",
163+
"description": "The legendary ruler of Camelot, wielder of Excalibur, and leader of the Knights of the Round Table.",
164+
"vector": [0.72, -0.28, 0.60, 0.86],
165+
"stats": { "strength": 2, "courage": 5, "magic": 1, "wisdom": 4 }
166+
}
167+
]
168+
```
169+
170+
Using vanilla backticks is okay when code snippets such as JSON blobs get really long, and we only want to show a
171+
preview to the reader. Enabling `expandable=true` allows readers to see the whole block when they click on the "expand"
172+
button on the page.
173+
174+
## 7. Run local deployment
175+
176+
After you update the `docs.json` file with the path to the new MDX file, you can debug the site on the local deployment.
177+
178+
```bash
179+
# cd to the docs/ directory
180+
cd docs
181+
# Run local server
182+
mint dev
183+
```
184+
This will run a local deployment on `localhost:3000`, which is useful for debugging and testing purposes.
185+
186+
You can check for broken links in the site by running the following command.
187+
188+
```bash
189+
mint broken-links
190+
```
191+
192+
> [!NOTE]
193+
> The broken link checker **only** checks for internal (relative) links to other pages within this docs repo.
194+
> It cannot check external site links.
195+
196+
## 8. Commit to the docs repo
197+
198+
Once you've finished writing and reviewing the content yourself, submit a PR to the [repo](https://github.com/lancedb/docs)
199+
for review. If you're an external contributor, we thank you for your contribution to LanceDB!

docs/api-reference/index.mdx

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ Python, Typescript and Rust SDKs are officially supported by LanceDB.
1717
| SDK Reference | Description |
1818
|:--------------|-------------------|
1919
| [Python SDK](https://lancedb.github.io/lancedb/python/python/) | Full-featured Python client with pandas & numpy integration |
20-
| [Typescript SDK](https://lancedb.github.io/lancedb/js/) | A Typescipt wrapper around the Rust library, built with `napi-rs`
20+
| [Typescript SDK](https://lancedb.github.io/lancedb/js/) | A TypeScript wrapper around the Rust library, built with `napi-rs`
2121
| [Rust SDK](https://docs.rs/lancedb/latest/lancedb/index.html) | Native Rust library with persistent-storage and high performance |
2222

2323
## Examples in other languages

docs/docs.json

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,6 @@
11
{
22
"$schema": "https://mintlify.com/docs.json",
33
"appearance": {
4-
"default": "light",
54
"strict": false
65
},
76
"theme": "mint",
@@ -277,7 +276,8 @@
277276
"tutorials/agents/time-travel-rag/index",
278277
"tutorials/agents/multimodal-agent/index"
279278
]
280-
}
279+
},
280+
"tutorials/feature-engineering/index"
281281
]
282282
}
283283
]

docs/embedding/index.mdx

Lines changed: 17 additions & 11 deletions
@@ -9,13 +9,17 @@ Modern machine learning models can be trained to convert raw data into embedding
99
of floating point numbers. The position of an embedding in vector space captures the semantics of
1010
the data, so vectors that are close to each other are considered similar.
1111

12-
LanceDB provides an embedding function registry that automatically generates vector embeddings
12+
LanceDB provides an embedding function registry in OSS as well as its Cloud and Enterprise versions
13+
([see below](#embeddings-in-lancedb-cloud-and-enterprise))
14+
that automatically generates vector embeddings
1315
during data ingestion and querying. The API abstracts embedding generation, allowing you to focus
1416
on your application logic.
1517

16-
## Embedding function registry
18+
## Embedding Registry
1719

18-
You can get a supported embedding function from the registry, and then use it in your table schema.
20+
<Badge color="green">OSS</Badge>
21+
22+
In LanceDB OSS, you can get a supported embedding function from the registry, and then use it in your table schema.
1923
Once configured, the embedding function will automatically generate embeddings when you insert data
2024
into the table. And when you query the table, you can provide a query string or other input, and the
2125
embedding function will generate an embedding for it.
@@ -98,12 +102,14 @@ LanceDB supports most popular embedding providers.
98102

99103
You can find all supported embedding models in the [integrations](/integrations/embedding) section.
100104

101-
## Embedding function on LanceDB cloud
102-
When using embedding functions on LanceDB cloud, during the ingestion time the embeddings are
103-
generated on the client side, and stored in the cloud. We don't yet support model inference on the
104-
cloud side so automatic query generation during search is not supported. You can manually generate
105-
the embeddings for your queries using the same embedding function and pass the vector to the search
106-
function.
105+
## Embeddings in LanceDB Cloud and Enterprise
106+
Currently, the embedding registry on LanceDB <Badge color="purple">Cloud</Badge> or
107+
<Badge color="red">Enterprise</Badge> supports automatic generation of embeddings during data ingestion,
108+
generated on the client side (and stored on the remote table). We don't yet support automatic query-time
109+
embedding generation when sending queries, though this is planned for a future release.
110+
111+
For now, you can manually generate the embeddings at query time using the same embedding function that
112+
was used during ingestion, and pass the embeddings to the search function.
107113

108114
<CodeGroup>
109115
```python Python icon="python"
@@ -132,8 +138,8 @@ results = table.search(query_vector).limit(5).to_pandas()
132138

133139
## Custom Embedding Functions
134140

135-
You can implement your own embedding function by inheriting from `TextEmbeddingFunction` (for text)
136-
or `EmbeddingFunction` (for multimodal data).
141+
You can always implement your own embedding function by inheriting from `TextEmbeddingFunction`
142+
(for text) or `EmbeddingFunction` (for multimodal data).
137143

138144
<CodeGroup>
139145
```python Python icon="python"

docs/google3bfed878f4b4e309.html

Lines changed: 0 additions & 1 deletion
This file was deleted.
