docs: add lancedb integration page #4304

Open
wants to merge 3 commits into
base: main
4 changes: 3 additions & 1 deletion daft/dataframe/dataframe.py
@@ -1195,7 +1195,7 @@ def write_lance(
**kwargs: Additional keyword arguments to pass to the Lance writer.

Note:
- write_lance` requires python 3.9 or higher
+ `write_lance` requires python 3.9 or higher

Examples:
>>> import daft
@@ -1210,6 +1210,7 @@ def write_lance(
╰───────────────┴──────────────────┴─────────────────┴─────────╯
<BLANKLINE>
(Showing first 1 of 1 rows)
+ <BLANKLINE>
>>> daft.read_lance("/tmp/lance/my_table.lance").collect() # doctest: +SKIP
╭───────╮
│ a │
@@ -1226,6 +1227,7 @@ def write_lance(
╰───────╯
<BLANKLINE>
(Showing first 4 of 4 rows)
+ <BLANKLINE>
>>> # Pass additional keyword arguments to the Lance writer
>>> # All additional keyword arguments are passed to `lance.write_fragments`
>>> df.write_lance("/tmp/lance/my_table.lance", mode="overwrite", max_bytes_per_file=1024) # doctest: +SKIP
65 changes: 65 additions & 0 deletions docs/integrations/lance.md
@@ -0,0 +1,65 @@
# LanceDB

[LanceDB](https://github.com/lancedb/lancedb) is an open-source vector database built on persistent storage. It simplifies the retrieval, filtering, and management of embeddings.

Daft currently supports reading from and writing to LanceDB tables. To use Daft with LanceDB, install Daft with the `lance` extra:

```bash
pip install "daft[lance]"
```
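
To confirm that the install succeeded, a quick check along the following lines may help. This is a minimal sketch: it assumes the `lance` extra makes a module named `lance` importable, and `missing_modules` is a hypothetical helper, not part of Daft.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Assumption: the extra installs a package importable as `lance`.
missing = missing_modules(["daft", "lance"])
if missing:
    print(f"Missing modules: {', '.join(missing)}. Try: pip install 'daft[lance]'")
else:
    print("daft and lance are both importable")
```

The check only looks up module specs, so it does not import either package.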

## Create a DataFrame from a LanceDB table

You can create a Daft DataFrame by reading from a local LanceDB table with [`read_lance`][daft.read_lance]:

```python
import daft

df = daft.read_lance("/tmp/lance/my_table.lance")
df.show()
```
``` {title=Output}
╭───────╮
│ a │
│ --- │
│ Int64 │
╞═══════╡
│ 1 │
├╌╌╌╌╌╌╌┤
│ 2 │
├╌╌╌╌╌╌╌┤
│ 3 │
├╌╌╌╌╌╌╌┤
│ 4 │
╰───────╯

(Showing first 4 of 4 rows)
```

You can also create a Daft DataFrame by reading a LanceDB table from a public S3 bucket:

```python
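import daft
from daft.io import IOConfig, S3Config

# Anonymous access to a public bucket; the S3 settings are wrapped in an IOConfig
io_config = IOConfig(s3=S3Config(region_name="us-west-2", anonymous=True))
df = daft.read_lance("s3://daft-public-data/lance/words-test-dataset", io_config=io_config)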
df.show()
```

## Write to a LanceDB table

You can write a Daft DataFrame to a LanceDB table with [`write_lance`][daft.DataFrame.write_lance]:

```python
import daft
df = daft.from_pydict({"a": [1, 2, 3, 4]})
df.write_lance("/tmp/lance/my_table.lance")
```
``` {title=Output}
╭───────────────┬──────────────────┬─────────────────┬─────────╮
│ num_fragments ┆ num_deleted_rows ┆ num_small_files ┆ version │
│ --- ┆ --- ┆ --- ┆ --- │
│ Int64 ┆ Int64 ┆ Int64 ┆ Int64 │
╞═══════════════╪══════════════════╪═════════════════╪═════════╡
│ 1 ┆ 0 ┆ 1 ┆ 1 │
╰───────────────┴──────────────────┴─────────────────┴─────────╯

(Showing first 1 of 1 rows)
```
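
Writing to a path that already holds a table can be handled with the `mode="overwrite"` argument shown in the `write_lance` docstring. The sketch below writes a table to a temporary directory, overwrites it, and reads it back. `overwrite_round_trip` is a hypothetical helper name, and the sketch assumes that `daft[lance]` is installed and that the default mode creates the table when the path does not yet exist.

```python
import tempfile
from pathlib import Path


def overwrite_round_trip():
    """Write a table, overwrite it, and return the values read back."""
    # Imported here so this sketch can be loaded even where the optional
    # dependency is not installed.
    import daft

    with tempfile.TemporaryDirectory() as tmp:
        path = str(Path(tmp) / "my_table.lance")

        # First write: assumed to create the table at `path`.
        daft.from_pydict({"a": [1, 2, 3, 4]}).write_lance(path)

        # Second write: replace the existing contents.
        daft.from_pydict({"a": [5, 6]}).write_lance(path, mode="overwrite")

        # Read the table back; sort because row order is not assumed.
        return sorted(daft.read_lance(path).to_pydict()["a"])
```

Calling `overwrite_round_trip()` returns the values of column `a` as they stand after the second write.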
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -35,6 +35,7 @@ nav:
- Apache Iceberg: integrations/iceberg.md
- AWS Glue: integrations/glue.md
- Delta Lake: integrations/delta_lake.md
- LanceDB: integrations/lance.md
- S3 Tables: integrations/s3tables.md
- Unity Catalog: integrations/unity_catalog.md
- Storage: