Replies: 4 comments
-
Howdy!
-
Hi!
-
If one wants to go even further, one could add more advanced and efficient indexes, such as filtered DiskANN.
-
Just to give it some visibility (it was already mentioned in the first post):
-
Embeddings are here to stay.
It may be a good idea to support them in Ducklake too at some point in the future.
Fields of usage:
When you offload data from a database to a lake for long-term storage and use, more and more datasets contain embeddings, and these are often stored directly in the database (almost all databases today support this or are in the process of building it in).
This raises the question: why shouldn't a lake be the right place to store them as well?
As discussed, Ducklake differs from other lakes in some respects (speed, efficiency, etc.), and this opens up new possible use cases.
See e.g. Adding indexes #389 (reply in thread)
Many data use cases today involve AI, and many of those rely on embeddings.
Support for embeddings could be exactly the fuel needed to boost these new use cases.
-> What do you think?
Of course, we are not the first to think of this "vector lake", but maybe Ducklake is the perfect foundation to make it happen:
https://blog.lancedb.com/why-dataframe-libraries-need-to-understand-vector-embeddings-291343efd5c8/
https://www.linkedin.com/pulse/vector-data-lakes-powering-future-ai-search-jathin-gangi-qqvdc/
This may give some further ideas/background on implementation:
https://minimaxir.com/2025/02/embeddings-parquet/
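To make the idea a bit more concrete, here is a minimal sketch of what querying embeddings stored as an ordinary column could look like without any index: a brute-force cosine similarity scan. This is plain Python, not Ducklake or DuckDB API. The names `rows`, `top_k`, and `cosine_similarity`, as well as the sample vectors, are made up for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as "not similar" instead of dividing by zero
    return dot / (norm_a * norm_b)


def top_k(rows, query, k=2):
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [(row_id, cosine_similarity(vec, query)) for row_id, vec in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical table contents: (id, embedding) pairs, as they might be read
# from a column of fixed-size float arrays.
rows = [
    ("doc-a", [1.0, 0.0, 0.0]),
    ("doc-b", [0.9, 0.1, 0.0]),
    ("doc-c", [0.0, 1.0, 0.0]),
]

result = top_k(rows, [1.0, 0.0, 0.0], k=2)
print(result)
```

A full scan like this is fine for small tables. An index (such as the ones mentioned in the replies) would only be needed to avoid scoring every row on large datasets.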