Replies: 4 comments
-
hi @rafayaar, I suggest you try training the index on a subset and check whether the performance is actually as poor as you expect. The train index method will select a subsample of the data anyway if you pass too much. See
-
Hi @mlomeli1
-
How do you sample the subset? It can represent the whole dataset if the sampling scheme yields a representative sample, @rafayaar.
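One common scheme consistent with the point above is uniform random sampling without replacement, which is usually representative enough for k-means training when the vectors have no special ordering; the sizes below are small stand-ins:

```python
# Sketch: draw a uniform random subsample for index training.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1_000_000, 1024     # stand-ins for the real 100M x 1024 dataset
sample_size = 50_000

# Without replacement, every vector is equally likely to be picked once.
idx = rng.choice(n, size=sample_size, replace=False)
# train_sample = xb[idx]   # rows you would feed to index.train()
```

If the data on disk is grouped (e.g. by source or time), shuffling or stratifying before sampling avoids a biased training set.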
-
Split the 0.1B × 1024 dataset into 8 × (0.1B × 128), i.e. eight small datasets.
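The column split suggested above can be sketched with a small stand-in array; each 128-d block could then get its own index (this is an interpretation of the suggestion, not an established Faiss recipe):

```python
# Sketch: split one (n, 1024) matrix into eight (n, 128) column blocks.
import numpy as np

xb = np.random.rand(1000, 1024).astype("float32")  # stand-in for 0.1B rows
blocks = np.split(xb, 8, axis=1)                   # eight (1000, 128) arrays
```

Note that searching eight 128-d indexes separately does not give the same results as one 1024-d index, so the merged results would need re-ranking against the full vectors.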
-
A Bit of Background / Goal
I have 100+ million vectors of dimension 1024 that I want to fit into a Faiss index.
Obviously, I can't use a Flat index because it would have an immense memory footprint, and search (and even building the index) would take forever.
According to the documentation, I suppose I have to use an IVF or PCA index and train it on my dataset.
Concern
When creating an index using IVFPQ, we first have to TRAIN the index.
MY CONCERN IS that for training I obviously cannot fit all 100M vectors in RAM. Training has to happen on a small chunk, and then the remaining chunks are read iteratively and ADDed to the index.
I don't think training on only a small chunk (say, 5M) and generalizing it over the remaining 95+ million would give me good results.
I need help finding an approach to train/add 100M (1024-d) vectors into a Faiss index.
Platform
OS:
Faiss version:
Installed from:
Faiss compilation options:
Running on:
Interface: