-
Notifications
You must be signed in to change notification settings - Fork 34
fix: store data members & make faiss tests pass #209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: rfsaliev/cpp-runtime-binding
Are you sure you want to change the base?
Conversation
33de811 to
faf3d65
Compare
| return Status{ | ||
| ErrorCode::NOT_INITIALIZED, "Cannot serialize: SVS index not initialized."}; | ||
| } | ||
| bool initialized = impl != nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please explain, why we have to serializae/deserialize empty inidices?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the use cases in this test
https://github.com/ahuber21/faiss/blob/b709fa114afc522b3d10ffd1356df1d9a9548951/tests/test_svs.cpp#L111
which is created to validate the scenario in this issue
ahuber21/faiss#37
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @ahuber21 for the problem description
After some time of investigation ana analysis, I got understanding, that inside the implementation code in SVS side we do not need to call DynamicVamana::save(std::ostream&) anymore, instead we can use filesystem-based DynamicVamana::save(std::filesystem::path, ...) method which allows to manage serialization in a more controlled and simpler way.
| "Cannot deserialize: SVS index already initialized."}; | ||
| } | ||
|
|
||
| in.read(reinterpret_cast<char*>(&leanvec_d), sizeof(size_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I would reuse existing matrix serialization mechanizm implemented in SVS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, now that it has been moved to SVS we can just use that. Is it difficult to re-use what you added for index save/load?
| matrix.set_datum(i, datum); | ||
| } | ||
|
|
||
| leanvec_matrix = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would try to extract leanvec matrices from loaded svs::leanvec::LeanDataset instead of implementing custom serialization here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current implementation it's possible to have a leanvec matrix (after running train) without having an index.
|
|
||
| Status IndexSVSVamanaLVQImpl::serialize_impl(std::ostream& out) const noexcept { | ||
| // Also store LVQ specific members | ||
| out.write(reinterpret_cast<const char*>(&lvq_level), sizeof(LVQLevel)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far, to make everything consistent, shouldn't we save IndexSVSVamanaImpl members as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't there only ntotal_soft_deleted which is always 0 because we compact on save?
Yes, I guess it makes sense to add the public members too.
According to LeavVec training, saving and loading.
From the SVS runtime side we should implement such functionality:
From FAISS side,
As I understand, the state 'Non-empty, untrained` - is not acceptable.
|
Implementations should take care of storing their own data member values when saving. The
is_trainedlogic for LeanVec had to be adapted to allow for use cases reported in ahuber21/faiss#37