@ahuber21
Contributor

Implementations should take care of storing their own data member values when saving. The is_trained logic for LeanVec had to be adapted to allow for use cases reported in ahuber21/faiss#37

@rfsaliev force-pushed the ahuber/cpp-runtime-binding branch from 33de811 to faf3d65 on October 31, 2025 17:08
return Status{
ErrorCode::NOT_INITIALIZED, "Cannot serialize: SVS index not initialized."};
}
bool initialized = impl != nullptr;
Member

Can you please explain why we have to serialize/deserialize empty indices?

Contributor Author

For the use cases in this test
https://github.com/ahuber21/faiss/blob/b709fa114afc522b3d10ffd1356df1d9a9548951/tests/test_svs.cpp#L111

which is created to validate the scenario in this issue
ahuber21/faiss#37
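
A minimal sketch of how an empty index could round-trip through a stream, assuming a one-byte presence flag is written ahead of the payload (in the spirit of the `initialized` flag in the diff above). `FakeImpl`, `save_index`, and `load_index` are hypothetical names for illustration only; they are not part of SVS or FAISS.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the SVS-backed implementation object.
struct FakeImpl {
    std::vector<float> vectors;
};

// Write a presence flag first, so that a null impl (empty index) is still
// a valid, loadable stream.
void save_index(std::ostream& out, const FakeImpl* impl) {
    const std::uint8_t initialized = (impl != nullptr) ? 1 : 0;
    out.write(reinterpret_cast<const char*>(&initialized), sizeof(initialized));
    if (!initialized) {
        return;  // empty index: nothing more to store
    }
    const std::uint64_t n = impl->vectors.size();
    out.write(reinterpret_cast<const char*>(&n), sizeof(n));
    out.write(reinterpret_cast<const char*>(impl->vectors.data()),
              static_cast<std::streamsize>(n * sizeof(float)));
}

// Read the flag back; return nullptr for an empty index.
std::unique_ptr<FakeImpl> load_index(std::istream& in) {
    std::uint8_t initialized = 0;
    in.read(reinterpret_cast<char*>(&initialized), sizeof(initialized));
    if (!in) {
        throw std::runtime_error("truncated stream: missing presence flag");
    }
    if (!initialized) {
        return nullptr;
    }
    std::uint64_t n = 0;
    in.read(reinterpret_cast<char*>(&n), sizeof(n));
    if (!in) {
        throw std::runtime_error("truncated stream: missing element count");
    }
    auto impl = std::make_unique<FakeImpl>();
    impl->vectors.resize(n);
    in.read(reinterpret_cast<char*>(impl->vectors.data()),
            static_cast<std::streamsize>(n * sizeof(float)));
    if (!in) {
        throw std::runtime_error("truncated stream: missing payload");
    }
    return impl;
}
```

With this layout, saving a null pointer produces a one-byte stream that loads back as an empty index instead of returning a NOT_INITIALIZED error.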

Member

Thank you, @ahuber21, for the problem description.
After some investigation and analysis, I understand that the implementation code on the SVS side no longer needs to call DynamicVamana::save(std::ostream&). Instead, we can use the filesystem-based DynamicVamana::save(std::filesystem::path, ...) method, which lets us manage serialization in a simpler, more controlled way.

"Cannot deserialize: SVS index already initialized."};
}

in.read(reinterpret_cast<char*>(&leanvec_d), sizeof(size_t));
Member

  1. I would reuse the existing matrix serialization mechanism implemented in SVS.

Contributor Author

Yes, now that it has been moved to SVS we can just use that. Is it difficult to re-use what you added for index save/load?

matrix.set_datum(i, datum);
}

leanvec_matrix =
Member

I would try to extract leanvec matrices from loaded svs::leanvec::LeanDataset instead of implementing custom serialization here.

Contributor Author

In the current implementation it's possible to have a leanvec matrix (after running train) without having an index.


Status IndexSVSVamanaLVQImpl::serialize_impl(std::ostream& out) const noexcept {
// Also store LVQ specific members
out.write(reinterpret_cast<const char*>(&lvq_level), sizeof(LVQLevel));
Member

So, to keep everything consistent, shouldn't we save the IndexSVSVamanaImpl members as well?

Contributor Author

@ahuber21 Oct 31, 2025

Isn't there only ntotal_soft_deleted which is always 0 because we compact on save?

Yes, I guess it makes sense to add the public members too.
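
A rough sketch of what "base members first, then implementation-specific members" could look like, assuming a non-virtual `serialize` that delegates to a virtual `serialize_impl` hook. The class names, the `LVQLevel` enumerators, and the integer type of `ntotal_soft_deleted` are assumptions made for illustration; only the member names `lvq_level` and `ntotal_soft_deleted` and the `serialize_impl` hook come from this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical enumerators; the real LVQLevel definition is not shown here.
enum class LVQLevel : std::uint8_t { Level0, Level1, Level2 };

// Hypothetical base class: stores the shared members, then delegates.
class BaseImplSketch {
  public:
    virtual ~BaseImplSketch() = default;

    std::uint64_t ntotal_soft_deleted = 0;  // type is an assumption

    void serialize(std::ostream& out) const {
        out.write(reinterpret_cast<const char*>(&ntotal_soft_deleted),
                  sizeof(ntotal_soft_deleted));
        serialize_impl(out);  // derived classes append their own members
    }

    void deserialize(std::istream& in) {
        in.read(reinterpret_cast<char*>(&ntotal_soft_deleted),
                sizeof(ntotal_soft_deleted));
        deserialize_impl(in);  // must read in the same order as written
    }

  protected:
    virtual void serialize_impl(std::ostream&) const {}
    virtual void deserialize_impl(std::istream&) {}
};

// Hypothetical LVQ variant: adds its own member after the base members.
class LVQImplSketch : public BaseImplSketch {
  public:
    LVQLevel lvq_level = LVQLevel::Level0;

  protected:
    void serialize_impl(std::ostream& out) const override {
        out.write(reinterpret_cast<const char*>(&lvq_level), sizeof(LVQLevel));
    }
    void deserialize_impl(std::istream& in) override {
        in.read(reinterpret_cast<char*>(&lvq_level), sizeof(LVQLevel));
    }
};
```

Keeping the shared members in the base class means each implementation only has to account for the fields it introduces, and the read order mirrors the write order by construction.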

@rfsaliev
Member

rfsaliev commented Nov 3, 2025

Regarding LeanVec training, saving, and loading:

  • We have the LeanVecMatrices entity; let's call its abstraction TrainingInfo
  • We have LeanVecIndex which can be initialized (built) with or without TrainingInfo
  • LeanVecIndex should be able to be saved/loaded
  • We should be able to save/load TrainingInfo and build a LeanVecIndex with pre-loaded TrainingInfo

On the SVS runtime side, we should implement the following functionality:

  1. Build a TrainingInfo instance
  2. Serialize TrainingInfo instance
  3. Deserialize TrainingInfo
  4. Build LeanVecIndex with TrainingInfo
  5. Serialize LeanVecIndex
  6. Deserialize LeanVecIndex

On the FAISS side, LeanVecIndex may be in one of the following states:

  • Empty, untrained
  • Empty, trained
  • Non-empty, trained

As I understand it, the state 'Non-empty, untrained' is not acceptable.
To handle these states, the FAISS-side logic could look like this:

  • IndexSVSVamanaLeanVec class fields:

    IndexPtr = NULL;
    TrainingInfoPtr = NULL;
    enum state { empty_untrained, empty_trained, non_empty };
    state get_state() { return IndexPtr ? non_empty : (TrainingInfoPtr ? empty_trained : empty_untrained); }
  • IndexSVSVamanaLeanVec::train(...):

    error_if(get_state() == non_empty);
    TrainingInfoPtr = svs::runtime::build_leanvec_training(...);
    this->trained = true;
  • IndexSVSVamanaLeanVec::add(...):

    error_if(!trained);
    if (IndexPtr == NULL)
      IndexPtr = svs::build_leanvec(...TrainingInfoPtr);
    else
      IndexPtr->add(...);
    trained = true;
  • IndexSVSVamanaLeanVec::serialize(...):

    error_if(!trained);
    save_state(get_state());
    if (get_state() == empty_trained)
      TrainingInfoPtr->serialize();
    else
      IndexPtr->serialize();
  • IndexSVSVamanaLeanVec::deserialize(...):

    state = load_state();
    error_if(state == empty_untrained);
    if (state == empty_trained)
      TrainingInfoPtr = svs::deserialize_training_info(...);
    else
      IndexPtr = svs::deserialize_index(...);
    trained = true;
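
The state handling outlined above could be sketched in self-contained C++ roughly as follows. This is only an illustration under stated assumptions: `TrainingInfo`, `Index`, and `LeanVecSketch` are hypothetical stand-ins that store a plain float vector and a counter, and none of the calls below are real SVS or FAISS APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real SVS runtime types are not shown in this thread.
struct TrainingInfo { std::vector<float> matrix; };
struct Index { std::uint64_t ntotal = 0; };

enum class State : std::uint8_t { EmptyUntrained, EmptyTrained, NonEmpty };

template <typename T> void write_pod(std::ostream& out, const T& v) {
    out.write(reinterpret_cast<const char*>(&v), sizeof(T));
}
template <typename T> T read_pod(std::istream& in) {
    T v{};
    in.read(reinterpret_cast<char*>(&v), sizeof(T));
    if (!in) throw std::runtime_error("truncated stream");
    return v;
}

class LeanVecSketch {
  public:
    // The state is derived from which pointers are set, as in get_state().
    State state() const {
        if (index_) return State::NonEmpty;
        return training_ ? State::EmptyTrained : State::EmptyUntrained;
    }
    void train(std::vector<float> matrix) {
        if (state() == State::NonEmpty) throw std::logic_error("train() on non-empty index");
        training_ = std::make_unique<TrainingInfo>(TrainingInfo{std::move(matrix)});
    }
    void add(std::uint64_t n) {
        if (state() == State::EmptyUntrained) throw std::logic_error("add() before train()");
        if (!index_) index_ = std::make_unique<Index>();  // first add builds the index
        index_->ntotal += n;
    }
    void serialize(std::ostream& out) const {
        const State s = state();
        if (s == State::EmptyUntrained) throw std::logic_error("nothing to serialize");
        write_pod(out, s);  // state tag first, so the loader knows what follows
        if (s == State::EmptyTrained) {
            write_pod<std::uint64_t>(out, training_->matrix.size());
            out.write(reinterpret_cast<const char*>(training_->matrix.data()),
                      static_cast<std::streamsize>(training_->matrix.size() * sizeof(float)));
        } else {
            write_pod(out, index_->ntotal);
        }
    }
    void deserialize(std::istream& in) {
        assert(state() == State::EmptyUntrained);  // only load into a fresh object
        const State s = read_pod<State>(in);
        if (s == State::EmptyTrained) {
            std::vector<float> m(read_pod<std::uint64_t>(in));
            in.read(reinterpret_cast<char*>(m.data()),
                    static_cast<std::streamsize>(m.size() * sizeof(float)));
            if (!in) throw std::runtime_error("truncated stream");
            training_ = std::make_unique<TrainingInfo>(TrainingInfo{std::move(m)});
        } else if (s == State::NonEmpty) {
            index_ = std::make_unique<Index>();
            index_->ntotal = read_pod<std::uint64_t>(in);
        } else {
            throw std::runtime_error("invalid state tag");
        }
    }

  private:
    std::unique_ptr<TrainingInfo> training_;
    std::unique_ptr<Index> index_;
};
```

Deriving the state from the two pointers, instead of storing a separate flag, keeps 'Non-empty, untrained' unreachable: an index can only be created by `add`, which refuses to run before `train` or a successful `deserialize`.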
