Description
Recently, for native libraries, we introduced a change so that index files are read and written through Lucene's IndexInput and IndexOutput. With this in place, we should be able to remove the custom CompoundFormat in our codec (see #2185).
However, when removing it, we get an error like:
Caused by: org.apache.lucene.index.CorruptIndexException: compound sub-files must have a valid codec header and footer: codec header mismatch: actual header=1232620912 vs expected header=1071082519 (resource=BufferedChecksumIndexInput(MemorySegmentIndexInput(path="/Users/jmazane/workspace/Opensearch/DockerRunner/k-NN-1/build/testclusters/integTest-0/data/nodes/0/indices/6zG12XzjQLaWyWiAL_OFaQ/0/index/_0_165_test_nested.test_vector.faiss")))
at org.apache.lucene.codecs.CodecUtil.verifyAndCopyIndexHeader(CodecUtil.java:287) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
This happens because we write a footer but no header for the native index files: https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/codec/nativeindex/NativeIndexWriter.java#L141-L150. Lucene's compound format expects every sub-file to carry both a valid codec header and a footer, see the CompoundFormat interface: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/CompoundFormat.java#L41-L45.
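To make the mismatch concrete, here is a small stdlib-only sketch that decodes the two header values from the exception. The class and method names (`HeaderBytes`, `asAscii`, `asHex`) are made up for illustration. The expected value is Lucene's `CodecUtil.CODEC_MAGIC` (`0x3fd76c17`). The actual value, read as four big-endian bytes, spells `IxMp`, which looks like the type tag Faiss writes at the start of an index, so Lucene appears to be reading the native payload where it expects its own header.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of k-NN or Lucene.
class HeaderBytes {
    // Interpret a big-endian int as its four raw bytes and render them as ASCII.
    static String asAscii(int value) {
        byte[] bytes = ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Render an int as zero-padded hex so it can be compared with CODEC_MAGIC.
    static String asHex(int value) {
        return String.format("0x%08x", value);
    }
}
```

Calling `HeaderBytes.asAscii(1232620912)` gives `IxMp`, and `HeaderBytes.asHex(1071082519)` gives `0x3fd76c17`.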
We should get rid of the CompoundFormat so we can move towards just extending the PerFieldVectorFormat. To do this, we need to write the header when creating the native index file and, on the read side, consume it before handing the file contents to the underlying libraries.
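A rough, stdlib-only sketch of that write/read flow, under simplifying assumptions: the header here is only the 4-byte magic, whereas a real Lucene index header also carries the codec name, version, segment ID, and suffix. The names `HeaderSketch`, `writeWithHeader`, and `readPayload` are hypothetical. In the actual codec this would presumably go through `CodecUtil.writeIndexHeader` on the IndexOutput and `CodecUtil.checkIndexHeader` on the IndexInput.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: simulates "header + native payload" with plain byte arrays.
class HeaderSketch {
    // Same value as Lucene's CodecUtil.CODEC_MAGIC; the rest of the real header is omitted.
    static final int MAGIC = 0x3fd76c17;

    // Write side: emit the header first, then the bytes produced by the native library.
    static byte[] writeWithHeader(byte[] nativePayload) {
        return ByteBuffer.allocate(Integer.BYTES + nativePayload.length)
                .putInt(MAGIC)
                .put(nativePayload)
                .array();
    }

    // Read side: validate and consume the header, then return only the native payload,
    // which is what would be forwarded to the underlying library.
    static byte[] readPayload(byte[] file) {
        ByteBuffer in = ByteBuffer.wrap(file);
        int actual = in.getInt();
        if (actual != MAGIC) {
            throw new IllegalStateException(
                    "codec header mismatch: actual header=" + actual + " vs expected header=" + MAGIC);
        }
        byte[] payload = new byte[in.remaining()];
        in.get(payload);
        return payload;
    }
}
```

The point of the design is that the native library never sees the Lucene header: the reader advances past it, so the library still starts at the first byte of its own format.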
For backward compatibility (BWC), old codecs will need to keep the old CompoundFormat, but we should be able to remove it from new codecs.