[Question] What's the corresponding add_items in this project compared to nmslib/hnswlib #13

@LizzyMiao

Hi, I am currently working on a project that needs millions, sometimes even billions, of vectors inserted to build a graph. I followed example.py in https://github.com/nmslib/hnswlib/tree/master with 4000K vectors, using the code below:

import datetime
import hnswlib

p = hnswlib.Index('l2', dim)  # L2 space; dim is the vector dimensionality
print("before build ", datetime.datetime.now())
p.init_index(max_elements = num_elements, ef_construction = 128, M = 16)
p.add_items(vectorNP, ids)  # inserts all vectors in a single call
p.save_index("/Users/XXX/Projects/builder/hnsw-embedding-test/python_test/combined.bin")

This took around 2 minutes to finish.

But when I use libhnswlib-jna-x86-64 with 16 cores, like this:

      val hnswIndex = new ConcurrentIndex(SpaceName.L2, dimension)
      hnswIndex.initialize(3890521, 16, 128, 42)
      val embeddingRecordsPar = parquet4sReader.toList.par
      embeddingRecordsPar.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(16))
      embeddingRecordsPar.foreach{ eb =>
        val ba = eb.vectors.head
        if (ba.length > 0) {
          val vector = RawEmbedding.toVector(RichByteArray(ba).asByteBuffer, dimension, "float16")
          hnswIndex.addNormalizedItem(vector, i)
          i = i + 1
        }
      }

it takes around 15-16 minutes (the time is the same if I change ConcurrentIndex to Index or use Index.synchronizedIndex). Both snippets above were run on my local machine. Is there an equivalent of add_items in hnswlib-jna, or any other way to speed up building the graph?
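
One thing worth checking in the Scala snippet above: `i = i + 1` runs inside a parallel `foreach`, and that read-modify-write is not atomic, so two threads may end up passing the same id to `addNormalizedItem`. Below is a minimal sketch of one way around it, which assigns the ids on a single thread before the parallel step starts. It is written in Python with only the standard library so that it stays self-contained. `add_all` and `add_item` are hypothetical names, and `add_item` stands in for the index's insert call, which is assumed here to be safe to call from several threads.

```python
from concurrent.futures import ThreadPoolExecutor


def add_all(vectors, add_item, workers=16):
    """Insert every non-empty vector in parallel, each with a unique id.

    add_item(vector, item_id) is a hypothetical stand-in for the index's
    insert call; it is assumed to be thread-safe.
    Returns the number of vectors submitted.
    """
    # Drop empty vectors first, mirroring the `ba.length > 0` check.
    non_empty = [v for v in vectors if len(v) > 0]

    # Assign ids sequentially on one thread, before any parallel work,
    # so no two workers can ever receive the same id.
    labelled = list(enumerate(non_empty))  # pairs of (item_id, vector)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list(...) drains the result iterator so that any exception
        # raised inside a worker is re-raised here.
        list(pool.map(lambda pair: add_item(pair[1], pair[0]), labelled))

    return len(labelled)
```

This only addresses the uniqueness of the ids; it makes no claim about build time. In Scala, the same idea would be to pair each record with its index before calling `.par`, or to draw ids from an atomic counter.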
