You could also write a function that compares all the chunks of a document and gives you back the 5 chunks that are least similar to one another. That way, the master index would hold the most information about the document while using the least amount of space. You could also, in theory, remove function words like "the, or, and, is, a, ...", so the master index would be even smaller while still containing the important information.
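A minimal sketch of both ideas, assuming the chunk embeddings have already been computed as plain number arrays. `selectDiverseChunks`, `stripFunctionWords` and the short word list are hypothetical helpers, not part of Vectra. "Least similar" is approximated with greedy max-min selection, which is cheap but not guaranteed to find the globally most diverse set of 5. Whether stripping function words keeps the embeddings useful is untested here; the sketch only shows the mechanics.

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: greedily pick up to k chunk indices that are mutually dissimilar.
// Seed with the chunk whose average similarity to all others is lowest, then keep adding
// the chunk whose highest similarity to anything already picked is lowest (max-min selection).
function selectDiverseChunks(embeddings: number[][], k: number): number[] {
  const n = embeddings.length;
  if (k <= 0) return [];
  if (n <= k) return embeddings.map((_, i) => i);

  // Pairwise similarity matrix; fine for the handful of chunks in one document.
  const sim = embeddings.map((a) => embeddings.map((b) => cosine(a, b)));

  let seed = 0;
  let lowestMean = Infinity;
  for (let i = 0; i < n; i++) {
    let total = 0;
    for (let j = 0; j < n; j++) if (j !== i) total += sim[i][j];
    const mean = total / (n - 1);
    if (mean < lowestMean) {
      lowestMean = mean;
      seed = i;
    }
  }

  const picked: number[] = [seed];
  while (picked.length < k) {
    let next = -1;
    let lowestMax = Infinity;
    for (let i = 0; i < n; i++) {
      if (picked.indexOf(i) !== -1) continue;
      const maxSim = Math.max(...picked.map((p) => sim[i][p]));
      if (maxSim < lowestMax) {
        lowestMax = maxSim;
        next = i;
      }
    }
    picked.push(next);
  }
  return picked;
}

// Hypothetical, deliberately short list of function words; extend as needed.
const FUNCTION_WORDS = ["the", "or", "and", "is", "a", "an", "of", "to"];

// Drop function words (case-insensitive, ignoring attached punctuation) from a chunk's text.
function stripFunctionWords(text: string): string {
  return text
    .split(/\s+/)
    .filter((word) => word.length > 0)
    .filter((word) => FUNCTION_WORDS.indexOf(word.toLowerCase().replace(/[^a-z]/g, "")) === -1)
    .join(" ");
}
```

For example, `stripFunctionWords("The cat is on a mat")` gives `"cat on mat"`. The max-min rule is a common way to spread picks out: each new chunk is the one least like anything already chosen, so near-duplicate chunks rarely both make the cut.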
As an FYI... Vectra now has a full-fledged document index and a CLI for ingesting documents. The new …
Just sharing some ideas for how to use Vectra to build a full-fledged document index.
Yes, you could just break each document into chunks and add all of the individual chunks to a single Vectra index that aggregates every document, but my specific goal is to first find the documents that are most relevant semantically and then find the relevant parts of each of those documents to embed in a prompt.
You need multiple indexes for this, but since they’re all local, they’re fast and free :)
The core idea is that every document is first added to its own local Vectra index. The document is then also added to a master index that aggregates all documents. The purpose of the master index is to identify the documents most likely to contain the answer. There are a couple of ways this could work.
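Here's one way the two-stage lookup could look, as a sketch. `TinyIndex` is a hypothetical brute-force in-memory index standing in for a local Vectra index (it is not Vectra's API), and `twoStageQuery`, `topDocs` and `chunksPerDoc` are made-up names. It assumes the query and all chunks were embedded with the same model.

```typescript
// Hypothetical in-memory stand-ins for illustration only; this is NOT Vectra's API.
interface Entry {
  docId: string;
  vector: number[];
  text?: string; // chunk text; master entries can leave it out
}

interface Hit {
  docId: string;
  text: string;
  score: number;
}

// Cosine similarity between two equal-length embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class TinyIndex {
  private entries: Entry[] = [];

  insert(entry: Entry): void {
    this.entries.push(entry);
  }

  // Brute-force nearest neighbours by cosine similarity, best first.
  query(vector: number[], topK: number): { entry: Entry; score: number }[] {
    return this.entries
      .map((entry) => ({ entry, score: cosine(vector, entry.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK);
  }
}

// Stage 1 asks the master index which documents look relevant; stage 2 searches only
// those documents' own indexes for the chunks to put in the prompt.
function twoStageQuery(
  queryVector: number[],
  master: TinyIndex,
  perDoc: { [docId: string]: TinyIndex },
  topDocs: number,
  chunksPerDoc: number,
): Hit[] {
  // Several master entries can point at the same document, so walk the ranked
  // results and keep the first topDocs distinct document ids.
  const docIds: string[] = [];
  for (const { entry } of master.query(queryVector, Infinity)) {
    if (docIds.length >= topDocs) break;
    if (docIds.indexOf(entry.docId) === -1) docIds.push(entry.docId);
  }

  const hits: Hit[] = [];
  for (const docId of docIds) {
    const docIndex = perDoc[docId];
    if (!docIndex) continue;
    for (const { entry, score } of docIndex.query(queryVector, chunksPerDoc)) {
      hits.push({ docId, text: entry.text || "", score });
    }
  }
  return hits.sort((x, y) => y.score - x.score);
}
```

The master index only has to rank documents, so its entries never need chunk text; the per-document indexes do the fine-grained work, and only for the few documents that survive stage 1.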
You could just add all of the individual chunks to the master index, but that’s going to take up a lot of space, and I’m not sure it actually buys you anything…
The alternative is to take the first 5 chunks of every document and add those to the master index. That caps the number of chunks held in memory for any one document, and I’d go a step further and not save any metadata in the master index at all, since it will never be used.
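A small sketch of the "first 5 chunks" approach, plus the arithmetic behind the space argument. `masterEntriesFor` and `masterIndexSize` are hypothetical names, and the chunk counts in the example are made up. One assumption worth flagging: even with no metadata, each master entry needs some link back to its document so the per-document index can be found after a hit; this sketch keeps a `docId` for that, and how the link is actually stored is left open.

```typescript
interface Chunk {
  text: string;
  vector: number[];
}

// Hypothetical shape for a master-index entry: just the vector plus a document id.
// No chunk text or other metadata is stored.
interface MasterEntry {
  docId: string;
  vector: number[];
}

// Take only the first maxChunks chunks of a document for the master index.
function masterEntriesFor(docId: string, chunks: Chunk[], maxChunks = 5): MasterEntry[] {
  return chunks.slice(0, maxChunks).map((chunk) => ({ docId, vector: chunk.vector }));
}

// Master-index size if every chunk is added versus capping each document at maxChunks.
function masterIndexSize(chunkCounts: number[], maxChunks = 5): { allChunks: number; capped: number } {
  let allChunks = 0;
  let capped = 0;
  for (const count of chunkCounts) {
    allChunks += count;
    capped += Math.min(count, maxChunks);
  }
  return { allChunks, capped };
}
```

For three made-up documents with 40, 3 and 12 chunks, `masterIndexSize([40, 3, 12])` returns `{ allChunks: 55, capped: 13 }`: the cap only bites on long documents, which is exactly where adding every chunk gets expensive.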