Description
Hello,
I would like to ingest all my local text files (PDF, DOCX and TXT). To do that, I replaced the loader with DirectoryLoader, as shown below. This basically works, but only the last document appears to be ingested (I have 4 PDFs for testing).
local_path = "../data"

# Local PDF file uploads
if local_path:
    loader = DirectoryLoader(local_path, glob='**/[!.]*', use_multithreading=True, show_progress=True)
    data = loader.load()

data[0]
Output:
100%|██████████| 4/4 [00:31<00:00, 7.93s/it]
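The progress bar reports 4/4, which suggests the glob pattern itself picks up all four files. As a rough cross-check, here is a minimal sketch of what a pattern like '**/[!.]*' matches, assuming the loader resolves it with pathlib-style globbing and keeps only regular files. The helper `match_files` and the file names are made up for illustration and are not part of LangChain.

```python
from pathlib import Path
import tempfile


def match_files(root: str, pattern: str = "**/[!.]*") -> list[str]:
    """Return the relative paths of regular files under root that match pattern."""
    base = Path(root)
    # A set guards against the same path being yielded more than once.
    return sorted(
        {p.relative_to(base).as_posix() for p in base.glob(pattern) if p.is_file()}
    )


# Demo on a throwaway directory; the file names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.pdf", "b.docx", "notes.txt", ".DS_Store"]:
        Path(tmp, name).write_text("x")
    Path(tmp, "sub").mkdir()
    Path(tmp, "sub", "c.pdf").write_text("x")

    matched = match_files(tmp)
    print(matched)
```

In this sketch the dotfile is skipped because `[!.]` requires the first character of the name not to be a dot, while files in subdirectories are still matched through `**`.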
# Add to vector database
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text", show_progress=True),
    collection_name="local-rag"
)
Output (OllamaEmbeddings):
OllamaEmbeddings: 100%|██████████| 143/143 [00:11<00:00, 12.73it/s]
With 4 PDFs, I would expect a much higher number of chunks than 143.
It would be great if my local office documents could be ingested.
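One way to narrow this down might be to count loaded documents and chunks per source file, since `data[0]` only shows the first loaded document and says nothing about the others. The following is a minimal sketch, assuming each item has a `metadata` dict with a "source" key, as LangChain documents usually do. `FakeDoc` and `count_by_source` are hypothetical names used only so the example runs on its own.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a loaded document or a chunk."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def count_by_source(docs) -> Counter:
    """Count items per metadata['source']; items without one go under '<unknown>'."""
    return Counter(d.metadata.get("source", "<unknown>") for d in docs)


# Made-up data: two files were loaded, but only one of them was chunked.
data = [
    FakeDoc("text a", {"source": "a.pdf"}),
    FakeDoc("text b", {"source": "b.pdf"}),
]
chunks = [
    FakeDoc("text", {"source": "b.pdf"}),
    FakeDoc("b", {"source": "b.pdf"}),
]

loaded = count_by_source(data)
chunked = count_by_source(chunks)

# Sources that were loaded but never produced a chunk.
missing = sorted(set(loaded) - set(chunked))
print(missing)
```

Running `count_by_source` on the real `data` and `chunks` should show whether all four PDFs are loaded and whether each of them contributes chunks, or whether `chunks` was built from only one document.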