Ingest Folder #15

@pdchristian

Hello,

I would like to ingest all my local text files (PDF, DOCX and TXT). To do this, I replaced the loader with DirectoryLoader, as shown below. This basically works, but only the last document appears to be ingested (I have 4 PDFs for testing).

local_path = "../data"

# Local PDF file uploads
if local_path:
    loader = DirectoryLoader(local_path, glob='**/[!.]*', use_multithreading=True, show_progress=True)
    data = loader.load()

data[0]

Output:
100%|██████████| 4/4 [00:31<00:00, 7.93s/it]

# Add to vector database
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text", show_progress=True),
    #embedding=OllamaEmbeddings(model="nomic-embed-text", show_progress=True),
    collection_name="local-rag"
)

Output of OllamaEmbeddings:
OllamaEmbeddings: 100%|██████████| 143/143 [00:11<00:00, 12.73it/s]
Only 143 chunks were embedded. With 4 PDFs, the number of chunks should be much higher.

It would be great if my local office documents could be ingested.
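
One way to narrow this down might be to compare which source files appear in `data` with which appear in `chunks`. The snippet above never shows `chunks` being rebuilt from the new `data`, so it may still hold the result of an earlier run. The following is only a sketch: `FakeDoc`, `count_by_source` and the file names are hypothetical, and it assumes each loaded document carries a `"source"` entry in its `metadata` dict.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a loaded document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def count_by_source(docs):
    """Count how many documents (or chunks) came from each source file."""
    return Counter(d.metadata.get("source", "<unknown>") for d in docs)


# Made-up example: two files were loaded, but the chunks cover only one.
data = [
    FakeDoc("text of a", {"source": "a.pdf"}),
    FakeDoc("text of b", {"source": "b.pdf"}),
]
chunks = [
    FakeDoc("b, part 1", {"source": "b.pdf"}),
    FakeDoc("b, part 2", {"source": "b.pdf"}),
]

# Files that were loaded but contributed no chunks at all.
missing = set(count_by_source(data)) - set(count_by_source(chunks))
print(sorted(missing))
```

If `missing` is non-empty when run on the real `data` and `chunks`, the files were loaded but dropped before the embedding step, which would point at the splitting step and not at DirectoryLoader.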
