Feature Request: Vector DB Support & Python API Enhancement #13

Open
@NDA-Github

First of all, thank you!

I want to start by thanking you for this amazing library. wdoc takes RAG to another level with its powerful features, great documentation and overall thoughtful implementation. The way it handles document processing, querying and summarization is really impressive.

Feature requests

I have two suggestions that could make wdoc even more versatile:

1. Support for vector databases

It would be great to have the option to store embeddings in vector databases like ChromaDB or Pinecone. This would allow:

  • Better scalability for large document collections
  • Persistence of embeddings across sessions
  • Potential for distributed deployments
  • Real-time updates to the document collection

2. Python API for easier integration

While the CLI is great, having a proper Python API would make it easier to integrate wdoc into other applications. For example:

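# Proposed API, for illustration only
from wdoc import WDoc

wdoc = WDoc()
db = ...  # placeholder: any vector DB client (e.g. ChromaDB or Pinecone)

# Embedding: compute embeddings for the documents and store them in `db`
embeddings = wdoc.create_embeddings(
    documents=["doc1.pdf", "doc2.pdf"],
    model="openai/text-embedding-3-small",
    db=db,
)

# Query: ask a question against the embedded documents
response = wdoc.query(
    query="What is the main topic?",
    documents=embeddings,
)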

This would make it simpler to:

  • Use wdoc as a library in other Python projects
  • Chain operations programmatically
  • Customize the workflow for specific use cases

Let me know if you'd like me to elaborate on any of these suggestions. Thanks again for this great tool!
