First of all, thank you!
Thank you for this amazing library. wdoc takes RAG to another level with its powerful features, great documentation, and thoughtful implementation. The way it handles document processing, querying, and summarization is really impressive.
Feature requests
I have two suggestions that could make wdoc even more versatile:
1. Support for vector databases
It would be great to have the option to store embeddings in vector databases like ChromaDB or Pinecone. This would allow:
- Better scalability for large document collections
- Persistence of embeddings across sessions
- Potential for distributed deployments
- Real-time updates to the document collection
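To make the request more concrete, here is a rough sketch of what a pluggable storage layer could look like. Everything in it is an assumption for illustration: `VectorStore`, `InMemoryStore`, `upsert` and `query` are hypothetical names, not part of wdoc, ChromaDB or Pinecone. The idea is only that wdoc could talk to a small interface, and each backend would be an adapter behind it.

```python
import math
from typing import Protocol


class VectorStore(Protocol):
    """Hypothetical interface that a ChromaDB or Pinecone adapter could implement."""

    def upsert(self, doc_id: str, vector: list[float]) -> None: ...

    def query(self, vector: list[float], top_k: int = 3) -> list[str]: ...


class InMemoryStore:
    """Toy backend for illustration: vectors live in a dict, ranked by cosine similarity."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    def upsert(self, doc_id: str, vector: list[float]) -> None:
        # Re-using an id overwrites the stored vector, so updates need no rebuild.
        self._vectors[doc_id] = vector

    def query(self, vector: list[float], top_k: int = 3) -> list[str]:
        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        # Sort document ids from most to least similar and keep the best top_k.
        ranked = sorted(
            self._vectors,
            key=lambda doc_id: cosine(vector, self._vectors[doc_id]),
            reverse=True,
        )
        return ranked[:top_k]


# Made-up 2-dimensional vectors stand in for real embeddings.
store: VectorStore = InMemoryStore()
store.upsert("doc1.pdf", [1.0, 0.0])
store.upsert("doc2.pdf", [0.0, 1.0])
nearest = store.query([0.9, 0.1], top_k=1)
```

A persistent or distributed backend would replace `InMemoryStore` without the calling code changing, which is where the scalability and persistence benefits listed above would come from.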
2. Python API for easier integration
While the CLI is great, having a proper Python API would make it easier to integrate wdoc into other applications. For example:
from wdoc import WDoc  # proposed interface (illustrative)

wdoc = WDoc()
db = None  # placeholder for any vector DB client (e.g. ChromaDB, Pinecone)

# Embedding
embeddings = wdoc.create_embeddings(
    documents=["doc1.pdf", "doc2.pdf"],
    model="openai/text-embedding-3-small",
    db=db,
)

# Query
response = wdoc.query(
    query="What is the main topic?",
    documents=embeddings,
)
This would make it simpler to:
- Use wdoc as a library in other Python projects
- Chain operations programmatically
- Customize the workflow for specific use cases
Let me know if you'd like me to elaborate on any of these suggestions. Thanks again for this great tool!