Welcome to Verba: The Golden RAGtriever, an open-source initiative designed to offer a streamlined, user-friendly interface for Retrieval-Augmented Generation (RAG) applications. In this repo, you'll find a step-by-step guide for importing PDFs into Verba by using Unstructured.io.
If you want to learn more about Verba, you can find further details on our Verba Repo.
Here is a quickstart for running this demo workflow. Two API keys are required for running this demo: OpenAI and Unstructured.
Please note that using this project will generate costs on your provided API key.
- Initialize a new Python Environment
python3 -m virtualenv venv
- Add your Unstructured and OpenAI API keys to a .env file
OPENAI_API_KEY=YOUR_KEY
UNSTRUCTURED_API_KEY=YOUR_KEY
- Activate the Python Environment and source the .env file
source venv/bin/activate
source .env
- Install requirements
pip install -r requirements.txt
- (OPTIONAL) Convert PDFs into Text files
python pdf_to_txt.py
- Import data to Verba
verba import --path ./data
- Start Verba
verba start
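Before importing data, it can help to confirm that both keys from the .env step are actually filled in. The following is a minimal sketch, not part of this repo: the helper names `parse_env` and `missing_keys` are hypothetical, and it assumes the .env file uses plain KEY=VALUE lines as shown above.

```python
from pathlib import Path

# Key names taken from the .env step above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "UNSTRUCTURED_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still the YOUR_KEY placeholder."""
    return [k for k in REQUIRED_KEYS if env.get(k, "") in ("", "YOUR_KEY")]


if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    missing = missing_keys(env)
    if missing:
        print("Missing or placeholder keys:", ", ".join(missing))
    else:
        print("Both API keys are set.")
```

Running this from the repo root prints which keys still need a value. It only inspects the file and does not call either API.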
This repo contains PDFs about taste, smell, and their combination. The data folder already contains the converted .txt files, so running the conversion script is not required.
- Taste and Smell (https://www.researchgate.net/publication/293334369_Taste_and_Smell)
- Combining Tastes (https://www.diva-portal.org/smash/get/diva2:1605983/FULLTEXT01.pdf)
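If you add your own PDFs, a quick check like the one below can show which of them still need the optional conversion step. This is a hedged sketch: the function name `unconverted_pdfs` is hypothetical, and it assumes that each converted file keeps the PDF's base name with a .txt extension and that both live in ./data, which may not match how pdf_to_txt.py names or places its output.

```python
from pathlib import Path


def unconverted_pdfs(pdf_dir: Path, txt_dir: Path) -> list[Path]:
    """Return PDFs in pdf_dir that have no same-named .txt file in txt_dir."""
    converted = {p.stem for p in txt_dir.glob("*.txt")}
    return sorted(p for p in pdf_dir.glob("*.pdf") if p.stem not in converted)


if __name__ == "__main__":
    # Assumption: PDFs and converted text files share the ./data folder.
    data = Path("./data")
    pending = unconverted_pdfs(data, data)
    if pending:
        print("PDFs without a converted .txt file:")
        for pdf in pending:
            print(" ", pdf.name)
    else:
        print("Every PDF already has a converted .txt file.")
```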
Verba exclusively utilizes OpenAI models. Be advised that the usage costs for these models will be billed to the API access key you provide. Primarily, costs are incurred during data embedding and answer generation processes. The default vectorization engine for Verba is Ada v2.
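To get a rough feel for the embedding cost before importing, you can estimate the token count of the text files. The sketch below is an approximation under stated assumptions: it uses a common rule of thumb of about 4 characters per token rather than a real tokenizer, the function names are hypothetical, and the price is a parameter you must look up yourself on the OpenAI pricing page. Answer generation costs are not covered.

```python
from pathlib import Path

# Rough heuristic for English text; not an exact tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate the token count by ceiling-dividing the character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def estimate_embedding_cost(texts: list[str], price_per_1k_tokens: float) -> float:
    """Approximate the cost of embedding all texts once at the given price."""
    total_tokens = sum(estimate_tokens(t) for t in texts)
    return total_tokens / 1000 * price_per_1k_tokens


if __name__ == "__main__":
    # Assumption: the converted text files live in ./data.
    texts = [p.read_text(errors="ignore") for p in Path("./data").glob("*.txt")]
    print("Estimated tokens to embed:", sum(estimate_tokens(t) for t in texts))
```

Pass the current per-1K-token price for the embedding model to `estimate_embedding_cost` to turn the token estimate into a cost figure.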
Your contributions are always welcome! Feel free to share ideas and feedback, or to open issues and bug reports. Visit our Weaviate Community Forum if you need any help!