🐕 Verba X Unstructured (demo showcase)

Welcome to Verba: The Golden RAGtriever, an open-source initiative designed to offer a streamlined, user-friendly interface for Retrieval-Augmented Generation (RAG) applications. In this repo, you'll find a step-by-step guide for importing PDFs into Verba by using Unstructured.io.

If you want to learn more about Verba, you can find further details on our Verba Repo.

✨ Quickstart

Follow these steps to run the demo workflow. Two API keys are required: one for OpenAI and one for Unstructured. Note that using this project will incur costs on the API keys you provide.

  1. Initialize a new Python environment
  • python3 -m virtualenv venv
  2. Add your Unstructured and OpenAI API keys to a .env file
  • OPENAI_API_KEY=YOUR_KEY
  • UNSTRUCTURED_API_KEY=YOUR_KEY
  3. Source the Python environment and the .env file
  • source venv/bin/activate
  • source .env
  4. Install the requirements
  • pip install -r requirements.txt
  5. (OPTIONAL) Convert the PDFs into text files
  • python pdf_to_txt.py
  6. Import the data into Verba
  • verba import --path ./data
  7. Start Verba
  • verba start
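
Before importing, it can help to confirm that both keys from the .env step are actually filled in. The following is a minimal sketch, not part of this repo: the helper names `parse_env_file` and `missing_keys` are hypothetical, and it assumes the .env file uses plain `KEY=VALUE` lines as shown in the steps above.

```python
from pathlib import Path

# Key names taken from the quickstart; everything else here is illustrative.
REQUIRED_KEYS = ("OPENAI_API_KEY", "UNSTRUCTURED_API_KEY")


def parse_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values  # a missing file simply yields no keys
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(path=".env", required=REQUIRED_KEYS, placeholder="YOUR_KEY"):
    """Return required keys that are absent, empty, or still the placeholder."""
    values = parse_env_file(path)
    return [key for key in required if values.get(key, "") in ("", placeholder)]
```

Calling `missing_keys()` from the project root returns an empty list when both keys are set, and otherwise lists the keys that still need a value.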

📦 Dataset

This repo contains PDFs about taste, smell, and their combination. The data folder already contains the converted .txt files, so running the conversion script is not required.
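
If you add your own PDFs, a quick check can show which ones still lack a converted text file. This is a rough sketch under an assumption that this README does not confirm: that each PDF sits in the same folder as its .txt counterpart and shares its file name. The function `pdfs_missing_text` is a hypothetical helper, not part of `pdf_to_txt.py`.

```python
from pathlib import Path


def pdfs_missing_text(data_dir="data"):
    """Return PDFs in data_dir that have no same-named .txt file next to them.

    Assumes PDFs and their converted text files live side by side.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(
        pdf for pdf in root.glob("*.pdf")
        if not pdf.with_suffix(".txt").is_file()
    )
```

An empty result suggests the optional conversion step can be skipped; any listed files would still need to go through the conversion script.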

💰 Large Language Model (LLM) Costs

Verba uses OpenAI models exclusively. Usage is billed to the API key you provide, and costs arise mainly from embedding the data and generating answers. Verba's default vectorization engine is Ada v2.
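
To get a feel for the embedding part of the bill before importing, a back-of-the-envelope estimate can be sketched as below. Everything here is an assumption rather than Verba's own accounting: `estimate_embedding_cost` is a hypothetical helper, the default of four characters per token is only a common rule of thumb for English text, and `price_per_1k_tokens` must be taken from OpenAI's current pricing page. The estimate ignores chunking overhead and answer generation costs.

```python
from pathlib import Path


def estimate_embedding_cost(data_dir, price_per_1k_tokens, chars_per_token=4.0):
    """Roughly estimate (tokens, cost) for embedding all .txt files in data_dir.

    chars_per_token is a heuristic, not a real tokenizer; price_per_1k_tokens
    is supplied by the caller and is not hard-coded here.
    """
    total_chars = sum(
        len(path.read_text(encoding="utf-8", errors="ignore"))
        for path in sorted(Path(data_dir).glob("*.txt"))
    )
    tokens = total_chars / chars_per_token
    cost = tokens / 1000 * price_per_1k_tokens
    return tokens, cost
```

For example, `estimate_embedding_cost("data", price_per_1k_tokens=your_price)` returns an approximate token count and the matching cost in whatever currency the price is quoted in.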

💖 Open Source Contribution

Your contributions are always welcome! Feel free to share ideas and feedback, or to open issues and bug reports. If you need help, visit our Weaviate Community Forum!