This repository contains the Python implementation of a Retrieval Augmented Generation (RAG) pipeline designed for low-budget scenarios, as described in the accompanying Medium blog post: Low-Budget RAG pipeline for your small company.

The system is built on `llama_index`: it embeds documents from a specified directory, builds a persistent vector store index, and answers queries against that index. It uses `FastEmbedEmbedding` for efficient embeddings, `SentenceTransformerRerank` for re-ranking retrieved nodes, and Ollama for interacting with a local language model.
- **Install Dependencies:** This project uses Python and relies on `llama_index` and other related libraries. A `requirements.txt` file is not provided, but the key libraries observed are `llama_index` (with integrations for `FastEmbedEmbedding` and Ollama) and `numpy`. A plausible set of install commands is sketched after this list.
  - TODO: Create a `requirements.txt` file or list the exact `pip install` commands for all dependencies.
- **Ollama:** Ensure Ollama is installed and running, as the `rag.py` and `retriever.py` scripts are configured to use an Ollama instance at `127.0.0.1:11434`. You may also need to pull the `qwen3:0.6b` model used in `rag.py` and `retriever.py`.
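Until a `requirements.txt` exists, the commands below are a reasonable starting point. The integration package names (`llama-index-embeddings-fastembed`, `llama-index-llms-ollama`) are assumptions based on the usual llama_index plugin layout; verify them against the imports in the scripts.

```sh
# Assumed package names for the llama_index integrations used here;
# check the repo's imports before relying on them.
pip install llama-index llama-index-embeddings-fastembed llama-index-llms-ollama numpy
pip install sentence-transformers  # required by SentenceTransformerRerank

# Pull the model referenced by rag.py and retriever.py.
ollama pull qwen3:0.6b
```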
The `embed.py` script processes documents from the `./test_data/` directory and creates a persistent vector store index in the `./indexed-data/` directory:

```sh
python embed.py
```
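For orientation, here is a minimal sketch of what an embedding step with these components typically looks like. The embedding model name is an assumption; the actual `embed.py` may differ.

```python
# Minimal sketch of an embedding step with llama_index + FastEmbed.
# The embedding model name is an assumption; check embed.py for the real one.
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.fastembed import FastEmbedEmbedding

# Use FastEmbed instead of the default embedding model.
Settings.embed_model = FastEmbedEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Load raw documents, build the vector index, and persist it to disk.
documents = SimpleDirectoryReader("./test_data/").load_data()
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(persist_dir="./indexed-data/")
```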
- **Full RAG Pipeline (Query LLM):** The `rag.py` script executes the full RAG pipeline, retrieving relevant context and then querying the configured LLM (Ollama). See the sketch after this list.

  ```sh
  python rag.py
  ```
- **Document Retrieval:** The `query.py` script demonstrates how to retrieve relevant documents for a query. The results are saved to `tmp_result.json`.

  ```sh
  python query.py
  ```
- **Retrieval with Reranking:** The `retriever.py` script demonstrates document retrieval followed by re-ranking with a `SentenceTransformerRerank` model (also shown in the sketch after this list). The re-ranked results are saved to `tmp_reranked.json`.

  ```sh
  python retriever.py
  ```
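The three scripts above share the same skeleton: load the persisted index, retrieve nodes, optionally re-rank them, and (in `rag.py`) synthesize an answer with the LLM. The sketch below illustrates that flow under the same assumptions as before; the query string, `top_k` values, and reranker model are illustrative, not the repository's actual settings.

```python
# Illustrative sketch of the retrieve -> rerank -> query flow used by
# query.py, retriever.py, and rag.py. Exact settings are assumptions.
from llama_index.core import Settings, StorageContext, load_index_from_storage
from llama_index.core.postprocessor import SentenceTransformerRerank
from llama_index.embeddings.fastembed import FastEmbedEmbedding
from llama_index.llms.ollama import Ollama

Settings.embed_model = FastEmbedEmbedding(model_name="BAAI/bge-small-en-v1.5")
Settings.llm = Ollama(model="qwen3:0.6b", base_url="http://127.0.0.1:11434")

# Load the index persisted by embed.py.
storage_context = StorageContext.from_defaults(persist_dir="./indexed-data/")
index = load_index_from_storage(storage_context)

question = "What do the test documents say about pricing?"  # example query

# 1. Plain retrieval (roughly what query.py demonstrates).
retriever = index.as_retriever(similarity_top_k=5)
nodes = retriever.retrieve(question)

# 2. Re-ranking with a cross-encoder (roughly what retriever.py demonstrates).
reranker = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-2-v2", top_n=3
)
reranked_nodes = reranker.postprocess_nodes(nodes, query_str=question)

# 3. Full pipeline: retrieve, re-rank, and ask the LLM (roughly rag.py).
query_engine = index.as_query_engine(
    similarity_top_k=5, node_postprocessors=[reranker]
)
print(query_engine.query(question))
```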
- **Framework:** The project relies heavily on the `llama_index` framework for its RAG functionality.
- **Data Storage:** Indexed data (the vector store) is persisted in the `./indexed-data/` directory.
- **Test Data:** Raw documents for indexing are expected in the `./test_data/` directory.
- **JSON Output:** Helper functions in `utils.py` serialize and dump Python objects to JSON files, primarily for inspecting retrieval and re-ranking results.
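As a rough illustration of the kind of helper `utils.py` provides, a dump function might look like the following; the function name and JSON shape are hypothetical, not the repository's actual API.

```python
# Hypothetical helper in the spirit of utils.py; the real function
# names and JSON layout in the repo may differ.
import json

from llama_index.core.schema import NodeWithScore


def dump_nodes_to_json(nodes: list[NodeWithScore], path: str) -> None:
    """Serialize retrieved nodes (score + text) to a JSON file for inspection."""
    payload = [
        {"score": node.score, "text": node.node.get_content()}
        for node in nodes
    ]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2, ensure_ascii=False)
```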