A simple Retrieval-Augmented Generation (RAG) chatbot using LangChain, HuggingFace embeddings, and the Groq LLM. It indexes local .txt files and answers questions based only on the content of those files.
- Loads `.txt` documents from a folder
- Splits content into chunks
- Generates vector embeddings using HuggingFace
- Stores embeddings in memory
- Uses Groq LLM for answering questions
- Returns document-based answers (with optional fallback to LLM if needed)
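The chunk-splitting step listed above can be sketched in plain Python. This is a toy character-based splitter, not the LangChain splitter the script actually uses; the `chunk_size` and `overlap` values are illustrative:

```python
def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks (illustrative sizes)."""
    chunks = []
    step = chunk_size - overlap  # each chunk shares `overlap` chars with the next
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from either side.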
```
rag_chat_bot/
│
├── rag_demo.py            # Main script to run the RAG chatbot
├── documents/             # Folder containing your .txt files
│   ├── python_basics.txt
│   ├── machine_learning.txt
│   └── rag_technology.txt
├── .venv/                 # Optional: your virtual environment
└── README.md              # This file
```
```
git clone https://github.com/darunnatarajan/rag_chat_bot.git
cd rag_chat_bot
python -m venv .venv
```
```
# Windows
.venv\Scripts\activate

# macOS/Linux
source .venv/bin/activate
```

Install the dependencies:

```
pip install -r requirements.txt
```

If you don't have a requirements.txt, you can use:

```
pip install langchain langchain-huggingface huggingface-hub groq
```

Create a `.env` file or set environment variables manually with your Groq API key:

```
export GROQ_API_KEY="your-groq-api-key"
```

Or on Windows (PowerShell):

```
$env:GROQ_API_KEY = "your-groq-api-key"
```

Run the chatbot:

```
python rag_demo.py
```

You will see:
```
==================================================
RAG System Ready! Ask your questions:
Available topics: Python, Machine Learning, RAG
==================================================
```
Example prompts:
- What is Python?
- Who created Python?
- What does RAG stand for?
Type 'quit' to exit the chat.
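The interactive loop behind this prompt might look like the following sketch. The function and parameter names here are illustrative, not the script's actual API; injecting `input_fn`/`output_fn` just makes the loop easy to test:

```python
def chat_loop(answer_fn, input_fn=input, output_fn=print):
    """Read questions until the user types 'quit', answering each one."""
    while True:
        question = input_fn("Question: ").strip()
        if question.lower() == "quit":
            break
        output_fn("Answer:", answer_fn(question))
```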
- Document Loading: Loads `.txt` files from the `documents/` folder.
- Chunking: Breaks each file into manageable pieces.
- Embedding: Converts text chunks into vectors using HuggingFace embeddings.
- Vector Store: Stores those vectors in memory for fast retrieval.
- Querying: When you ask a question, it:
  - Retrieves the most relevant chunks
  - Passes them to the Groq LLM to generate an answer
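The retrieval step above can be sketched with a toy bag-of-words "embedding" and cosine similarity. The real script uses HuggingFace embedding models and a LangChain in-memory vector store; this self-contained version only illustrates the ranking idea:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; the real script uses HuggingFace embeddings."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The retrieved chunks are then stuffed into the LLM prompt as context for answering.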
Place your .txt files into the documents/ folder. The bot will automatically load and index them on startup.
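The startup loading step could look like this sketch. The folder name matches the layout above; the real script goes through LangChain document loaders rather than reading files directly:

```python
from pathlib import Path

def load_documents(folder: str = "documents") -> dict[str, str]:
    """Read every .txt file in the folder into a {filename: text} map."""
    docs = {}
    for path in sorted(Path(folder).glob("*.txt")):
        docs[path.name] = path.read_text(encoding="utf-8")
    return docs
```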
```
Question: What is Python?
Answer: Python is a high-level programming language known for its simplicity and readability.
```
- The LLM (Groq) may fall back to its own internal knowledge only if nothing is retrieved; this can be disabled if you want document-only answers.
- You can customize chunk size, embedding model, or LLM settings in the script.
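For example, the tunable settings might be grouped near the top of `rag_demo.py` like this. The names and defaults below are illustrative, not the script's actual values:

```python
# Illustrative settings; adjust in rag_demo.py to taste.
SETTINGS = {
    "chunk_size": 500,        # characters per chunk
    "chunk_overlap": 50,      # characters shared between adjacent chunks
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",  # a common HuggingFace default
    "llm_model": "llama-3.1-8b-instant",  # example Groq model name
    "top_k": 3,               # chunks retrieved per question
}
```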
MIT License.