A Retrieval-Augmented Generation (RAG) chatbot that turns your documents into a queryable, intelligent knowledgebase. It combines vector search over your documents with a large language model to produce accurate, context-aware answers, making it ideal for product docs, support knowledgebases, internal wikis, and prototypes.
In short: ingest documents → build vector embeddings → query through a familiar chat interface backed by an LLM.
- Retrieval-Augmented Generation (RAG) architecture: answers grounded in your data, not hallucinated.
- Pluggable vector store: FAISS, Milvus, Weaviate, Pinecone, etc.
- Provider-agnostic LLM support: OpenAI, Azure OpenAI, Anthropic, or local LLMs (Llama, Mistral).
- Document ingestion pipeline: PDFs, Markdown, plain text, HTML, Office docs.
- Simple API + optional web UI for conversational flows.
- Tools for chunking, embedding, and context management (LLM prompt templates, memory window).
- Ingestion scripts for converting documents to text and creating embeddings.
- Index builder (vector store) and retrieval layer.
- Chat service that combines retrieved context with an LLM prompt to generate answers.
- Example clients (curl / Python) and a minimal web UI (optional) to demo chat flows.
- Config-driven setup for providers and index backends.
Prereqs:
- Python 3.10+ (or your project’s specified runtime)
- pip or poetry
- A vector DB or FAISS local index
- LLM API key (e.g., OpenAI)
- Clone
git clone https://github.com/LeelaKarthik-26/RAG_Knowladgebase_ChatBot.git
cd RAG_Knowladgebase_ChatBot
- Create virtual env & install
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
- Environment variables (example .env)
# LLM provider
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
# Vector store choice: faiss | pinecone | weaviate | milvus
VECTOR_STORE=faiss
# If using Pinecone/Weaviate/Milvus supply keys/urls below
PINECONE_API_KEY=
PINECONE_ENV=
WEAVIATE_URL=
MILVUS_HOST=
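If you want to sanity-check these settings from Python, here is a minimal sketch assuming python-dotenv is installed and a .env file like the one above exists; the project itself may load configuration differently.

```python
# Sketch: read the .env settings and pick a provider/backend at startup.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

llm_provider = os.getenv("LLM_PROVIDER", "openai")
vector_store = os.getenv("VECTOR_STORE", "faiss")

if vector_store == "pinecone" and not os.getenv("PINECONE_API_KEY"):
    raise RuntimeError("VECTOR_STORE=pinecone requires PINECONE_API_KEY")

print(f"LLM provider: {llm_provider}, vector store: {vector_store}")
```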
- Ingest your documents
# Example: ingest a folder of docs into the vector store
python scripts/ingest.py --source ./docs --index-name my_knowledgebase
- Run the API server
uvicorn app.main:app --reload --port 8000
# or
python -m app.main
- Chat with curl
curl -X POST "http://localhost:8000/chat" \
-H "Content-Type: application/json" \
-d '{"question":"How do I reset my password?", "index":"my_knowledgebase"}'- Ingest documents → split into chunks with overlap → compute embeddings.
- Store embeddings in a vector database.
- At query time:
  - Retrieve the top-K nearest passages by embedding similarity.
  - Construct a context-aware prompt (system + retrieved docs + user message).
  - Send the prompt to the LLM to generate a final answer that cites or references the source passages (a minimal sketch of this flow follows below).
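To make the query-time flow concrete, here is a minimal sketch. The `retriever` and `llm` objects and their method names are placeholders for whatever retrieval layer and LLM client you wire in; they are not this project's actual classes.

```python
# Illustrative query-time flow: retrieve -> build prompt -> generate.
# `retriever` and `llm` are placeholder objects, not the repo's real classes.
def answer(question: str, retriever, llm, top_k: int = 4) -> str:
    # 1. Retrieve the top-K most similar chunks for the question.
    passages = retriever.search(question, k=top_k)

    # 2. Build a context-aware prompt from the retrieved passages.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Use the numbered passages below to answer the question and cite the "
        "passage numbers you relied on.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

    # 3. Ask the LLM to generate the grounded answer.
    return llm.generate(prompt)
```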
Simple architecture:
- Ingestion -> Embeddings -> Vector DB
- API/Chat -> Retriever -> Prompt builder -> LLM -> Response
- Embeddings: OpenAI embeddings or local embedding models (e.g., sentence-transformers); a sketch using a local model appears after this list.
- Vector stores: FAISS, Pinecone, Weaviate, Milvus
- LLMs: OpenAI GPT, Azure OpenAI, Anthropic, local LLMs via LLM runtimes
- File types: .pdf, .md, .txt, .html, .docx
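As one concrete pairing from the lists above, a local sentence-transformers model plus a FAISS index can be exercised in a few lines. This is a generic illustration (the model name and sample passages are arbitrary), not the repo's ingestion code; it assumes `pip install sentence-transformers faiss-cpu`.

```python
# Generic example: embed a couple of passages locally and search them with FAISS.
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small local embedding model
passages = [
    "Password resets are done from the login page.",
    "Refunds are issued within 14 days of purchase.",
]

embeddings = model.encode(passages, convert_to_numpy=True).astype("float32")
index = faiss.IndexFlatL2(embeddings.shape[1])  # exact L2 index over the embeddings
index.add(embeddings)

query = model.encode(["How do I reset my password?"], convert_to_numpy=True).astype("float32")
distances, ids = index.search(query, k=1)
print(passages[ids[0][0]])  # -> the password-reset passage
```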
- Prompt templates live in app/prompts/ (or similar); customize them to control style, length, and instruction-following.
- Retrieval settings you’ll likely tweak:
  - top_k (how many passages to retrieve)
  - chunk_size and chunk_overlap during ingestion
  - temperature and max_tokens for LLM generation
Example: increase top_k to surface more context for complex questions; a sample settings block is sketched below.
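As an illustration of how these knobs might be grouped, here is a hypothetical settings dictionary; the actual keys and where they live depend on this project's config files.

```python
# Hypothetical retrieval/generation settings; tune per corpus and question complexity.
settings = {
    "retrieval": {
        "top_k": 4,            # raise to 8-10 for broad or multi-part questions
    },
    "ingestion": {
        "chunk_size": 800,     # tokens per chunk
        "chunk_overlap": 150,  # tokens shared between adjacent chunks
    },
    "generation": {
        "temperature": 0.2,    # keep low for factual, grounded answers
        "max_tokens": 512,
    },
}
```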
- Keep chunks reasonably small (500–1000 tokens) with overlaps of ~100–200 tokens for context continuity (a chunking sketch follows this list).
- Use citation format or source metadata in the prompt to improve traceability.
- Limit context length given LLM token limits; use recency or relevance filters if needed.
- Periodically re-index after document updates.
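A minimal sketch of overlapping chunking, word-based for brevity; real ingestion would typically count tokens with the embedding model's tokenizer.

```python
# Split text into overlapping chunks; word counts stand in for token counts here.
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        piece = words[start:start + chunk_size]
        if piece:
            chunks.append(" ".join(piece))
        if start + chunk_size >= len(words):
            break
    return chunks
```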
# Query the running chat API for an answer grounded in the indexed documents
import requests
resp = requests.post(
"http://localhost:8000/chat",
json={"question": "What is the refund policy?", "index": "my_knowledgebase"}
)
print(resp.json())
- Example prompt (system + user):
System: "You are a helpful assistant. Use the provided documents to answer the user's question. If the answer cannot be found, say you don't know and provide steps to find the answer."
User + retrieved context: "[RETRIEVED_PASSAGES]\n\nUser: How do I configure SSO for my org?"
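For illustration, messages in that spirit could be assembled like this; the helper name and wording are placeholders, not the templates shipped in app/prompts/.

```python
# Assemble a system + context + user message pair matching the example above.
SYSTEM = (
    "You are a helpful assistant. Use the provided documents to answer the "
    "user's question. If the answer cannot be found, say you don't know and "
    "provide steps to find the answer."
)

def build_messages(question: str, passages: list[str]) -> list[dict]:
    context = "\n\n".join(passages)
    return [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": f"{context}\n\nUser: {question}"},
    ]
```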
- Code style: follow black / isort / flake8 (if present)
- Tests: run pytest (if tests exist)
- Add new ingestors under scripts/ or app/ingestors/
- Add new vector backends by implementing the connector interface in app/vector_store/ (see the sketch below)
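The connector interface isn't documented here, so the shape below is only a guess at what a new backend might implement; the real interface in app/vector_store/ may use different names and signatures.

```python
# Hypothetical connector shape for a new vector backend (names are illustrative).
from abc import ABC, abstractmethod

class VectorStoreConnector(ABC):
    @abstractmethod
    def upsert(self, ids: list[str], vectors: list[list[float]], metadata: list[dict]) -> None:
        """Insert or update embeddings along with their metadata."""

    @abstractmethod
    def search(self, vector: list[float], top_k: int = 4) -> list[dict]:
        """Return the top_k stored entries most similar to the query vector."""
```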
Contributions welcome! Please:
- Open an issue describing the feature/bug.
- Fork and create a feature branch.
- Make the change, add tests, and update docs.
- Open a pull request referencing the issue.
See CONTRIBUTING.md for more details (if present).
This project is provided under the MIT License. See LICENSE for details.
Open an issue or contact the maintainer via GitHub: LeelaKarthik-26
Made with ❤️ for faster, fact-grounded answers from your docs.