This is a RAG chatbot built with LangGraph, LangChain, and your choice of the IBM Granite model, LLaMA 3.1 via Ollama, or LLaMA 2. The chatbot answers technical questions based on the KRKN pod scenarios documentation.
Note: To ensure accurate responses based on the provided documentation, please include the keyword “krkn” or other krkn context in your questions. This helps the system retrieve relevant context from the Krkn knowledge base, rather than generating general answers from unrelated sources.
```bash
git clone https://github.com/krkn-chaos/krkn-assist.git
cd krkn-assist
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

If using the LLaMA 3.1 LLM (recommended), run this script:
```bash
brew install ollama
ollama run llama3
```

If using the LLaMA 2 (llama2:7b) LLM, run this script:
```bash
brew install ollama
ollama pull llama2:7b
```

Download instructions here
Ensure that Ollama is running in the background.
- open main.py and uncomment the code for the LLM you would like to use (an illustrative sketch follows this list)
- run python3 main.py (or python main.py, depending on your Python version)
- run streamlit run app.py
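As a rough illustration of the LLM-selection step, the uncommented block in main.py might look something like the sketch below. The import path, variable names, and model tags are assumptions based on the Ollama commands above; the actual code in main.py may differ.

```python
# Illustrative only -- main.py's actual structure may differ.
from langchain_ollama import ChatOllama

# Uncomment exactly one of the following:
llm = ChatOllama(model="llama3")        # LLaMA 3.1 via Ollama (recommended)
# llm = ChatOllama(model="llama2:7b")   # LLaMA 2 7B via Ollama
# llm = ...                             # IBM Granite (see main.py for details)
```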
- Document Processing: The system loads and processes documentation files from the docs/ directory under github.com/krkn-chaos/website, splitting them into manageable chunks for efficient retrieval (see the ingestion sketch after this list). Documents can be loaded as:
  - PDF (stored in a specific folder)
  - Markdown files
  - URLs
- Vector Database Creation: Document chunks are converted into embeddings using HuggingFace's sentence transformers and stored in a Chroma vector database for semantic search (also covered in the ingestion sketch after this list).
- RAG Pipeline Setup: A Retrieval-Augmented Generation (RAG) pipeline is established using LangGraph and LangChain, combining document retrieval with language model generation (see the pipeline sketch after this list).
- Model Integration: The chatbot integrates with your chosen LLM (IBM Granite, LLaMA 3.1 via Ollama, or LLaMA 2) to generate contextually relevant responses.
- Query Processing: When you ask a question, the system (see the pipeline sketch after this list):
  - Retrieves relevant document chunks from the vector database
  - Provides context to the language model
  - Generates an answer based on the retrieved KRKN documentation
  - Returns the response with source citations when available
- Interactive Chat: The terminal interface allows for continuous conversation, maintaining context throughout the session (see the chat-loop sketch after this list).
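The following is a minimal sketch of the document-processing and vector-database steps above, assuming LangChain's community loaders, a recursive character splitter, the all-MiniLM-L6-v2 sentence-transformer model, and placeholder paths and URLs. The loaders, chunk sizes, and embedding model actually used in this repo may differ.

```python
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader, TextLoader, WebBaseLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load Markdown docs from a local checkout of github.com/krkn-chaos/website,
# plus any PDFs stored in a local folder and any extra URLs (paths are placeholders).
docs = DirectoryLoader("website/docs", glob="**/*.md", loader_cls=TextLoader).load()
docs += PyPDFLoader("pdfs/krkn-pod-scenarios.pdf").load()
docs += WebBaseLoader(["https://krkn-chaos.dev/docs/"]).load()

# Split documents into manageable chunks for retrieval.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# Embed the chunks with a HuggingFace sentence transformer and store them in Chroma.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectordb = Chroma.from_documents(chunks, embeddings, persist_directory="chroma_db")
```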
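Next, a minimal sketch of the RAG pipeline, model integration, and query-processing steps, assuming the vectordb built in the previous sketch and LLaMA 3.1 served by Ollama. The node names, prompt wording, and citation format are illustrative assumptions, not the repo's actual code.

```python
from typing import List, TypedDict

from langchain_core.documents import Document
from langchain_ollama import ChatOllama
from langgraph.graph import END, START, StateGraph

llm = ChatOllama(model="llama3")  # or llama2:7b / IBM Granite, depending on main.py
retriever = vectordb.as_retriever(search_kwargs={"k": 4})  # vectordb from the sketch above

class RAGState(TypedDict):
    question: str
    context: List[Document]
    answer: str

def retrieve(state: RAGState) -> dict:
    # Pull the most relevant KRKN documentation chunks for the question.
    return {"context": retriever.invoke(state["question"])}

def generate(state: RAGState) -> dict:
    # Ground the answer in the retrieved chunks and append source citations when available.
    context_text = "\n\n".join(doc.page_content for doc in state["context"])
    prompt = (
        "Answer the question using only the KRKN documentation below.\n\n"
        f"{context_text}\n\nQuestion: {state['question']}"
    )
    answer = llm.invoke(prompt).content
    sources = sorted({doc.metadata.get("source", "unknown") for doc in state["context"]})
    return {"answer": f"{answer}\n\nSources: {', '.join(sources)}"}

builder = StateGraph(RAGState)
builder.add_node("retrieve", retrieve)
builder.add_node("generate", generate)
builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", "generate")
builder.add_edge("generate", END)
graph = builder.compile()

result = graph.invoke({"question": "What does a krkn pod scenario do?"})
print(result["answer"])
```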
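Finally, the interactive-chat behavior could be approximated by a simple terminal loop over the compiled graph from the previous sketch. The actual loop in main.py, and how it carries conversation context between turns, may differ.

```python
# Minimal terminal chat loop; prior turns are folded into the next question
# so follow-up questions keep their context.
history: list[str] = []

while True:
    question = input("You: ").strip()
    if question.lower() in {"exit", "quit", ""}:
        break
    result = graph.invoke({"question": "\n".join(history + [question])})
    print(f"Bot: {result['answer']}\n")
    history.append(f"Q: {question}\nA: {result['answer']}")
```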
Planned enhancements can be found in the roadmap.
LLM performance improves significantly with better laptop hardware. The LLM was tested on two different laptops:
- Laptop 1: Apple M3 Pro, 36 GB RAM, 12-core CPU, 18-core GPU
- Laptop 2: Apple M1, 16 GB RAM, 8-core CPU, 12-core GPU
With the LLaMA 3.1 LLM, answers were generated in under 10 seconds on Laptop 1 and in 15-30 seconds on Laptop 2.
If you want to evaluate the performance of the LLM used to generate answers, see the user guide to the evaluation pipeline:
Note: The outputs of steps 1-3 are the files in the evaluationPipeline folder.
- open eval.py and uncomment the code for the model you are evaluating
- edit the email field on line 121 with the email address that the evaluation metrics should be sent to
- after the script runs, open the JSON file (the file name is on line 125)
- copy the entire JSON file and open the Evaluation Pipeline Endpoint (you must be connected to the VPN)
- make sure the JSON structure matches the format required by the endpoint, then paste it into these three endpoints: /evaluate_context_retrieval, /evaluate_response, and /evaluate_all (a sketch for submitting the JSON over HTTP follows this list)
- the evaluation metrics should then be emailed to you
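If you prefer to submit the JSON programmatically rather than pasting it into the endpoint UI, a sketch like the following could work. The base URL, output file name, and payload shape are placeholders and assumptions; use the real endpoint URL (VPN required) and the file name printed by eval.py.

```python
import json

import requests

BASE_URL = "https://evaluation-pipeline.example.com"  # placeholder; use the real Evaluation Pipeline Endpoint
OUTPUT_FILE = "evaluation_results.json"               # placeholder; the real name is set on line 125 of eval.py

with open(OUTPUT_FILE) as f:
    payload = json.load(f)

# Post the same payload to each of the three evaluation endpoints.
for endpoint in ("/evaluate_context_retrieval", "/evaluate_response", "/evaluate_all"):
    response = requests.post(f"{BASE_URL}{endpoint}", json=payload, timeout=120)
    response.raise_for_status()
    print(endpoint, "->", response.status_code)
```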