This chatbot is powered by the LLaMA 3 model and runs entirely on CPU using ctransformers
.
It is designed to be lightweight, efficient, and responsive while providing an interactive Gradio UI for seamless conversations.
✅ Runs on CPU – No GPU required, making it accessible on standard hardware
✅ Optimized with ctransformers
– Faster inference on CPUs
✅ Concise & direct responses – Avoids unnecessary small talk
✅ Interactive Gradio UI – Easy-to-use web interface
✅ Maintains chat history – Context-aware responses
To run this chatbot locally, follow these steps:
Ensure you have Python 3.8+ installed, then run:
pip install gradio ctransformers
You need the LLaMA 3 GGUF model. Download it from TheBloke's Hugging Face repository. Move the .gguf model file to your project directory.
python app.py
Model Used: LLaMA 3 (8B) - Quantized (Q4_K_M) Why CPU?: This chatbot is optimized to run without a GPU, making it accessible to more users. Optimization: Adjusted temperature, response length, and stop tokens for more accurate answers.